WO2022257458A1 - Vehicle insurance claim behavior recognition method, apparatus, and device, and storage medium

Info

Publication number: WO2022257458A1
Application number: PCT/CN2022/071477
Authority: WO
Inventors: 朱磊; 徐赛奕; 张霖; 俞丽娟; 朱艳乔
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-06-08
Filing date: 2022-01-12
Publication date: 2022-12-15
Also published as: CN113256434B; CN113256434A

Abstract

The present invention relates to the field of big data. Disclosed are a vehicle insurance claim behavior recognition method, apparatus, and device, and a storage medium. The method comprises: dividing historical vehicle insurance claim data into positive samples and negative samples, performing near-neighbor processing on the positive samples to obtain expanded samples, and downsampling the negative samples to obtain sub-samples; combining the sub-samples with the positive samples and the expanded samples respectively to obtain first and second data sets and inputting the first and second data sets to a behavior recognition model for recognition so as to obtain first and second behavior recognition results, on the basis of which the error rate of the behavior recognition model and the relative entropy loss of the first and second recognition results are calculated; updating the behavior recognition model according to the error rate and the relative entropy loss until the behavior recognition model converges, and then stopping; and finally, inputting vehicle insurance claim data to be recognized into the behavior recognition model to recognize a behavior category corresponding to the vehicle insurance claim data. The present invention solves the imbalance between positive and negative samples stored in vehicle insurance anti-fraud data sets, thereby improving the accuracy of recognizing abnormal vehicle insurance compensation.

Description

Auto insurance claim settlement behavior identification method, device, equipment and storage medium

This application claims priority to Chinese patent application No. 202110635315.3, titled "method, device, equipment and storage medium for identification of auto insurance claim settlement", filed with the China Patent Office on June 8, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of big data, and in particular to a method, device, equipment and storage medium for identifying an auto insurance claim settlement behavior.

Background

Automobile insurance is a type of commercial insurance that covers personal injury or property loss involving motor vehicles as a result of natural disasters or accidents. It emerged and developed alongside the spread of the automobile. As society develops and living standards rise, more and more people are buying cars. However, some lawbreakers defraud insurance companies of compensation by taking out auto insurance and staging accident scenes. Insurance fraud occurs frequently, especially in auto insurance, and the methods used are numerous and constantly changing, causing heavy losses to auto insurers. Fraudsters often operate in gangs, colluding with auto repair companies and even bribing loss assessors.

The inventor realized that in existing auto insurance anti-fraud data sets, fraudulent claims remain few compared with normal claims. When such a data set is used for machine learning to identify whether an auto insurance claim is abnormal, the cases of fraudulent behavior and the cases of normal compensation are therefore unbalanced, which leads to an imbalance between the numbers of positive and negative samples. With a general machine learning modeling method the results are very poor in this situation: the model caters to the characteristics of the data and tends to assign results to the category with the larger number of samples. In other words, the positive and negative samples stored in existing auto insurance anti-fraud data sets are imbalanced, which leads to low accuracy when training models to recognize abnormal auto insurance claims.

Summary of the invention

The main purpose of this application is to solve the problem that the positive and negative samples stored in existing auto insurance anti-fraud data sets are imbalanced, which leads to low accuracy when training models to identify abnormal auto insurance claims.

The first aspect of the present application provides a method for identifying auto insurance claim settlement behavior, including: obtaining historical auto insurance claim data, and dividing the historical auto insurance claim data into positive samples and negative samples; performing neighbor propagation processing on the positive samples to obtain a plurality of expanded samples, and performing down-sampling processing on the negative samples to obtain a plurality of sub-samples; combining each of the sub-samples with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set; inputting the first data set and the second data set separately into a preset behavior recognition model to identify the behavior type, and obtaining a first behavior recognition result corresponding to the first data set and a second behavior recognition result corresponding to the second data set; calculating, according to the first behavior recognition result and the behavior type corresponding to the first data set, the misclassification rate of the behavior recognition model on the first data set, and calculating the relative entropy loss between the first behavior recognition result and the second behavior recognition result; updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges; and acquiring auto insurance claim data to be identified, inputting the auto insurance claim data to be identified into the behavior recognition model, and identifying the behavior category corresponding to the auto insurance claim data to be identified.

The second aspect of the present application provides auto insurance claim settlement behavior recognition equipment, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: obtaining historical auto insurance claim data, and dividing the historical auto insurance claim data into positive samples and negative samples; performing neighbor propagation processing on the positive samples to obtain a plurality of expanded samples, and performing down-sampling processing on the negative samples to obtain a plurality of sub-samples; combining each of the sub-samples with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set; inputting the first data set and the second data set separately into a preset behavior recognition model to identify the behavior type, and obtaining a first behavior recognition result corresponding to the first data set and a second behavior recognition result corresponding to the second data set; calculating, according to the first behavior recognition result and the behavior type corresponding to the first data set, the misclassification rate of the behavior recognition model on the first data set, and calculating the relative entropy loss between the first behavior recognition result and the second behavior recognition result; updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges; and acquiring auto insurance claim data to be identified, inputting the auto insurance claim data to be identified into the behavior recognition model, and identifying the behavior category corresponding to the auto insurance claim data to be identified.

The third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: obtaining historical auto insurance claim data, and dividing the historical auto insurance claim data into positive samples and negative samples; performing neighbor propagation processing on the positive samples to obtain a plurality of expanded samples, and performing down-sampling processing on the negative samples to obtain a plurality of sub-samples; combining each of the sub-samples with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set; inputting the first data set and the second data set separately into a preset behavior recognition model to identify the behavior type, and obtaining a first behavior recognition result corresponding to the first data set and a second behavior recognition result corresponding to the second data set; calculating, according to the first behavior recognition result and the behavior type corresponding to the first data set, the misclassification rate of the behavior recognition model on the first data set, and calculating the relative entropy loss between the first behavior recognition result and the second behavior recognition result; updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges; and acquiring auto insurance claim data to be identified, inputting the auto insurance claim data to be identified into the behavior recognition model, and identifying the behavior category corresponding to the auto insurance claim data to be identified.

The fourth aspect of the present application provides an auto insurance claim settlement behavior recognition device, including: an expansion module, configured to obtain historical auto insurance claim data, divide the historical auto insurance claim data into positive samples and negative samples, perform neighbor propagation processing on the positive samples to obtain a plurality of expanded samples, and perform down-sampling processing on the negative samples to obtain a plurality of sub-samples; a combination module, configured to combine each of the sub-samples with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set; a training module, configured to input the first data set and the second data set separately into a preset behavior recognition model to identify the behavior type, and obtain a first behavior recognition result corresponding to the first data set and a second behavior recognition result corresponding to the second data set; an update module, configured to calculate, according to the first behavior recognition result and the behavior type corresponding to the first data set, the misclassification rate of the behavior recognition model on the first data set, calculate the relative entropy loss between the first behavior recognition result and the second behavior recognition result, and update the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges; and a recognition module, configured to acquire auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior recognition model, and identify the behavior category corresponding to the auto insurance claim data to be identified.

In the technical solution provided by this application, the imbalance between positive and negative samples is addressed by expanding the less numerous positive samples and down-sampling the more numerous negative samples. The samples are then combined to obtain the first and second data sets, which are input separately into two identical behavior recognition models for training, yielding the first and second distribution probabilities. The output results are then processed: the accuracy on the first data set is measured by the misclassification rate, and the difference between the first data set and the second data set is measured by the relative entropy loss. When the accuracy and the difference between the two data sets meet the conditions, the behavior recognition model is obtained, which substantially weakens the recognition bias caused by sample imbalance. Finally, the behavior category of the auto insurance claim to be identified is recognized by the behavior recognition model, and the recognition result obtained is more accurate.

Description of drawings

Fig. 1 is a schematic diagram of the first embodiment of the method for identifying auto insurance claim settlement behavior in the embodiments of the present application;

Fig. 2 is a schematic diagram of the second embodiment of the method for identifying auto insurance claim settlement behavior in the embodiments of the present application;

Fig. 3 is a schematic diagram of the third embodiment of the method for identifying auto insurance claim settlement behavior in the embodiments of the present application;

Fig. 4 is a schematic diagram of an embodiment of the auto insurance claim settlement behavior recognition device in the embodiments of the present application;

Fig. 5 is a schematic diagram of another embodiment of the auto insurance claim settlement behavior recognition device in the embodiments of the present application;

Fig. 6 is a schematic diagram of an embodiment of the auto insurance claim settlement behavior recognition device in the embodiments of the present application.

Detailed description

The embodiments of the present application provide a method, device, equipment and storage medium for identifying auto insurance claim settlement behavior. Historical auto insurance claim data is divided into positive samples and negative samples; the positive samples are processed to obtain expanded samples, and the negative samples are down-sampled to obtain sub-samples. The sub-samples are combined with the positive samples and the expanded samples respectively to obtain first and second data sets, which are input into the behavior recognition model for recognition to obtain first and second behavior recognition results, from which the misclassification rate of the behavior recognition model and the relative entropy loss between the first and second distribution probabilities are calculated. The behavior recognition model is updated according to the misclassification rate and the relative entropy loss until it converges. Finally, the auto insurance claim data to be identified is input into the behavior recognition model to identify the corresponding behavior category. The present application addresses the imbalance between positive and negative samples stored in auto insurance anti-fraud data sets, thereby improving the accuracy of identifying abnormal auto insurance claims.

The terms "first", "second", "third", "fourth", etc. (if any) in the specification and claims of the present application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or elements is not necessarily limited to those explicitly listed, but may include other steps or elements that are not explicitly listed or that are inherent to the process, method, product or device.

For ease of understanding, the specific process of the embodiments of the present application is described below. Referring to Fig. 1, the first embodiment of the method for identifying auto insurance claim settlement behavior in the embodiments of the present application includes:

101. Obtain the historical auto insurance claims data, divide the historical auto insurance claims data into positive samples and negative samples, perform neighbor propagation processing on the positive samples, obtain multiple expanded samples, and perform down-sampling processing on the negative samples to obtain multiple sub-samples;

It can be understood that the method of the present application may be executed by an auto insurance claim settlement behavior recognition device, or by a terminal or a server, which is not specifically limited here. The embodiments of the present application are described taking a server as the executing entity.

In this embodiment, historical auto insurance claim data refers to an enterprise's past auto insurance claim records, including normal claim records and abnormal ones (such as fraudulent claims). It may specifically include historical claim information such as claim records, accident scene records, maintenance records, policy records, policyholder information, and labels marking each claim as normal or abnormal.

For the above historical auto insurance claim information, each underlying factor is first defined and described in words. These underlying factors are then processed, for example through LBS (Location Based Services) factor processing, WIFI factor expansion, and processing of other fraud-related factors. The processed underlying factors then undergo further feature cleaning, and those that meet the requirements for data saturation and model correlation are retained. Finally, the retained underlying factors are encoded through feature engineering to obtain numerical underlying factors, which is the final form in which the first data set is expressed.

Furthermore, LBS factor processing refers to deriving the customer's recent movement trajectory from the customer's latitude and longitude and POI (Point of Interest) data. WIFI factor expansion refers to using the user's WIFI connection information and historical blacklist records to assess the association between the current customer and the insurance fraud blacklist. Processing of other fraud-related factors may include the Euclidean distance between the accident location and the repair shop, whether the driver in the accident and the insured are the same person, and so on. Feature engineering can encode the underlying factors through feature normalization, data binning, and numericalization of discrete features.

In this embodiment, within the historical auto insurance claim data, the data of normal claim records is used as negative samples and the data of abnormal claim records is used as positive samples. Because the positive samples are few relative to the negative samples, a model trained directly on this data tends to output "normal". Therefore the positive samples are expanded and the negative samples are down-sampled: the number of positive samples increases while the number of negative samples decreases, which narrows the gap between the two and reduces the bias that the volume of training data introduces into the model's recognition results.
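By way of non-limiting illustration, the encoding operations named above (normalization, binning, numericalization of discrete features) and the Euclidean distance factor might be sketched in plain Python as follows. The function names, bin edges and example values are hypothetical and are not specified by the application.

```python
import math

def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def bin_value(value, edges):
    """Return the index of the first bin whose upper edge is >= value."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)  # value lies above the last edge

def encode_categories(values):
    """Map each distinct discrete value to an integer code,
    numbered in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]

def euclidean_distance(p, q):
    """Straight-line distance between two coordinate tuples,
    e.g. accident location and repair shop (hypothetical inputs)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For example, `encode_categories(["4S", "garage", "4S"])` yields one integer code per distinct repair shop type, which can then be fed to the model alongside the normalized and binned numeric factors.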

102. Combine each sub-sample with a positive sample and an expanded sample to obtain a first data set and a second data set;

In this embodiment, training of the behavior recognition model for auto insurance claims uses two identical models stacked together. The first model is trained on the first data set and is used to predict the behavior type of the auto insurance claim in each sample; the second model is trained on the second data set, and its predictions are compared with those of the first model in order to iterate on the first model.

When expanding the positive samples, the K nearest neighbors of each positive sample can be calculated with the K-Nearest Neighbor (KNN) classification algorithm, and positive samples suitable for synthesizing new samples, such as those close to the classification boundary, can be selected. The number selected can be set according to the ratio of positive samples to negative samples. New positive samples are then constructed together with the original positive samples.

103. Input the first data set and the second data set separately into the preset behavior recognition model to identify the behavior type, and obtain the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set;

In this embodiment, two identical behavior recognition models are trained on the first data set and the second data set respectively. They output the recognition distribution probability of the behavior type of each sample in the first data set and of each sample in the second data set. Linear fitting is then performed on the recognition distribution probabilities corresponding to each of the two data sets, yielding the first distribution probability and the second distribution probability.

Specifically, the behavior recognition model can be trained using the Random Forest algorithm. The samples in the first data set or the second data set are trained by learners to obtain multiple decision trees, that is, one sample corresponds to one decision tree, and the nodes of the decision tree correspond to the feature attributes of the sample. Based on the decision tree, the recognition probability of the behavior type corresponding to each feature attribute in each sample is calculated, and fitting the recognition probabilities of the feature attributes gives the recognition distribution probability of the behavior type of each sample.
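One possible reading of step 102 is that each negative sub-sample is paired once with the original positive samples (first data set) and once with the expanded samples (second data set). The following minimal sketch assumes that reading; the function name and the `(features, label)` sample layout are hypothetical.

```python
def build_data_sets(sub_samples, positives, expanded):
    """Pair each negative sub-sample with the positives (first data set)
    and with the expanded samples (second data set).

    Each sample is a (features, label) tuple; label 1 = abnormal, 0 = normal.
    This pairing is an assumed interpretation of step 102.
    """
    first_sets, second_sets = [], []
    for sub in sub_samples:
        first_sets.append(list(sub) + list(positives))
        second_sets.append(list(sub) + list(expanded))
    return first_sets, second_sets
```

Under this assumption, each pair of data sets shares the same negative samples and differs only in whether the abnormal claims are original or synthesized.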

104. According to the first behavior recognition result and the behavior type corresponding to the first data set, calculate the misclassification rate of the behavior recognition model for the first data set, and calculate the relative entropy between the first behavior recognition result and the second behavior recognition result Loss, and update the behavior recognition model according to the misclassification rate and relative entropy loss until the behavior recognition model converges;

In this embodiment, the misclassification rate of each sample in the first data set is calculated from the first distribution probability. Here the misclassification rate refers to the ratio between the number of correct behavior types and the total number of prediction results that the different learners produce for the same sample. The behavior type that receives the most identical predictions for the sample is regarded as the correctly predicted behavior type, and the other behavior types are regarded as wrongly predicted.

Specifically, the behavior types of a sample may include only normal and abnormal, in which case the misclassification rate is calculated as: number of abnormal feature attributes / total number of feature attributes.
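The two quantities used in step 104 might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the misclassification rate is taken as the share of samples whose predicted behavior type differs from the labeled type, and the relative entropy loss is taken as the Kullback-Leibler divergence between the two distribution probabilities. The function names and the smoothing constant `eps` are hypothetical.

```python
import math

def misclassification_rate(predicted, actual):
    """Fraction of samples whose predicted behavior type differs from the label
    (assumed definition for this sketch)."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    distributions given as equal-length lists of probabilities.
    eps guards against division by zero and log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

A relative entropy close to zero indicates that the model's output on the second data set closely follows its output on the first data set.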

105. Obtain the auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior recognition model, and identify the behavior category corresponding to the auto insurance claim data to be identified.

In this embodiment, after the behavior recognition model is trained, the auto insurance claim data to be identified is input directly into the behavior recognition model, which directly outputs the behavior type of the auto insurance claim to be identified. The behavior types may include only normal or abnormal.

In the embodiment of the present application, the imbalance between positive and negative samples is addressed by expanding the less numerous positive samples and down-sampling the more numerous negative samples. The samples are then combined to obtain the first and second data sets, which are input separately into two identical behavior recognition models for training, yielding the first and second distribution probabilities. The output results are then processed: the accuracy on the first data set is measured by the misclassification rate, and the difference between the first data set and the second data set is measured by the relative entropy loss. When the accuracy and the difference between the two data sets meet the conditions, the behavior recognition model is obtained, which substantially weakens the recognition bias caused by sample imbalance. Finally, the behavior recognition model is used to identify the behavior category of the auto insurance claim to be identified, and the recognition result obtained is more accurate.

Please refer to Fig. 2, the second embodiment of the method for identifying the behavior of auto insurance claims in the embodiment of the present application includes:

201. Obtain historical auto insurance claims data, divide historical auto insurance claims data into positive samples and negative samples, and perform down-sampling processing on negative samples to obtain multiple sub-samples;

202. Calculate the Euclidean distance between every pair of positive samples in turn, and determine the nearest neighbor samples of each positive sample according to the Euclidean distance;

203. Randomly select a preset number of neighboring samples for linear interpolation processing, and construct expanded samples according to the processing results;

In this embodiment, when expanding the positive samples, the K nearest neighbors of each positive sample can be calculated with the K-Nearest Neighbor (KNN) classification algorithm, and positive samples suitable for synthesizing new samples, such as those close to the classification boundary, can be selected. The number selected can be set according to the ratio of positive samples to negative samples. New positive samples are then constructed together with the original positive samples.

Specifically, when down-sampling the negative samples, the negative samples can be randomly sampled with replacement according to a preset sampling multiple, so that the number of negative samples is reduced to a level relatively balanced with the number of positive samples. The sampling multiple can be set according to the ratio of positive samples to negative samples.
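Steps 202 and 203 resemble SMOTE-style oversampling. The following minimal sketch assumes that positive samples are numeric feature vectors and that each synthetic sample lies on the line segment between a positive sample and one of its nearest neighbors. The parameters `k`, `n_new` and `seed` are hypothetical, and the screening of boundary samples mentioned above is omitted.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(positives, k):
    """For each positive sample, return the indices of its k nearest
    other positive samples (step 202)."""
    neighbors = []
    for i, p in enumerate(positives):
        others = [j for j in range(len(positives)) if j != i]
        others.sort(key=lambda j: euclidean(p, positives[j]))
        neighbors.append(others[:k])
    return neighbors

def expand_positives(positives, k=2, n_new=4, seed=0):
    """Build n_new synthetic samples by linear interpolation between a
    randomly chosen positive sample and one of its k nearest neighbors
    (step 203). Requires at least two positive samples."""
    rng = random.Random(seed)
    neighbors = nearest_neighbors(positives, k)
    expanded = []
    for _ in range(n_new):
        i = rng.randrange(len(positives))
        j = rng.choice(neighbors[i])
        t = rng.random()  # interpolation weight in [0, 1)
        expanded.append(tuple(a + t * (b - a)
                              for a, b in zip(positives[i], positives[j])))
    return expanded
```

In such a sketch, `n_new` would be chosen from the ratio of positive to negative samples, as the description suggests.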
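The down-sampling described above might look like the following minimal sketch. It assumes the sampling multiple is expressed as a target ratio of negatives to positives per sub-sample; `ratio`, `n_subsets` and `seed` are hypothetical parameters.

```python
import random

def down_sample(negatives, n_positive, n_subsets, ratio=1.0, seed=0):
    """Draw n_subsets sub-samples from the negative samples, each of size
    round(ratio * n_positive), sampling at random with replacement."""
    rng = random.Random(seed)
    size = max(1, round(ratio * n_positive))
    # random.choices samples with replacement
    return [rng.choices(negatives, k=size) for _ in range(n_subsets)]
```

With `ratio=1.0` each sub-sample holds as many negatives as there are positive samples, which gives the relatively balanced state the description refers to.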

204. Combine each sub-sample with a positive sample and an expanded sample to obtain a first data set and a second data set;

205. Input the data sets into a preset behavior recognition model, wherein the behavior recognition model includes an input layer and a decision-making layer, and the data sets include a first data set and a second data set;

206. Randomly sample the data set through the input layer to obtain multiple feature subsets;

207. Input each feature subset into different learners in the decision-making layer, and use the learner to identify each feature subset, and output the recognition results of each learner for the corresponding feature subset;

In this embodiment, the data set keeps the positive and negative samples balanced. The data set is then further partitioned to obtain multiple feature subsets, and the positive and negative samples in each feature subset remain relatively balanced, as they are in the data set.

In addition, each learner outputs the recognition probability of the behavior type for each feature attribute of each sample. These are fitted to obtain the recognition distribution probability, which is then further fitted to obtain the distribution probability of behavior type recognition for the data set.
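As a simplified illustration of how the learners' outputs could be turned into a distribution probability, the sketch below replaces the fitting step with plain vote counting and averaging. That substitution is an assumption made for this example, and the function names and behavior type labels are hypothetical.

```python
from collections import Counter

def distribution_probability(votes_per_sample,
                             behavior_types=("normal", "abnormal")):
    """Convert the learners' votes for each sample into a probability per
    behavior type, then average over samples to obtain the data set's
    distribution probability (averaging stands in for the fitting step)."""
    per_sample = []
    for votes in votes_per_sample:
        counts = Counter(votes)
        per_sample.append([counts[t] / len(votes) for t in behavior_types])
    n = len(per_sample)
    return [sum(row[i] for row in per_sample) / n
            for i in range(len(behavior_types))]

def majority_type(votes):
    """Behavior type predicted by the largest number of learners."""
    return Counter(votes).most_common(1)[0][0]
```

The resulting list has one probability per behavior type and sums to 1, so it can be compared between the first and second data sets.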

Further, the training process of the feature subset through the learner specifically includes the following steps:

(1) Use the current learner to select a sub-sample, expanded sample or positive sample from the feature subset to construct a sample node, and select m feature attributes from the selected sub-sample, expanded sample or positive sample according to the preset feature selection parameter;

(2) Use the learner to select one feature attribute from the selected feature attributes to construct a feature node under the sample node;

(3) Use the learner to re-select m feature attributes from the selected sub-sample, expanded sample or positive sample and construct the lower-level feature nodes of the feature node, stopping when the number of feature nodes reaches m, to obtain the corresponding decision tree;

(4) Use the next learner to select a not-yet-selected sub-sample, expanded sample or positive sample from the feature subset and construct a decision tree, stopping once a decision tree has been constructed for every sub-sample, expanded sample and/or positive sample in the feature subset;

(5) According to the multiple decision trees corresponding to the feature subset, each learner calculates and outputs the recognition distribution probability of the corresponding behavior type.

In this embodiment, when the random forest algorithm is used for model training, suppose the feature subset contains A samples (the total of sub-samples, expanded samples and positive samples). One sample is drawn at random, with replacement, from the A samples and used as the sample at the root node of the decision tree, that is, to construct the sample node. If each sample contains K attributes, then whenever a sample node of the decision tree needs to branch, k feature attributes are selected at random from the K attributes, where k≤K, and a preset strategy such as information gain is used to choose one of the k feature attributes as the branch attribute of the node, that is, to construct a feature node. These steps are repeated until the number of feature nodes reaches m, at which point extension stops. The decision trees obtained by the multiple learners together form the entire random forest.
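The branch attribute selection described above (draw k of the K attributes at random, then keep the one with the highest information gain) might be sketched as follows. The sketch assumes discrete feature values and covers only the choice of a single branch attribute, not the construction of a full tree; all function names are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of behavior type labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def choose_branch_feature(rows, labels, k, seed=0):
    """Randomly draw k of the K feature indices (k <= K), then return the
    candidate with the highest information gain."""
    rng = random.Random(seed)
    candidates = rng.sample(range(len(rows[0])), k)
    return max(candidates, key=lambda f: information_gain(rows, labels, f))
```

Choosing k smaller than K makes the individual trees differ from one another, which is what allows the learners' outputs to be combined into a forest.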

208. Determine the recognition result of the behavior recognition model for the data set according to the recognition results output by each learner, wherein the recognition result of the behavior recognition model for the data set includes a first behavior recognition result and a second behavior recognition result;

209. According to the first behavior recognition result and the behavior type corresponding to the first data set, calculate the misclassification rate of the behavior recognition model for the first data set, and calculate the relative entropy between the first behavior recognition result and the second behavior recognition result Loss, and update the behavior recognition model according to the misclassification rate and relative entropy loss until the behavior recognition model converges;

210. Obtain the auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior recognition model, and identify the behavior category corresponding to the auto insurance claim data to be identified.

In the embodiment of the present application, the small number of positive samples is expanded by sampling neighboring samples and applying linear interpolation to obtain expanded samples, and the large number of negative samples is down-sampled. This addresses the imbalance between positive and negative samples and increases the accuracy of model training. In addition, the output accuracy of the behavior recognition model is measured by the misclassification rate and the loss value, which ensures that the model fully considers the characteristics of the positive samples when the samples are imbalanced, so that the output result is more accurate.
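The down-sampling and combination steps can be sketched as below. This assumes, following the abstract, that the first data set pairs a sub-sample with the original positive samples and the second pairs it with the expanded samples. The function names, the label encoding (1 for positive, 0 for negative) and the toy values are hypothetical.

```python
import random

def downsample(negatives, n_subsamples, subsample_size, rng):
    """Draw several random subsets (each without replacement) from the negative samples."""
    return [rng.sample(negatives, subsample_size) for _ in range(n_subsamples)]

def build_data_sets(sub_sample, positives, expanded):
    """First set: sub-sample + original positives. Second set: sub-sample + expanded samples."""
    first = [(x, 0) for x in sub_sample] + [(x, 1) for x in positives]
    second = [(x, 0) for x in sub_sample] + [(x, 1) for x in expanded]
    return first, second

rng = random.Random(42)
negatives = [(float(i), float(i % 7)) for i in range(100)]   # many negative samples
positives = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0)]              # few positive samples
expanded = [(1.2, 2.2), (1.8, 2.1), (1.4, 2.4), (1.1, 2.1), (1.9, 2.0)]

sub_samples = downsample(negatives, n_subsamples=4, subsample_size=6, rng=rng)
first, second = build_data_sets(sub_samples[0], positives, expanded)
```

Each sub-sample yields one pair of data sets, so the model sees the same negative samples alongside both the original and the expanded positive samples.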

Referring to Fig. 3, a third embodiment of the method for identifying the behavior of auto insurance claims in the embodiment of the present application includes:

301. Obtain historical auto insurance claim data, divide historical auto insurance claim data into positive samples and negative samples, and perform neighbor propagation processing on positive samples to obtain multiple expanded samples, and perform down-sampling processing on negative samples to obtain multiple sub-samples;

302. Combine each sub-sample with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set;

303. Input the first data set and the second data set separately into the preset behavior recognition model for behavior type recognition, and obtain the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set;

304. Compile statistics on the behavior types in the first behavior recognition result to obtain the first distribution probability of the behavior types in the first data set, and determine, according to the first distribution probability and the behavior type corresponding to the first data set, the number of samples in the first data set misclassified by the behavior recognition model;

305. Calculate the ratio of the number of misclassified samples to the total number of samples in the first data set, use the ratio as the misclassification rate of the behavior recognition model for the first data set, and calculate the relative entropy loss between the first behavior recognition result and the second behavior recognition result;

In this embodiment, suppose the behavior types of auto insurance claims are normal and abnormal, and sample 1 contains A feature attributes, of which a1 are predicted to be normal and a2 are predicted to be abnormal. If a1 > a2, the classification result of this sample is normal, and the misclassification rate is a2/A.
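As a small illustration of the two quantities just described, the sketch below computes the misclassification rate as the share of samples whose predicted type differs from the labeled type, and reproduces the a2/A example with a majority vote. The function names and the sample values are assumptions chosen for the example.

```python
from collections import Counter

def misclassification_rate(predicted, actual):
    """Share of samples whose predicted behavior type differs from the labeled one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

def vote_summary(votes):
    """Majority label and the share of votes that disagree with it (a2/A in the text)."""
    label, count = Counter(votes).most_common(1)[0]
    return label, (len(votes) - count) / len(votes)

# One of four samples is misclassified.
rate = misclassification_rate(
    ["normal", "abnormal", "normal", "normal"],
    ["normal", "abnormal", "abnormal", "normal"],
)

# A = 10 attribute-level predictions, a1 = 7 normal, a2 = 3 abnormal.
label, minority_share = vote_summary(["normal"] * 7 + ["abnormal"] * 3)
```

With a1 = 7 and a2 = 3 the sample is classified as normal and the minority share is 3/10.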

306. Calculate the cross-entropy loss between the misclassification rate and the relative entropy loss, and judge whether the cross-entropy loss and the misclassification rate meet the preset loss conditions;

307. If not satisfied, adjust the feature selection parameters in the behavior recognition model according to the cross-entropy loss and the misclassification rate;

308. Update the behavior recognition model according to the adjusted feature selection parameters until the behavior recognition model converges;

In this embodiment, the relative entropy loss between the first distribution probability and the second distribution probability measures how far the prediction results of the two models diverge, that is, how far the predictions differ when the model is trained on the positive samples and when it is trained on the expanded samples derived from them. This divergence is used to iteratively update the model. Specifically, the relative entropy loss calculation formula of the first distribution probability and the second distribution probability is as follows:

Here, R(P∥Q) is the relative entropy loss, λ is the balance coefficient of positive sample expansion, p(x1) is each probability value in the first distribution probability, and q(x1) is each probability value in the second distribution probability.
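The formula is not reproduced in the text above. As a minimal sketch, assuming it is the standard Kullback-Leibler divergence scaled by λ, that is R(P∥Q) = λ · Σ p(x1) · log(p(x1) / q(x1)), the loss could be computed as follows. The function name, the `eps` guard against zero probabilities and the example distributions are illustrative assumptions.

```python
import math

def relative_entropy_loss(p, q, lam=1.0, eps=1e-12):
    """lam * sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute nothing."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of entries")
    return lam * sum(
        pi * math.log(pi / max(qi, eps))  # eps avoids division by zero (assumption)
        for pi, qi in zip(p, q)
        if pi > 0
    )

# First and second distribution probabilities over (normal, abnormal).
first_distribution = [0.9, 0.1]
second_distribution = [0.5, 0.5]

loss = relative_entropy_loss(first_distribution, second_distribution, lam=1.0)
same = relative_entropy_loss(first_distribution, first_distribution, lam=1.0)
```

Under this assumed form the loss is zero when the two distributions coincide and grows as the predictions on the first and second data sets drift apart.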

The accuracy on the first data set is measured by the misclassification rate, and the difference between the two models is measured by the cross-entropy loss. When both the accuracy and the difference between the two models meet the conditions, the behavior recognition model can be judged to have converged. Specifically, a misclassification rate threshold and a cross-entropy loss threshold can be set to determine whether the misclassification rate and the cross-entropy loss meet the loss conditions. The misclassification rate can be checked first: if it does not meet the loss condition, the accuracy on the first data set is insufficient, and the subsequent cross-entropy loss check is unnecessary.
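The two-stage check can be sketched as below. The threshold values and the function name are hypothetical, since the text does not state concrete values.

```python
def meets_loss_conditions(misclassification_rate, cross_entropy_loss,
                          max_rate=0.05, max_loss=0.10):
    """Check the misclassification rate first; only then check the cross-entropy loss."""
    if misclassification_rate > max_rate:
        # Accuracy on the first data set is insufficient; skip the loss check.
        return False
    return cross_entropy_loss <= max_loss

converged = meets_loss_conditions(0.03, 0.08)
rate_too_high = meets_loss_conditions(0.20, 0.01)
loss_too_high = meets_loss_conditions(0.03, 0.50)
```

Ordering the checks this way avoids evaluating the second condition when the first already rules out convergence.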

309. Obtain the auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior recognition model, and identify the behavior category corresponding to the auto insurance claim data to be identified.

In the embodiment of this application, multiple learners in the model learn decision trees from the data set to identify the category probabilities of auto insurance claim settlement behavior, which reduces the bias in results caused by sample imbalance and corrects the bias of the model output.

The method for identifying auto insurance claim settlement behavior in the embodiment of the present application has been described above. The device for identifying auto insurance claim settlement behavior is described below. Referring to FIG. 4, an embodiment of the auto insurance claim settlement behavior recognition device in the embodiment of the present application includes:

The expansion module 401 is used to obtain historical auto insurance claims data, and divide the historical auto insurance claims data into positive samples and negative samples; perform neighbor propagation processing on the positive samples to obtain a plurality of expanded samples, and perform down-sampling processing on the negative samples to obtain multiple sub-samples;

A combination module 402, configured to combine each of the sub-samples with the positive sample and the expanded sample to obtain a first data set and a second data set;

A training module 403, configured to respectively input the first data set and the second data set into a preset behavior recognition model for behavior type recognition, and obtain a first behavior recognition result corresponding to the first data set and a second behavior recognition result corresponding to the second data set;

An update module 404, configured to calculate the misclassification rate of the behavior recognition model for the first data set according to the first behavior recognition result and the behavior type corresponding to the first data set, and calculate a relative entropy loss between the first behavior recognition result and the second behavior recognition result; and to update the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges;

The identification module 405 is configured to acquire the auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior identification model, and identify the behavior category corresponding to the auto insurance claim data to be identified.

Please refer to Figure 5, another embodiment of the auto insurance claim settlement behavior recognition device in the embodiment of the present application includes:

Specifically, the expansion module 401 includes:

The distance calculation unit 4011 is used to sequentially calculate the Euclidean distance between every two positive samples, and determine the neighbor samples of each positive sample according to the Euclidean distance;

The interpolation processing unit 4012 is configured to randomly select a preset number of neighboring samples for linear interpolation processing, and construct expanded samples according to the processing results.
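The work of the two units can be sketched as follows: Euclidean distances determine each positive sample's neighbors, and new samples are placed at random points on the line segments joining a sample to its neighbors. This resembles the well-known SMOTE over-sampling technique; whether the application uses exactly this procedure is an assumption, as are the function names and parameter values.

```python
import math
import random

def nearest_neighbors(samples, index, n_neighbors):
    """Indices of the n_neighbors samples closest (Euclidean distance) to samples[index]."""
    others = [i for i in range(len(samples)) if i != index]
    others.sort(key=lambda i: math.dist(samples[index], samples[i]))
    return others[:n_neighbors]

def expand_positives(samples, n_neighbors, n_new_per_sample, rng):
    """Create new samples by linear interpolation between each sample and random neighbors."""
    expanded = []
    for i, base in enumerate(samples):
        neighbors = nearest_neighbors(samples, i, n_neighbors)
        for _ in range(n_new_per_sample):
            other = samples[rng.choice(neighbors)]
            t = rng.random()  # interpolation weight in [0, 1)
            expanded.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return expanded

rng = random.Random(7)
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

neighbors_of_first = nearest_neighbors(positives, 0, n_neighbors=2)
expanded = expand_positives(positives, n_neighbors=2, n_new_per_sample=2, rng=rng)
```

Because every new point lies between two existing positive samples, the expanded samples stay within the region already occupied by the positive class.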

Specifically, the training module 403 includes:

The input unit 4031 is used to input the data sets into the preset behavior recognition model, wherein the behavior recognition model includes an input layer and a decision layer, and the data sets include the first data set and the second data set;

The training unit 4032 is configured to perform random sampling processing on the data set through the input layer to obtain multiple feature subsets; input each of the feature subsets into a different learner in the decision layer; identify each of the feature subsets by the learners; and output the recognition result of each of the learners for the corresponding feature subset;

The output unit 4033 is configured to determine the recognition result of the behavior recognition model for the data set according to the recognition results output by each of the learners, wherein the recognition result of the behavior recognition model for the data set includes the first behavior recognition result and the second behavior recognition result.
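The text does not specify how the learners' outputs are merged into the model's recognition result. One common choice, shown here purely as an assumption, is to average the per-learner distribution probabilities and take the most probable behavior type. The function name and the example values are hypothetical.

```python
def aggregate(learner_outputs):
    """Average per-learner class probabilities into a single distribution."""
    totals = {}
    for output in learner_outputs:
        for label, prob in output.items():
            totals[label] = totals.get(label, 0.0) + prob
    return {label: total / len(learner_outputs) for label, total in totals.items()}

# Distribution probabilities output by three learners for one data set.
learner_outputs = [
    {"normal": 0.8, "abnormal": 0.2},
    {"normal": 0.6, "abnormal": 0.4},
    {"normal": 0.3, "abnormal": 0.7},
]

distribution = aggregate(learner_outputs)
predicted_type = max(distribution, key=distribution.get)
```

Averaging keeps the result a valid probability distribution, which is convenient when it is later compared with the distribution obtained on the other data set.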

Specifically, the update module 404 includes:

A statistics unit 4041, configured to perform statistics on the behavior types in the first behavior recognition result, to obtain a first distribution probability of behavior types in the first data set;

A ratio calculation unit 4042, configured to determine, according to the first distribution probability and the behavior type corresponding to the first data set, the number of samples in the first data set misclassified by the behavior recognition model; and to calculate the ratio of the number of misclassified samples to the total number of samples in the first data set, and use the ratio as the misclassification rate of the behavior recognition model for the first data set.

Specifically, the training unit is also used for:

Selecting a feature sample from the feature subset by the current learner to construct a sample node, and selecting m feature attributes from the selected feature samples according to preset feature selection parameters;

A feature attribute is randomly selected from the selected m feature attributes by the learner to construct a child node under the sample node;

Re-selecting m feature attributes from the selected feature samples by the learner, and constructing lower-level child nodes under the child nodes, stopping until the number of the child nodes is m, and obtaining a corresponding decision tree;

Rescreening an unselected feature sample from the feature subset by the next learner to construct a decision tree until the decision tree of each feature sample in the feature subset is obtained;

Using each of the decision trees to identify the behavior type of the corresponding feature sample in the feature subset, to obtain the identification result of the feature subset.

Specifically, the updating module 404 also includes:

A loss calculation unit 4043, configured to calculate a cross-entropy loss between the misclassification rate and the relative entropy loss, and determine whether the cross-entropy loss and the misclassification rate meet a preset loss condition;

The adjustment unit 4044 is used to adjust the feature selection parameters in the behavior recognition model according to the cross-entropy loss and the misclassification rate if not satisfied;

The determining unit 4045 is configured to update the behavior recognition model according to the adjusted feature selection parameters until the behavior recognition model converges.
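The interaction of the three units can be sketched as a loop that adjusts a feature selection parameter until the loss conditions hold. How the parameter is adjusted is not stated in the text, so this sketch simply tries candidate values in order. `evaluate` is a hypothetical callback standing in for retraining and scoring the model, and all numeric values are made up for the example.

```python
def tune_feature_selection(evaluate, candidate_ks, max_rate, max_loss):
    """Try candidate feature-selection parameters until both loss conditions are met."""
    for k in candidate_ks:
        rate, loss = evaluate(k)  # retrain with k and return (misclassification rate, loss)
        if rate <= max_rate and loss <= max_loss:
            return k, rate, loss
    return None  # no candidate satisfied the conditions

def fake_evaluate(k):
    # Stand-in for retraining and scoring the model with k features per split.
    return {2: (0.20, 0.30), 3: (0.08, 0.12), 4: (0.04, 0.05)}[k]

result = tune_feature_selection(fake_evaluate, [2, 3, 4], max_rate=0.05, max_loss=0.10)
```

A real implementation would replace `fake_evaluate` with a routine that rebuilds the decision trees and recomputes both quantities on the first and second data sets.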

In the embodiment of the present application, the small number of positive samples is expanded by sampling neighboring samples and applying linear interpolation to obtain expanded samples, and the large number of negative samples is down-sampled. This addresses the imbalance between positive and negative samples and increases the accuracy of model training. The output accuracy of the behavior recognition model is measured by the misclassification rate and the loss value, which ensures that the model fully considers the characteristics of the positive samples when the samples are imbalanced, so that the output result is more accurate. In addition, multiple learners in the model learn decision trees from the data set to identify the category probabilities of auto insurance claims, which reduces the bias in results caused by sample imbalance and corrects the bias of the model output.

Figures 4 and 5 above describe the auto insurance claim settlement behavior recognition device in the embodiment of the present application in detail from the perspective of modular functional entities. The following describes the auto insurance claim settlement behavior recognition device in the embodiment of the present application in detail from the perspective of hardware processing.

Fig. 6 is a schematic structural diagram of an auto insurance claim settlement behavior recognition device provided by an embodiment of the present application. The auto insurance claim settlement behavior recognition device 600 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 610 (for example, one or more processors), a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) that store application programs 633 or data 632. The memory 620 and the storage medium 630 may provide temporary or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the auto insurance claim settlement behavior recognition device 600. Furthermore, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the auto insurance claim settlement behavior recognition device 600, the series of instruction operations in the storage medium 630.

The auto insurance claim settlement behavior recognition device 600 can also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input and output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art can understand that the structure shown in Figure 6 does not limit the auto insurance claim settlement behavior recognition device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

The present application also provides an auto insurance claim settlement behavior recognition device. The device is a computer device that includes a memory and a processor; computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor executes the steps of the method for identifying auto insurance claim settlement behavior in the above embodiments.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium may also be a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and when the instructions are run on the computer, the computer is made to execute the steps of the method for identifying the auto insurance claim settlement behavior.

Those skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the above-described system, device and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, and are not intended to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the various embodiments of the application.

Claims

A method for identifying an auto insurance claim settlement behavior, wherein the auto insurance claim settlement behavior identification method includes:

Obtain historical auto insurance claim data, and divide the historical auto insurance claim data into positive samples and negative samples;

performing neighbor propagation processing on the positive samples to obtain multiple expanded samples, and performing downsampling processing on the negative samples to obtain multiple sub-samples;

combining each of the sub-samples with the positive sample and the expanded sample to obtain a first data set and a second data set;

Input the first data set and the second data set separately into the preset behavior recognition model for behavior type recognition, and obtain the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set;

According to the first behavior recognition result and the behavior type corresponding to the first data set, calculate the misclassification rate of the behavior recognition model for the first data set, and calculate the relative entropy loss between the first behavior recognition result and the second behavior recognition result;

Updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges;

Acquiring auto insurance claim data to be identified, inputting the auto insurance claim data to be identified into the behavior recognition model, and identifying the behavior category corresponding to the auto insurance claim data to be identified.
The auto insurance claim settlement behavior identification method according to claim 1, wherein said performing neighbor propagation processing on said positive sample to obtain a plurality of expanded samples includes:

Calculate the Euclidean distance between each two positive samples in turn, and determine the nearest neighbor sample of each positive sample according to the Euclidean distance;

Randomly screen a preset number of neighboring samples for linear interpolation processing, and construct expanded samples according to the processing results.
The auto insurance claim settlement behavior recognition method according to claim 1, wherein inputting the first data set and the second data set separately into the preset behavior recognition model for behavior type recognition, and obtaining the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set, includes:

Input the data sets into the preset behavior recognition model, wherein the behavior recognition model includes an input layer and a decision layer, and the data sets include the first data set and the second data set;

performing random sampling processing on the data set through the input layer to obtain multiple feature subsets;

Input each of the feature subsets into different learners in the decision-making layer, and identify each of the feature subsets by the learner, and output the recognition results of each of the learners for the corresponding feature subsets;

According to the recognition results output by each of the learners, the recognition result of the behavior recognition model for the data set is determined, wherein the recognition result of the behavior recognition model for the data set includes the first behavior recognition result and the second behavior recognition result.
The auto insurance claim settlement behavior recognition method according to claim 1, wherein calculating the misclassification rate of the behavior recognition model for the first data set according to the first behavior recognition result and the behavior type corresponding to the first data set includes:

performing statistics on the behavior types in the first behavior recognition result to obtain a first distribution probability of the behavior types in the first data set;

According to the first distribution probability and the behavior type corresponding to the first data set, determine the number of misclassified samples of the first data set by the behavior recognition model;

Calculate the ratio between the number of misclassified samples and the total number of samples in the first data set, and use the ratio as the misclassification rate of the behavior recognition model for the first data set.
The auto insurance claim settlement behavior identification method according to claim 3, wherein identifying each of the feature subsets by the learners, and outputting the recognition result of each of the learners for the corresponding feature subset, includes:

Selecting a feature sample from the feature subset by the current learner to construct a sample node, and selecting m feature attributes from the selected feature samples according to preset feature selection parameters;

A feature attribute is randomly selected from the selected m feature attributes by the learner to construct a child node under the sample node;

Re-selecting m feature attributes from the selected feature samples by the learner, and constructing lower-level child nodes under the child nodes, stopping until the number of the child nodes is m, and obtaining a corresponding decision tree;

Rescreening an unselected feature sample from the feature subset by the next learner to construct a decision tree until the decision tree of each feature sample in the feature subset is obtained;

Using each of the decision trees to identify the behavior type of the corresponding feature sample in the feature subset, to obtain the identification result of the feature subset.
The auto insurance claim settlement behavior recognition method according to any one of claims 1-5, wherein updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges includes:

calculating a cross-entropy loss between the misclassification rate and the relative entropy loss, and judging whether the cross-entropy loss and the misclassification rate meet a preset loss condition;

If not satisfied, then adjust the feature selection parameters in the behavior recognition model according to the cross-entropy loss and the misclassification rate;

According to the adjusted feature selection parameters, the behavior recognition model is updated until the behavior recognition model converges.

The identification module is used to obtain the auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior identification model, and identify the behavior category corresponding to the auto insurance claim data to be identified.
An auto insurance claim settlement behavior recognition device, wherein the auto insurance claim settlement behavior recognition device includes: a memory and at least one processor, instructions are stored in the memory;

The at least one processor invokes the instructions in the memory, so that the auto insurance claim settlement behavior recognition device executes the auto insurance claim settlement behavior recognition method as follows:

Obtain historical auto insurance claim data, and divide the historical auto insurance claim data into positive samples and negative samples;

performing neighbor propagation processing on the positive samples to obtain multiple expanded samples, and performing downsampling processing on the negative samples to obtain multiple sub-samples;

combining each of the sub-samples with the positive sample and the expanded sample to obtain a first data set and a second data set;

Input the first data set and the second data set separately into the preset behavior recognition model for behavior type recognition, and obtain the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set;

According to the first behavior recognition result and the behavior type corresponding to the first data set, calculate the misclassification rate of the behavior recognition model for the first data set, and calculate the relative entropy loss between the first behavior recognition result and the second behavior recognition result;

Updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges;

Acquiring auto insurance claim data to be identified, inputting the auto insurance claim data to be identified into the behavior recognition model, and identifying the behavior category corresponding to the auto insurance claim data to be identified.
The auto insurance claim settlement behavior identification device according to claim 7, wherein said performing neighbor propagation processing on said positive sample to obtain a plurality of expanded samples includes:

Calculate the Euclidean distance between each two positive samples in turn, and determine the nearest neighbor sample of each positive sample according to the Euclidean distance;

Randomly screen a preset number of neighboring samples for linear interpolation processing, and construct expanded samples according to the processing results.
The auto insurance claim settlement behavior recognition device according to claim 7, wherein inputting the first data set and the second data set separately into the preset behavior recognition model for behavior type recognition, and obtaining the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set, includes:

Input the data sets into the preset behavior recognition model, wherein the behavior recognition model includes an input layer and a decision layer, and the data sets include the first data set and the second data set;

performing random sampling processing on the data set through the input layer to obtain multiple feature subsets;

Input each of the feature subsets into different learners in the decision-making layer, and identify each of the feature subsets by the learner, and output the recognition results of each of the learners for the corresponding feature subsets;

According to the recognition results output by each of the learners, the recognition result of the behavior recognition model for the data set is determined, wherein the recognition result of the behavior recognition model for the data set includes the first behavior recognition result and the second behavior recognition result.
The auto insurance claim settlement behavior recognition device according to claim 7, wherein calculating the misclassification rate of the behavior recognition model for the first data set according to the first behavior recognition result and the behavior type corresponding to the first data set includes:

performing statistics on the behavior types in the first behavior recognition result to obtain a first distribution probability of the behavior types in the first data set;

According to the first distribution probability and the behavior type corresponding to the first data set, determine the number of misclassified samples of the first data set by the behavior recognition model;

Calculate the ratio between the number of misclassified samples and the total number of samples in the first data set, and use the ratio as the misclassification rate of the behavior recognition model for the first data set.
The auto insurance claim settlement behavior recognition device according to claim 10, wherein identifying each of the feature subsets by the learners, and outputting the recognition result of each of the learners for the corresponding feature subset, includes:

Selecting a feature sample from the feature subset by the current learner to construct a sample node, and selecting m feature attributes from the selected feature samples according to preset feature selection parameters;

A feature attribute is randomly selected from the selected m feature attributes by the learner to construct a child node under the sample node;

Re-selecting m feature attributes from the selected feature samples by the learner, and constructing lower-level child nodes under the child nodes, stopping until the number of the child nodes is m, and obtaining a corresponding decision tree;

Rescreening an unselected feature sample from the feature subset by the next learner to construct a decision tree until the decision tree of each feature sample in the feature subset is obtained;

Using each of the decision trees to identify the behavior type of the corresponding feature sample in the feature subset, to obtain the identification result of the feature subset.
The auto insurance claim settlement behavior recognition device according to any one of claims 7-11, wherein updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges includes:

calculating a cross-entropy loss between the misclassification rate and the relative entropy loss, and judging whether the cross-entropy loss and the misclassification rate meet a preset loss condition;

If not satisfied, then adjust the feature selection parameters in the behavior recognition model according to the cross-entropy loss and the misclassification rate;

According to the adjusted feature selection parameters, the behavior recognition model is updated until the behavior recognition model converges and stops.
A computer-readable storage medium, where instructions are stored on the computer-readable storage medium, wherein, when the instructions are executed by a processor, the following method for identifying behavior of auto insurance claim settlement is implemented:

Obtain historical auto insurance claim data, and divide the historical auto insurance claim data into positive samples and negative samples;

performing neighbor propagation processing on the positive samples to obtain multiple expanded samples, and performing downsampling processing on the negative samples to obtain multiple sub-samples;

combining each of the sub-samples with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set;

Input the first data set and the second data set into the preset behavior recognition model to identify the behavior type, and obtain the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set;

According to the first behavior recognition result and the behavior type corresponding to the first data set, calculate the misclassification rate of the behavior recognition model for the first data set, and calculate the relative entropy loss between the first behavior recognition result and the second behavior recognition result;

Updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges;

Acquiring auto insurance claim data to be identified, inputting the auto insurance claim data to be identified into the behavior recognition model, and identifying the behavior category corresponding to the auto insurance claim data to be identified.
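The downsampling and combining steps of this claim can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `build_data_sets`, the `(features, label)` sample layout, and the choice of disjoint, roughly equal-sized sub-samples are all assumptions made for the example.

```python
import random


def build_data_sets(positives, negatives, expanded, n_subsamples, seed=0):
    """Split the negatives into sub-samples and pair each one with the
    positive samples (first data sets) and the expanded samples (second
    data sets). All names here are hypothetical."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    # Assumption: sub-samples are disjoint slices of the shuffled negatives.
    sub_samples = [shuffled[i::n_subsamples] for i in range(n_subsamples)]
    first_sets = [sub + positives for sub in sub_samples]
    second_sets = [sub + expanded for sub in sub_samples]
    return first_sets, second_sets


positives = [("p1", 1), ("p2", 1)]
expanded = [("e1", 1)]
negatives = [("n%d" % i, 0) for i in range(6)]
first, second = build_data_sets(positives, negatives, expanded, n_subsamples=3)
```

Each first data set shares its negative sub-sample with the matching second data set, so the two recognition results differ only through the positive side (original versus expanded samples).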
The computer-readable storage medium according to claim 13, wherein said performing neighbor propagation processing on said positive samples to obtain a plurality of expanded samples comprises:

Calculate the Euclidean distance between each two positive samples in turn, and determine the nearest neighbor sample of each positive sample according to the Euclidean distance;

Randomly screen a preset number of neighboring samples for linear interpolation processing, and construct expanded samples according to the processing results.
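The two steps above resemble SMOTE-style oversampling, which can be sketched as follows. The function name `expand_positive_samples` and the parameters `k` (neighbors considered) and `n_new` (samples generated) are hypothetical, and feature vectors are assumed to be numeric tuples; the claim itself does not fix these details.

```python
import math
import random


def expand_positive_samples(positives, k=2, n_new=4, seed=0):
    """Create n_new synthetic positives by interpolating between a randomly
    chosen positive sample and one of its k nearest neighbors."""
    rng = random.Random(seed)
    expanded = []
    for _ in range(n_new):
        i = rng.randrange(len(positives))
        base = positives[i]
        others = positives[:i] + positives[i + 1:]
        # Rank the other positives by Euclidean distance to the base sample.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        expanded.append(
            tuple(b + t * (n - b) for b, n in zip(base, neighbour))
        )
    return expanded


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
new_points = expand_positive_samples(points, k=1, n_new=5)
```

Every synthetic point lies on the line segment between two existing positive samples, so it stays inside the region the positives already occupy.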
The computer-readable storage medium according to claim 13, wherein inputting the first data set and the second data set respectively into a preset behavior recognition model to identify behavior types, and obtaining the first behavior recognition result corresponding to the first data set and the second behavior recognition result corresponding to the second data set, includes:

Input the data sets into the preset behavior recognition model, wherein the behavior recognition model includes an input layer and a decision layer, and the data sets include the first data set and the second data set;

performing random sampling processing on the data set through the input layer to obtain multiple feature subsets;

Input each of the feature subsets into different learners in the decision-making layer, and identify each of the feature subsets by the learner, and output the recognition results of each of the learners for the corresponding feature subsets;

According to the recognition results output by each of the learners, the recognition result of the behavior recognition model for the data set is determined, wherein the recognition result of the behavior recognition model for the data set includes the first behavior recognition result and the second behavior recognition result.
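One way to read the input layer and decision layer is as a bagging-style ensemble, sketched below. This is an assumption-laden illustration: the function name `recognise`, the use of plain callables as already-trained learners, and majority voting as the rule for combining learner outputs are not stated in the claim.

```python
import random
from collections import Counter


def recognise(data_set, learners, subset_size, seed=0):
    """Give each learner a random subset of the data set and combine the
    per-sample labels by majority vote. Returns {sample index: label}."""
    rng = random.Random(seed)
    votes = {}
    for learner in learners:
        # Input layer: random sampling (without replacement) of one subset.
        indices = rng.sample(range(len(data_set)), subset_size)
        # Decision layer: the learner labels each sample in its subset.
        for i in indices:
            votes.setdefault(i, Counter())[learner(data_set[i])] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


def stump(sample):
    # Hypothetical learner: a one-feature threshold rule.
    return 1 if sample[0] > 0.5 else 0


data = [(0.1,), (0.9,), (0.2,), (0.8,)]
result = recognise(data, [stump, stump, stump], subset_size=4)
```

Samples that no learner happens to draw receive no label in this sketch; a fuller version would need a rule for them.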
The computer-readable storage medium according to claim 13, wherein calculating the misclassification rate of the behavior recognition model for the first data set, according to the first behavior recognition result and the behavior type corresponding to the first data set, includes:

performing statistics on the behavior types in the first behavior recognition result to obtain a first distribution probability of the behavior types in the first data set;

According to the first distribution probability and the behavior type corresponding to the first data set, determine the number of samples in the first data set that are misclassified by the behavior recognition model;

Calculate the ratio between the number of misclassified samples and the total number of samples in the first data set, and use the ratio as the misclassification rate of the behavior recognition model for the first data set.
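The ratio in the last step can be written out directly. The sketch below assumes that predicted and true behavior types are available as two equal-length lists of labels; the function name `misclassification_rate` is hypothetical.

```python
def misclassification_rate(predicted, actual):
    """Return the share of samples whose predicted behavior type differs
    from the true behavior type."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the data set must not be empty")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


rate = misclassification_rate([1, 0, 1, 1], [1, 0, 0, 0])
```

Here two of the four predictions disagree with the true labels, so the rate is 0.5.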
The computer-readable storage medium according to claim 16, wherein the identifying each of the feature subsets by the learner, and outputting the recognition result of each of the learners for the corresponding feature subset includes:

Selecting a feature sample from the feature subset by the current learner to construct a sample node, and selecting m feature attributes from the selected feature samples according to preset feature selection parameters;

A feature attribute is randomly selected from the selected m feature attributes by the learner to construct a child node under the sample node;

Re-selecting, by the learner, m feature attributes from the selected feature samples and constructing lower-level child nodes under the child nodes, stopping when the number of child nodes reaches m, to obtain a corresponding decision tree;

Selecting, by the next learner, a previously unselected feature sample from the feature subset to construct a decision tree, until a decision tree has been obtained for each feature sample in the feature subset;

Using each of the decision trees to identify the behavior type of the corresponding feature sample in the feature subset, to obtain the identification result of the feature subset.
The computer-readable storage medium according to any one of claims 13-17, wherein updating the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges includes:

calculating a cross-entropy loss between the misclassification rate and the relative entropy loss, and judging whether the cross-entropy loss and the misclassification rate meet a preset loss condition;

If not satisfied, then adjust the feature selection parameters in the behavior recognition model according to the cross-entropy loss and the misclassification rate;

According to the adjusted feature selection parameters, the behavior recognition model is updated until the behavior recognition model converges.
An auto insurance claim settlement behavior recognition device, wherein the auto insurance claim settlement behavior recognition device includes:

The expansion module is used to obtain historical auto insurance claim data, and divide the historical auto insurance claim data into positive samples and negative samples; perform neighbor propagation processing on the positive samples to obtain a plurality of expanded samples, and perform downsampling processing on the negative samples to obtain multiple sub-samples;

A combination module, configured to combine each of the sub-samples with the positive samples and with the expanded samples, respectively, to obtain a first data set and a second data set;

A training module, configured to input the first data set and the second data set into a preset behavior recognition model to identify behavior types, and obtain a first behavior recognition result corresponding to the first data set and a second behavior recognition result corresponding to the second data set;

An update module, configured to calculate the misclassification rate of the behavior recognition model for the first data set according to the first behavior recognition result and the behavior type corresponding to the first data set, to calculate a relative entropy loss between the first behavior recognition result and the second behavior recognition result, and to update the behavior recognition model according to the misclassification rate and the relative entropy loss until the behavior recognition model converges;

The identification module is used to obtain the auto insurance claim data to be identified, input the auto insurance claim data to be identified into the behavior identification model, and identify the behavior category corresponding to the auto insurance claim data to be identified.
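The relative entropy loss handled by the update module is commonly computed as the Kullback-Leibler divergence between two label distributions. The sketch below assumes each recognition result is a list of predicted labels; the helper names and the small smoothing constant `eps`, added to avoid division by zero, are assumptions rather than details from the claims.

```python
import math
from collections import Counter


def label_distribution(labels, classes, eps=1e-9):
    """Smoothed relative frequency of each class among the labels."""
    counts = Counter(labels)
    total = len(labels) + eps * len(classes)
    return [(counts[c] + eps) / total for c in classes]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions
    given over the same ordered classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


classes = [0, 1]
p = label_distribution([1, 1, 0, 0], classes)  # first recognition result
q = label_distribution([1, 1, 1, 0], classes)  # second recognition result
kl = relative_entropy(p, q)
```

The divergence is zero when the two recognition results have identical label distributions and grows as they diverge, which is what makes it usable as a consistency penalty between the first and second data sets.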
The auto insurance claim settlement behavior identification device according to claim 19, wherein the expansion module includes:

The distance calculation unit is used to calculate the Euclidean distance between each two positive samples in turn, and determine the nearest neighbor sample of each positive sample according to the Euclidean distance;

The interpolation processing unit is used for randomly screening a preset number of neighboring samples for linear interpolation processing, and constructing expanded samples according to the processing results.