CN112330476A

CN112330476A - Method and device for predicting group insurance business

Info

Publication number: CN112330476A
Application number: CN202011365036.1A
Authority: CN
Inventors: 王帅; 侯成文; 梁曦; 林鹏程
Original assignee: China Life Insurance Co Ltd China
Current assignee: China Life Insurance Co Ltd China
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2021-02-05

Abstract

The method and device for predicting group insurance business provided by one or more embodiments of the specification first extract current feature values from the current group insurance business and historical feature values from the historical group insurance business. Both sets are encoded with TF-IDF and then reduced in dimension with PCA. A Bayesian algorithm then calculates the probability that the loss ratio of the current business, described by its dimension-reduced feature values, falls into each subinterval. The subinterval with the highest probability is output as the predicted loss ratio of the current group insurance business, so that the loss ratios of different clients in new group insurance business can be predicted accurately.

Description

Method and device for predicting group insurance business

Technical Field

One or more embodiments of the present disclosure relate to the technical field of insurance business prediction, and in particular, to a method and an apparatus for predicting group insurance business.

Background

At present, the insurance industry in China mainly uses ARMA-based prediction models to predict the loss ratio.

Group insurance business: business in which a single insurance contract, taken out by a group unit as policyholder, insures the members of a specific group within that unit (members may include their spouses, children and parents); unless the policy terms specify otherwise, the specific group must have more than 5 persons.

However, the inventors have found that ARMA-based prediction models cannot accurately predict the loss ratios of different customers in new group insurance business. The objects to which an ARMA-based model applies must satisfy a certain linear relationship, whereas in existing group insurance business the policies of new and existing clients differ, and the relationship between them is nonlinear.

Disclosure of Invention

In view of the above, one or more embodiments of the present disclosure provide a method and an apparatus for predicting group insurance business, so as to solve the above technical problems in the prior art.

Accordingly, one or more embodiments of the present specification provide a method for predicting group insurance business, including:

acquiring current group insurance business and historical group insurance business;

extracting current feature values from the current group insurance business and historical feature values from the historical group insurance business;

encoding the extracted current feature values and historical feature values with TF-IDF, respectively;

reducing the dimensionality of the encoded current feature values and the encoded historical feature values with PCA, respectively, to obtain current dimension-reduced feature values and historical dimension-reduced feature values;

determining the interval distribution of the loss ratio from the loss ratios of the historical group insurance business, and calculating the loss ratios of the historical dimension-reduced feature values;

for each subinterval in the interval distribution, inputting the loss ratios of the historical dimension-reduced feature values into a Bayesian algorithm and calculating the probability that the loss ratio of the current group insurance business falls into that subinterval; and

selecting the subinterval with the highest probability as the predicted loss ratio of the current group insurance business, and outputting the prediction result.

As an optional implementation, the subinterval includes:

profit, small profit, loss and severe loss.

As an alternative embodiment, if the prediction result is loss or severe loss, the method further includes:

outputting a modification suggestion.

As an alternative embodiment, the outputting of the modification suggestion includes:

constructing N sub-classifiers, each of which outputs an initial suggestion, where N is an integer greater than 2; and

outputting, according to the voting principle, the initial suggestion that receives the most votes among all initial suggestions as the modification suggestion.

As an alternative embodiment, constructing the sub-classifiers includes:

randomly selecting α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, each containing β current dimension-reduced feature values, the α schemes forming a sub data set; and

finding the optimal split of the sub data set based on the Gini index.

Corresponding to the method for predicting group insurance business, an embodiment of the invention further provides an apparatus for predicting group insurance business, comprising:

an acquisition module, configured to acquire the current group insurance business and the historical group insurance business;

an extraction module, configured to extract current feature values from the current group insurance business and historical feature values from the historical group insurance business;

an encoding module, configured to encode the extracted current feature values and historical feature values with TF-IDF, respectively;

a dimensionality reduction module, configured to reduce the dimensionality of the encoded current feature values and the encoded historical feature values with PCA (principal component analysis), respectively, obtaining current dimension-reduced feature values and historical dimension-reduced feature values;

a first calculation module, configured to determine the interval distribution of the loss ratio from the loss ratios of the historical group insurance business and to calculate the loss ratios of the historical dimension-reduced feature values;

a second calculation module, configured to input, for each subinterval in the interval distribution, the loss ratios of the historical dimension-reduced feature values into a Bayesian algorithm and to calculate the probability that the loss ratio of the current group insurance business falls into that subinterval; and

an output module, configured to select the subinterval with the highest probability as the predicted loss ratio of the current group insurance business and to output the prediction result.

As an optional implementation, the subinterval includes:

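profit, small profit, loss and severe loss.

As an optional implementation, if the prediction result is loss or severe loss, the apparatus further includes a suggestion module configured to output a modification suggestion.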

As an optional implementation, the suggestion module includes:

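a construction unit, configured to construct N sub-classifiers, each of which outputs an initial suggestion, where N is an integer greater than 2; and

a computing unit, configured to output, according to the voting principle, the initial suggestion that receives the most votes among all initial suggestions as the modification suggestion.

As an optional implementation, the construction unit is configured to randomly select α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, each containing β current dimension-reduced feature values, the α schemes forming a sub data set, and to find the optimal split of the sub data set based on the Gini index.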

As can be seen from the above, in the method and apparatus for predicting group insurance business provided in one or more embodiments of the present specification, current feature values are first extracted from the current group insurance business and historical feature values from the historical group insurance business; both are encoded with TF-IDF and then reduced in dimension with PCA, and a Bayesian algorithm calculates the probability that the loss ratio of the current business, described by its dimension-reduced feature values, falls into each subinterval. The subinterval with the highest probability is output as the predicted loss ratio of the current group insurance business, so that the loss ratios of different clients in new group insurance business can be predicted accurately.

Drawings

In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.

FIG. 1 is a logic diagram of a method according to one embodiment of the present description;

FIG. 2 is a logic diagram of a method according to another embodiment of the present disclosure;

FIG. 3 is a logic diagram of outputting a modification suggestion in accordance with one or more embodiments of the present description;

FIG. 4 is a logic diagram of an apparatus according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure is further described in detail below with reference to specific embodiments.

To achieve the above object, an embodiment of the present invention provides a method for predicting group insurance business.

In the embodiment of the invention, current feature values are first extracted from the current group insurance business and historical feature values from the historical group insurance business; both are encoded with TF-IDF and then reduced in dimension with PCA, and a Bayesian algorithm calculates the probability that the loss ratio of the current business, described by its dimension-reduced feature values, falls into each subinterval. The subinterval with the highest probability is output as the predicted loss ratio of the current group insurance business, so that the loss ratios of different clients in new group insurance business can be predicted accurately.

FIG. 1 shows an embodiment of the method for predicting group insurance business, comprising:

S100, obtaining the current group insurance business and the historical group insurance business.

S200, extracting the current feature values from the current group insurance business and the historical feature values from the historical group insurance business.

The current feature values include policyholder information and scheme information. The policyholder information includes application time, nature of the employer, industry type, occupation type, number of members, number of in-service employees, number of insured persons, age distribution of the insured persons, and the like; the scheme information includes scheme type, contract form, business category, insurance type, property group, sum insured, premium, discount rate, commission rate, sales region, whether a retroactive effective date is specified, and the like.

The historical feature values include policyholder information, scheme information and historical claim information. The policyholder information includes application time, nature of the employer, industry type, occupation type, number of members, number of in-service employees, number of insured persons, age distribution of the insured persons, and the like; the scheme information includes scheme type, contract form, business category, insurance type, property group, premium, discount rate, commission rate, sales region, whether a retroactive effective date is specified, and the like; the historical claim information includes main coverage, effective date, expiry date, policy start, policy end, policy loss ratio, and the like.

S300, encoding the extracted current feature values and historical feature values with TF-IDF, respectively.

Here, the TF-IDF value of a feature is the product of its term frequency tf and its inverse document frequency idf:

$$\text{TF-IDF} = tf \times idf$$

The computed TF-IDF values are used as the encoded feature values (current and historical), and the result is denoted Z, a data set of n schemes with m features each:

$$Z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}, \qquad z_i = (z_{i1}, z_{i2}, \ldots, z_{im})$$

where each $z_i$ is a row vector representing the features of one scheme after TF-IDF encoding.
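The specification does not say how a structured scheme is turned into "terms" for TF-IDF, nor which tf and idf variants are used. The minimal Python sketch below assumes each scheme is a list of hypothetical `field=value` tokens, takes tf as a token's count divided by the scheme's token count and idf as log(n / df), and builds the n × m matrix Z row by row. Smoothed idf variants are equally common; `tfidf_encode` is an illustrative name, not the applicant's code.

```python
import math
from collections import Counter


def tfidf_encode(schemes):
    """Encode each scheme (a list of tokens) as a TF-IDF row vector.

    Returns (vocab, Z), where Z[i][j] is the TF-IDF weight of vocab[j] in
    scheme i, so Z is an n x m matrix (n schemes, m distinct tokens).
    """
    vocab = sorted({tok for s in schemes for tok in s})
    n = len(schemes)
    # Document frequency: in how many schemes each token appears.
    df = Counter(tok for s in schemes for tok in set(s))
    Z = []
    for s in schemes:
        counts = Counter(s)
        total = len(s)
        row = []
        for tok in vocab:
            tf = counts[tok] / total if total else 0.0
            idf = math.log(n / df[tok])  # unsmoothed idf (an assumption)
            row.append(tf * idf)
        Z.append(row)
    return vocab, Z


# Hypothetical field=value tokens for three schemes.
schemes = [
    ["industry=IT", "product=medical", "region=north"],
    ["industry=IT", "product=accident", "region=north"],
    ["industry=retail", "product=medical", "region=south"],
]
vocab, Z = tfidf_encode(schemes)
```

A token that appears in every scheme gets idf = 0 and therefore carries no weight, which is the intended effect of TF-IDF: values shared by all schemes do not help distinguish them.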

S400, reducing the dimensionality of the encoded current feature values and the encoded historical feature values with PCA, respectively, to obtain the current dimension-reduced feature values and the historical dimension-reduced feature values.

Center the data set Z (subtract the mean of each feature) to obtain the matrix X:

$$X = Z - \mathbf{1}\bar{z}, \qquad \bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$$

where $\mathbf{1}$ is the n × 1 all-ones column vector and $z_i$ is the i-th row of Z. Then compute the m × m covariance matrix C:

$$C = \frac{1}{n} X^{\mathsf{T}} X$$

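Compute the eigenvalues $\lambda_i$ of the covariance matrix C and the eigenvector $q_i$ corresponding to each eigenvalue ($i = 1, \ldots, m$), where each $q_i$ is a column vector.

Sort the eigenvalues in descending order, relabelling them so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, and select the eigenvectors corresponding to the k largest eigenvalues to form the matrix $Q = (q_1, q_2, \ldots, q_k)$.

Normalize these eigenvectors to unit length, giving the matrix D:

$$D = \left(\frac{q_1}{\lVert q_1 \rVert}, \frac{q_2}{\lVert q_2 \rVert}, \ldots, \frac{q_k}{\lVert q_k \rVert}\right)$$

and compute the dimension-reduced data set, denoted Y:

$$Y = XD$$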

The data set after PCA dimension reduction is a data set of n schemes with k features each:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad y_i = (f_1, f_2, \ldots, f_k)$$

where each $y_i$ is a row vector representing the features of one scheme after PCA encoding, and $f_j$ denotes one of its attribute values.

The main purpose of PCA here is to reduce the dimensionality of the data and to make the feature values as independent of one another as possible (the principal components are mutually uncorrelated), so as to meet the feature-independence assumption of the Bayesian algorithm and give it higher accuracy.

S500, determining the interval distribution of the loss ratio according to the claims experience of the historical group insurance business, and calculating the loss ratios of the historical dimension-reduced feature values.

The interval distribution of the loss ratio is determined from the claims of the historical group insurance business. For example, the subintervals of the distribution may be $[0, \omega_1)$, $[\omega_1, \omega_2)$, $[\omega_2, \omega_3)$ and $[\omega_3, +\infty)$, where $\omega_1 < \omega_2 < \omega_3$ are loss-ratio thresholds.
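The PCA steps of S400 can be sketched with NumPy as follows. This is an illustrative implementation, not the applicant's code: the function name `pca_reduce` is hypothetical, and the 1/n scaling of the covariance matrix is an assumption (any positive scaling leaves the eigenvectors unchanged).

```python
import numpy as np


def pca_reduce(Z, k):
    """Reduce an n x m data set Z to n x k following the S400 steps.

    Returns (Y, lam): the reduced data set and its k eigenvalues, largest first.
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    X = Z - Z.mean(axis=0)                 # center: subtract each column mean
    C = (X.T @ X) / n                      # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # C is symmetric; eigh sorts ascending
    order = np.argsort(eigvals)[::-1]      # indices of eigenvalues, large to small
    Q = eigvecs[:, order[:k]]              # eigenvectors of the k largest eigenvalues
    D = Q / np.linalg.norm(Q, axis=0)      # unit-length columns
    Y = X @ D                              # n x k dimension-reduced data set
    return Y, eigvals[order[:k]]
```

Because `numpy.linalg.eigh` already returns unit-length eigenvectors, the normalization line only mirrors the "unitized feature vectors" step. The columns of Y are uncorrelated, with variances equal to the retained eigenvalues, which is the property the next paragraph relies on.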

The subintervals of the distribution may be defined as "profit", "small profit", "loss", "severe loss", and so on; each subinterval is denoted R and serves as a possible prediction result.

The loss ratio $P(f_k)$ of each historical dimension-reduced feature value is calculated from the claims experience of the historical group insurance business. For a discrete historical dimension-reduced feature value, the corresponding loss ratio $P(f_k)$ can be obtained by counting. For a continuous historical dimension-reduced feature value, $P(f_k)$ can be calculated from a probability density function, or the feature can be discretized, i.e. divided into several bins, and $P(f_k)$ then obtained for each bin by counting.
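A sketch of S500 in Python, under stated assumptions: the thresholds in `OMEGA` are invented example values (the specification gives none), continuous features are discretized by equal-width binning, and the per-bin loss ratio is taken as total claims divided by total premium of the historical policies in that bin. All function names are hypothetical.

```python
import bisect

# Hypothetical thresholds omega_1 < omega_2 < omega_3 (not given in the text).
OMEGA = [0.6, 0.8, 1.0]
LABELS = ["profit", "small profit", "loss", "severe loss"]


def loss_ratio_label(claims, premium, omega=OMEGA):
    """Map a policy's loss ratio (claims / premium) to its subinterval R.

    Subintervals are [0, w1), [w1, w2), [w2, w3), [w3, +inf).
    """
    ratio = claims / premium
    return LABELS[bisect.bisect_right(omega, ratio)]


def discretize(values, n_bins):
    """Equal-width binning of a continuous feature; returns one bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def bin_loss_ratio(bins, claims, premiums):
    """Aggregate loss ratio (total claims / total premium) for each bin."""
    totals = {}
    for b, c, p in zip(bins, claims, premiums):
        tc, tp = totals.get(b, (0.0, 0.0))
        totals[b] = (tc + c, tp + p)
    return {b: tc / tp for b, (tc, tp) in totals.items()}
```

Aggregating claims and premiums before dividing weights each policy by its premium, so a few small policies with extreme ratios do not dominate a bin; averaging per-policy ratios would be a different, also defensible, choice.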

S600, for each subinterval in the interval distribution, inputting the loss ratios of the historical dimension-reduced feature values into a Bayesian algorithm, and calculating the probability that the loss ratio of the current group insurance business falls into that subinterval.

Here, R denotes a subinterval, i.e. profit, small profit, loss, severe loss, and so on;

$P(R)$ denotes the probability of the corresponding subinterval, obtained by statistics over historical group insurance policies; $f_j$ denotes one feature of a group policy after PCA dimension reduction, i.e. one component of $(f_1, f_2, \ldots, f_k)$;

$P(R \mid f_1, f_2, \ldots, f_k)$ denotes the probability that the loss ratio of a current group insurance business whose current dimension-reduced feature values are $f_1, f_2, \ldots, f_k$ falls into the interval R.
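The Bayes formula itself does not survive in this text. Assuming the usual naive Bayes form, in which the features are conditionally independent given R (the property the PCA step is meant to approximate), it would read:

$$P(R \mid f_1, f_2, \ldots, f_k) = \frac{P(R)\prod_{j=1}^{k} P(f_j \mid R)}{P(f_1, f_2, \ldots, f_k)}$$

The denominator is the same for every subinterval R, so it affects only the normalization of the probabilities, not which subinterval is largest.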

S700, selecting the subinterval with the highest probability as the predicted loss ratio of the current group insurance business, and outputting the prediction result.

For example, if $P(\text{profit} \mid f_1, \ldots, f_k)$ is 20%, $P(\text{small profit} \mid f_1, \ldots, f_k)$ is 40%, $P(\text{loss} \mid f_1, \ldots, f_k)$ is 30% and $P(\text{severe loss} \mid f_1, \ldots, f_k)$ is 10%, then the loss ratio of the current group insurance business is predicted to fall in the "small profit" subinterval.
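Steps S600 and S700 together amount to a naive Bayes classifier followed by an arg-max. The sketch below assumes the dimension-reduced features have already been discretized (for example with equal-width bins) and adds Laplace smoothing, which the specification does not mention, so that an unseen feature value does not zero out a whole subinterval. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(X, y):
    """X: discretized feature tuples (f_1..f_k); y: subinterval labels R."""
    priors = Counter(y)
    cond = defaultdict(Counter)    # (j, R) -> counts of the values of feature j
    values = defaultdict(set)      # j -> observed values of feature j
    for row, r in zip(X, y):
        for j, v in enumerate(row):
            cond[(j, r)][v] += 1
            values[j].add(v)
    return priors, cond, values, len(y)


def posterior(model, row, alpha=1.0):
    """Normalized P(R | f_1..f_k), with Laplace smoothing alpha (an assumption)."""
    priors, cond, values, n = model
    scores = {}
    for r, count_r in priors.items():
        p = count_r / n            # prior P(R) from historical policies
        for j, v in enumerate(row):
            p *= (cond[(j, r)][v] + alpha) / (count_r + alpha * len(values[j]))
        scores[r] = p
    z = sum(scores.values())       # plays the role of P(f_1..f_k)
    return {r: s / z for r, s in scores.items()}


def predict(model, row):
    """S700: return the subinterval with the highest posterior, plus all posteriors."""
    post = posterior(model, row)
    return max(post, key=post.get), post
```

Dividing by the sum of the unnormalized scores makes the returned probabilities add up to 1 over the subintervals, which is how figures such as those in the example above are read.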

As an alternative embodiment, if the prediction result is loss or severe loss, as shown in FIG. 2, the method further includes:

S800, outputting a modification suggestion.

Optionally, as shown in fig. 3, the outputting the modification suggestion includes:

S801, constructing N sub-classifiers, each of which outputs an initial suggestion, where N is an integer greater than 2;

optionally, constructing the sub-classifiers includes:

randomly selecting α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, each containing β current dimension-reduced feature values, the α schemes forming a sub data set Y';

finding, based on the Gini index, the optimal split of the sub data set Y':
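The formula is missing at this point. The definitions that follow ($p_i$ as the probability of correct classification, $1 - p_i$ as the probability of misclassification) match the standard Gini index, which, summing over the categories in θ, would be:

$$\mathrm{Gini}(\theta) = \sum_{i} p_i (1 - p_i) = 1 - \sum_{i} p_i^{2}$$

The second form uses $\sum_i p_i = 1$.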

where θ denotes a non-empty set of categories, i.e. profit, small profit, loss, severe loss; $p_i$ denotes the probability that a current group insurance business is correctly classified based on the feature f, and $(1 - p_i)$ denotes the probability that the scheme is misclassified.

The criterion for selecting the split attribute is the Gini index of feature $f_j$ on the sample set Y': the sum, over the subsets produced by classifying on $f_j$, of each subset's Gini index multiplied by its weight (its share of the samples):
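The corresponding formula is also missing. Assuming the usual CART form, if classifying Y' on feature $f_j$ splits it into subsets $Y'_1, \ldots, Y'_V$ (V = 2 for a binary split), the criterion would be:

$$\mathrm{Gini}(Y', f_j) = \sum_{v=1}^{V} \frac{|Y'_v|}{|Y'|}\,\mathrm{Gini}(Y'_v)$$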

The feature $f_j$ with the smallest Gini index is selected as the classification node of the decision tree being constructed, thereby obtaining the constructed sub-classifier.

S802, outputting, according to the voting principle, the initial suggestion that receives the most votes among all initial suggestions as the modification suggestion.
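The sub-classifier construction can be sketched as below. Several points are assumptions rather than statements from the specification: sampling α schemes and β of the k features without replacement (in the manner of a random forest), the class labels attached to each scheme, and stopping after a single split (a decision stump) to keep the example short; a full tree would apply `best_split` recursively to each side. All function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index: sum_i p_i (1 - p_i) over the classes present."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (feature, threshold, weighted_gini) with the smallest weighted Gini."""
    best = None
    n = len(rows)
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            g = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or g < best[2]:
                best = (j, t, g)
    return best


def build_sub_classifier(Y, labels, alpha, beta, rng=random):
    """Sample alpha schemes and beta features, then fit a one-level tree (stump)."""
    idx = rng.sample(range(len(Y)), alpha)
    feats = rng.sample(range(len(Y[0])), beta)
    rows = [Y[i] for i in idx]
    labs = [labels[i] for i in idx]
    split = best_split(rows, labs, feats)
    if split is None:  # no valid split: always predict the majority class
        major = Counter(labs).most_common(1)[0][0]
        return lambda row: major
    j, t, _ = split
    left_major = Counter(l for r, l in zip(rows, labs) if r[j] <= t).most_common(1)[0][0]
    right_major = Counter(l for r, l in zip(rows, labs) if r[j] > t).most_common(1)[0][0]
    return lambda row: left_major if row[j] <= t else right_major
```

Random row and feature sampling is what makes the N sub-classifiers differ from one another, so that combining their outputs by vote is more stable than relying on any single tree.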
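The voting step S802 reduces to a majority vote over the N outputs. A minimal sketch with hypothetical suggestion strings; ties go to the suggestion seen first, which is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter


def vote(suggestions):
    """Return the initial suggestion with the most votes (ties: first encountered)."""
    if not suggestions:
        raise ValueError("need at least one suggestion")
    return Counter(suggestions).most_common(1)[0][0]


# Hypothetical initial suggestions from N = 5 sub-classifiers.
initial = ["raise premium", "lower commission", "raise premium",
           "narrow coverage", "raise premium"]
modification = vote(initial)
```

Requiring N greater than 2, as the text does, avoids the degenerate case where two sub-classifiers disagree and the vote carries no majority information.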

The embodiment of the invention provides a method for predicting the loss ratio of group insurance business. It uses a Bayesian algorithm to obtain the probability of each loss-ratio subinterval for the current group insurance business and recommends modification suggestions to operators based on the predicted loss ratio. This makes it easier for operators to adjust the current group insurance scheme and reduces the economic losses that insurance companies incur from unreasonable quotations.

It is to be appreciated that the method can be performed by any apparatus, device, platform, or cluster of devices having computing and processing capabilities.

It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Corresponding to the group insurance business prediction method, an embodiment of the present invention further provides a group insurance business prediction apparatus, as shown in fig. 4, including:

an obtaining module 10, configured to obtain the current group insurance business and the historical group insurance business;

an extraction module 20, configured to extract current feature values from the current group insurance business and historical feature values from the historical group insurance business;

an encoding module 30, configured to encode the extracted current feature values and historical feature values with TF-IDF, respectively;

a dimensionality reduction module 40, configured to reduce the dimensionality of the encoded current feature values and the encoded historical feature values with PCA, respectively, obtaining the current dimension-reduced feature values and the historical dimension-reduced feature values;

a first calculation module 50, configured to determine the interval distribution of the loss ratio according to the claims experience of the historical group insurance business and to calculate the loss ratios of the historical dimension-reduced feature values;

a second calculation module 60, configured to input, for each subinterval in the interval distribution, the loss ratios of the historical dimension-reduced feature values into a Bayesian algorithm and to calculate the probability that the loss ratio of the current group insurance business falls into that subinterval; and

an output module 70, configured to select the subinterval with the highest probability as the predicted loss ratio of the current group insurance business and to output the prediction result.

As an optional implementation, the subinterval includes:

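profit, small profit, loss and severe loss.

If the prediction result is loss or severe loss, the apparatus further includes a suggestion module configured to output a modification suggestion.

Optionally, the suggestion module includes a construction unit, configured to construct N sub-classifiers, each of which outputs an initial suggestion, where N is an integer greater than 2, and a computing unit, configured to output, according to the voting principle, the initial suggestion that receives the most votes among all initial suggestions as the modification suggestion.

Optionally, the construction unit is configured to randomly select α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, each containing β current dimension-reduced feature values, the α schemes forming a sub data set, and to find the optimal split of the sub data set based on the Gini index.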

It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.

It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

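1. A method for predicting group insurance business, comprising:

acquiring current group insurance business and historical group insurance business;

extracting current feature values from the current group insurance business and historical feature values from the historical group insurance business;

encoding the extracted current feature values and historical feature values with TF-IDF, respectively;

reducing the dimensionality of the encoded current feature values and the encoded historical feature values with PCA, respectively, to obtain current dimension-reduced feature values and historical dimension-reduced feature values;

determining the interval distribution of the loss ratio from the loss ratios of the historical group insurance business, and calculating the loss ratios of the historical dimension-reduced feature values;

for each subinterval in the interval distribution, inputting the loss ratios of the historical dimension-reduced feature values into a Bayesian algorithm and calculating the probability that the loss ratio of the current group insurance business falls into that subinterval; and

selecting the subinterval with the highest probability as the predicted loss ratio of the current group insurance business, and outputting the prediction result.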

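2. The method for predicting group insurance business according to claim 1, wherein the subintervals include:

profit, small profit, loss and severe loss.

3. The method for predicting group insurance business according to claim 2, wherein if the prediction result is loss or severe loss, the method further comprises:

outputting a modification suggestion.

4. The method for predicting group insurance business according to claim 3, wherein outputting the modification suggestion comprises:

constructing N sub-classifiers, each of which outputs an initial suggestion, where N is an integer greater than 2; and outputting, according to the voting principle, the initial suggestion that receives the most votes among all initial suggestions as the modification suggestion.

5. The method for predicting group insurance business according to claim 4, wherein constructing the sub-classifiers comprises:

randomly selecting α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, each containing β current dimension-reduced feature values, the α schemes forming a sub data set; and finding the optimal split of the sub data set based on the Gini index.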

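6. An apparatus for predicting group insurance business, comprising:

an acquisition module, configured to acquire the current group insurance business and the historical group insurance business;

an extraction module, configured to extract current feature values from the current group insurance business and historical feature values from the historical group insurance business;

an encoding module, configured to encode the extracted current feature values and historical feature values with TF-IDF, respectively;

a dimensionality reduction module, configured to reduce the dimensionality of the encoded current feature values and the encoded historical feature values with PCA, respectively, obtaining current dimension-reduced feature values and historical dimension-reduced feature values;

a first calculation module, configured to determine the interval distribution of the loss ratio from the loss ratios of the historical group insurance business and to calculate the loss ratios of the historical dimension-reduced feature values;

a second calculation module, configured to input, for each subinterval in the interval distribution, the loss ratios of the historical dimension-reduced feature values into a Bayesian algorithm and to calculate the probability that the loss ratio of the current group insurance business falls into that subinterval; and

an output module, configured to select the subinterval with the highest probability as the predicted loss ratio of the current group insurance business and to output the prediction result.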

7. The apparatus for predicting group insurance business according to claim 6, wherein the subintervals include:

profit, small profit, loss and severe loss.

8. The apparatus for predicting group insurance business according to claim 7, further comprising, if the prediction result is loss or severe loss:

a suggestion module, configured to output a modification suggestion.

9. The apparatus for predicting group insurance business according to claim 8, wherein the suggestion module comprises:

a construction unit, configured to construct N sub-classifiers, each of which outputs an initial suggestion, where N is an integer greater than 2; and a computing unit, configured to output, according to the voting principle, the initial suggestion that receives the most votes among all initial suggestions as the modification suggestion.

10. The apparatus for predicting group insurance business according to claim 9, wherein the construction unit is configured to:

randomly select α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, each containing β current dimension-reduced feature values, the α schemes forming a sub data set; and find the optimal split of the sub data set based on the Gini index.