CN112330476A - Method and device for predicting group insurance business - Google Patents

Publication number: CN112330476A
Authority: CN (China)
Prior art keywords: current, historical, characteristic value, odds, insurance business
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202011365036.1A
Other languages: Chinese (zh)
Inventors: 王帅, 侯成文, 梁曦, 林鹏程
Current Assignee: China Life Insurance Co Ltd China (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: China Life Insurance Co Ltd China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Life Insurance Co Ltd China
Priority to CN202011365036.1A
Publication of CN112330476A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • Human Resources & Organizations (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

One or more embodiments of this specification provide a method and a device for predicting group insurance business. Current feature values and historical feature values are first extracted from the current group insurance business and the historical group insurance business, respectively; both are encoded with TF-IDF and then reduced in dimension with PCA; a Bayesian algorithm is then used to calculate the probability that the loss ratio (payout rate) of the current group insurance business, described by its dimension-reduced feature values, falls into each subinterval. The subinterval corresponding to the maximum probability is output as the prediction result for the loss ratio of the current group insurance business, so that the loss ratio of different customers in new group insurance business can be predicted accurately.

Description

Method and device for predicting group insurance business
Technical Field
One or more embodiments of the present disclosure relate to the technical field of insurance business prediction, and in particular, to a method and an apparatus for predicting group insurance business.
Background
At present, the insurance industry in China mainly uses ARMA-based prediction models to predict the loss ratio (payout rate).
Group insurance business: insurance business in which a single insurance contract covers multiple insured persons, where the applicant is a group or organization (or a specific group within it) and the insured persons are members of a specific group of more than 5 persons (which may include the members' spouses, children and parents), except where the policy terms specify otherwise.
However, the inventors have found that an ARMA-based prediction model cannot accurately predict the loss ratio of different customers in new group insurance business. The objects predicted by an ARMA-based model must satisfy a certain linear relationship, whereas in existing group insurance business the policies of new customers and old customers differ and their relationship is nonlinear.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure aim to provide a method and an apparatus for predicting group insurance business, so as to solve the technical problems in the prior art.
To this end, one or more embodiments of the present specification provide a method for predicting group insurance business, including:
acquiring current group insurance business and historical group insurance business;
respectively extracting current feature values from the current group insurance business and historical feature values from the historical group insurance business;
encoding the extracted current feature values and the extracted historical feature values respectively by using TF-IDF;
performing dimensionality reduction on the encoded current feature values and the encoded historical feature values respectively by using PCA, to obtain current dimension-reduced feature values and historical dimension-reduced feature values respectively;
determining the interval distribution of the loss ratio according to the claims experience of the historical group insurance business, and calculating the loss ratio of the historical dimension-reduced feature values;
for each subinterval in the interval distribution, inputting the loss ratio of the historical dimension-reduced feature values into a Bayesian algorithm, and calculating the probability that the loss ratio of the current group insurance business falls into the subinterval;
and selecting the subinterval corresponding to the maximum probability as the prediction result of the loss ratio of the current group insurance business, and outputting the prediction result.
As an optional implementation, the subintervals include:
profit, low profit, loss and severe loss.
As an optional implementation, if the prediction result is loss or severe loss, the method further includes:
outputting a modification suggestion.
As an optional implementation, outputting the modification suggestion includes:
constructing N sub-classifiers, each of which outputs an initial suggestion, wherein N is an integer greater than 2;
and outputting, based on a voting principle, the initial suggestion that receives the most votes among all the initial suggestions as the modification suggestion.
As an optional implementation, constructing a sub-classifier includes:
randomly selecting α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, wherein each scheme includes β current dimension-reduced feature values and the α schemes form a sub data set;
and finding the optimal split of the sub data set based on the Gini index.
Corresponding to the above method for predicting group insurance business, an embodiment of the invention also provides a device for predicting group insurance business, which includes:
an acquisition module, used for acquiring current group insurance business and historical group insurance business;
an extraction module, used for respectively extracting current feature values from the current group insurance business and historical feature values from the historical group insurance business;
an encoding module, used for encoding the extracted current feature values and the extracted historical feature values respectively by using TF-IDF;
a dimensionality reduction module, used for performing dimensionality reduction on the encoded current feature values and the encoded historical feature values respectively by using PCA (principal component analysis), to obtain current dimension-reduced feature values and historical dimension-reduced feature values respectively;
a first calculation module, used for determining the interval distribution of the loss ratio according to the claims experience of the historical group insurance business, and calculating the loss ratio of the historical dimension-reduced feature values;
a second calculation module, used for inputting, for each subinterval in the interval distribution, the loss ratio of the historical dimension-reduced feature values into a Bayesian algorithm, and calculating the probability that the loss ratio of the current group insurance business falls into the subinterval;
and an output module, used for selecting the subinterval corresponding to the maximum probability as the prediction result of the loss ratio of the current group insurance business and outputting the prediction result.
As an optional implementation, the subintervals include:
profit, low profit, loss and severe loss.
As an optional implementation, if the prediction result is loss or severe loss, the device further includes:
a suggestion module, used for outputting a modification suggestion.
As an optional implementation, the suggestion module includes:
a construction unit, used for constructing N sub-classifiers, each of which outputs an initial suggestion, wherein N is an integer greater than 2;
and a computing unit, used for outputting, based on a voting principle, the initial suggestion that receives the most votes among all the initial suggestions as the modification suggestion.
As an optional implementation, the construction unit is configured to:
randomly select α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, wherein each scheme includes β current dimension-reduced feature values and the α schemes form a sub data set;
and find the optimal split of the sub data set based on the Gini index.
As can be seen from the above, in the method and apparatus for predicting group insurance business provided in one or more embodiments of the present specification, current feature values and historical feature values are first extracted from the current group insurance business and the historical group insurance business, respectively; both are encoded with TF-IDF and then reduced in dimension with PCA; and a Bayesian algorithm is used to calculate the probability that the loss ratio of the current group insurance business, described by its dimension-reduced feature values, falls into each subinterval. The subinterval corresponding to the maximum probability is output as the prediction result for the loss ratio of the current group insurance business, so that the loss ratio of different customers in new group insurance business can be predicted accurately.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a logic diagram of a method according to one embodiment of the present description;
FIG. 2 is a logic diagram of a method according to another embodiment of the present disclosure;
FIG. 3 is a logic diagram of an output modification suggestion in accordance with one or more embodiments of the present description;
FIG. 4 is a logic diagram of an apparatus according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure is further described in detail below with reference to specific embodiments.
In order to achieve the above object, an embodiment of the present invention provides a method for predicting group insurance business, including:
acquiring current group insurance business and historical group insurance business;
respectively extracting current feature values from the current group insurance business and historical feature values from the historical group insurance business;
encoding the extracted current feature values and the extracted historical feature values respectively by using TF-IDF;
performing dimensionality reduction on the encoded current feature values and the encoded historical feature values respectively by using PCA, to obtain current dimension-reduced feature values and historical dimension-reduced feature values respectively;
determining the interval distribution of the loss ratio according to the claims experience of the historical group insurance business, and calculating the loss ratio of the historical dimension-reduced feature values;
for each subinterval in the interval distribution, inputting the loss ratio of the historical dimension-reduced feature values into a Bayesian algorithm, and calculating the probability that the loss ratio of the current group insurance business falls into the subinterval;
and selecting the subinterval corresponding to the maximum probability as the prediction result of the loss ratio of the current group insurance business, and outputting the prediction result.
In the embodiment of the invention, current feature values and historical feature values are first extracted from the current group insurance business and the historical group insurance business, respectively; both are encoded with TF-IDF and then reduced in dimension with PCA; and a Bayesian algorithm is used to calculate the probability that the loss ratio of the current group insurance business, described by its dimension-reduced feature values, falls into each subinterval. The subinterval corresponding to the maximum probability is output as the prediction result for the loss ratio of the current group insurance business, so that the loss ratio of different customers in new group insurance business can be predicted accurately.
FIG. 1 shows an embodiment of a method for predicting group insurance business, including:
S100, acquiring the current group insurance business and the historical group insurance business.
S200, respectively extracting current feature values from the current group insurance business and historical feature values from the historical group insurance business.
The current feature values include applicant information and scheme information. The applicant information includes the application time, nature of the organization, industry type, occupation type, number of members, number of active employees, number of insured persons, applicant age distribution and the like; the scheme information includes the scheme type, contract form, business category, insurance type, property groups, insured amount, premium, discount rate, commission rate, sales area, whether a retroactive effective date is specified and the like.
The historical feature values include applicant information, scheme information and historical claim settlement information. The applicant information includes the application time, nature of the organization, industry type, occupation type, number of members, number of active employees, number of insured persons, applicant age distribution and the like; the scheme information includes the scheme type, contract form, business category, insurance type, property groups, premium, discount rate, commission rate, sales area, whether a retroactive effective date is specified and the like; the historical claim settlement information includes the main insurance, effective date, expiration date, policy beginning, policy end, policy benefits rate and the like.
S300, encoding the extracted current feature values and the extracted historical feature values respectively by using TF-IDF.
Where:
[The equation images defining tf and idf are not reproduced in this text.]
TF-IDF = tf × idf
The calculated TF-IDF values are used as the encoding of the feature values (the current feature values and the historical feature values), and the result is recorded as Z, a data set of n schemes in which each scheme has m features.
Z can be written as an n × m matrix in which each row is a row vector representing the features of one scheme after TF-IDF encoding.
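The encoding in S300 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each scheme is represented as a list of categorical tokens (the token strings below are hypothetical), and it uses the common unsmoothed definitions tf = count / scheme length and idf = log(n / number of schemes containing the token), because the source's own tf and idf formulas are not available.

```python
import math
from collections import Counter


def tfidf_encode(schemes):
    """Encode each scheme (a list of categorical tokens) as a TF-IDF row vector.

    Assumed definitions (the source's formulas are not available):
    tf = occurrences of the token in the scheme / number of tokens in the scheme
    idf = log(number of schemes / number of schemes containing the token)
    """
    vocab = sorted({tok for scheme in schemes for tok in scheme})
    n = len(schemes)
    # Document frequency: in how many schemes each token appears at least once.
    df = Counter(tok for scheme in schemes for tok in set(scheme))
    rows = []
    for scheme in schemes:
        counts = Counter(scheme)
        row = []
        for tok in vocab:
            tf = counts[tok] / len(scheme)
            idf = math.log(n / df[tok])
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


# Hypothetical schemes; real inputs would come from applicant and scheme data.
schemes = [
    ["industry=manufacturing", "region=north", "plan=medical"],
    ["industry=finance", "region=north", "plan=medical"],
    ["industry=finance", "region=south", "plan=accident"],
]
vocab, Z = tfidf_encode(schemes)  # Z has n = 3 rows and m = len(vocab) columns
```

With this unsmoothed idf, a token that appears in every scheme gets weight 0; a smoothed variant would avoid that, and which variant the patent uses is not known from this text.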
S400, performing dimensionality reduction on the encoded current feature values and the encoded historical feature values respectively by using PCA, to obtain the current dimension-reduced feature values and the historical dimension-reduced feature values respectively.
Center the data set Z, that is, subtract from each feature its mean over the n schemes, to obtain the matrix X.
Calculate the m × m covariance matrix C of X.
Calculate the eigenvalues λ_i of the covariance matrix C and the eigenvector corresponding to each eigenvalue; each eigenvector is a column vector.
Sort the eigenvalues λ_i from largest to smallest, and select the eigenvectors corresponding to the k largest eigenvalues to form the matrix Q.
Normalize each eigenvector in Q to unit length, and record the result as the matrix D.
Calculate the dimension-reduced data set by projecting X onto the columns of D, and record it as Y.
After PCA dimensionality reduction, Y is a data set of n schemes in which each scheme has k features.
Each row of Y is a row vector representing the features of one scheme after PCA, and f_j denotes one attribute value of that row vector.
The main purpose of PCA is to reduce the dimensionality of the data and to make the feature values as independent of one another as possible (PCA removes the linear correlation between them), which better satisfies the requirement of the Bayesian algorithm and gives the Bayesian algorithm higher accuracy.
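The PCA steps of S400 can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions: the covariance is normalized by n − 1 (the source's normalization is not visible in this text), the function name `pca_reduce` is hypothetical, and the input is random placeholder data rather than real TF-IDF output.

```python
import numpy as np


def pca_reduce(Z, k):
    """Reduce an n x m data set Z to n x k by PCA via eigendecomposition."""
    Z = np.asarray(Z, dtype=float)
    mean = Z.mean(axis=0)
    X = Z - mean                               # center each feature
    C = X.T @ X / (X.shape[0] - 1)             # m x m covariance (n - 1 assumed)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    D = eigvecs[:, order]                      # columns are unit-length eigenvectors
    Y = X @ D                                  # project onto the k components
    return Y, D, mean


# Placeholder data standing in for a TF-IDF matrix with n = 20, m = 5.
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 5))
Y, D, mean = pca_reduce(Z, k=2)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it already returns unit-length eigenvectors, so the separate normalization step described in the text is a no-op here. Returning `D` and `mean` makes it possible to project other data with the same components, which is a design choice of this sketch and not something the source states.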
S500, determining the interval distribution of the loss ratio according to the claims experience of the historical group insurance business, and calculating the loss ratio of the historical dimension-reduced feature values.
According to the claims experience of the historical group insurance business, the interval distribution of the loss ratio is determined as a series of subintervals whose boundaries are given by thresholds ω. [The equation images giving the subinterval boundaries and the conditions on ω are not reproduced in this text.]
The subintervals of the interval distribution may be defined as "profit", "low profit", "loss", "severe loss", etc., and are denoted R as the prediction result.
The loss ratio P(f_k) of each historical dimension-reduced feature value is calculated according to the claims experience of the historical group insurance business. For a discrete historical dimension-reduced feature value, the corresponding loss ratio P(f_k) can be obtained by counting; for a continuous historical dimension-reduced feature value, the corresponding loss ratio P(f_k) can be calculated from a probability density function, or the continuous value can be discretized, that is, divided into several segments, and the loss ratio P(f_k) corresponding to each segment can then be obtained by counting.
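The labelling and discretization described in S500 might look like the sketch below. The threshold values, the equal-width binning and all function names are assumptions made for illustration; the source's actual subinterval boundaries are not available in this text.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical thresholds on the loss ratio; the source's values are unknown.
THRESHOLDS = [0.6, 0.9, 1.1]
NAMES = ["profit", "low profit", "loss", "severe loss"]


def label_loss_ratio(ratio, thresholds=THRESHOLDS, names=NAMES):
    """Map a historical loss ratio to the name of the subinterval it falls in."""
    return names[bisect_right(thresholds, ratio)]


def discretize(values, n_bins):
    """Split a continuous feature into n_bins equal-width segments (bin indices)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def bin_frequencies(bins):
    """Relative frequency of each segment, obtained by counting."""
    total = len(bins)
    return {b: c / total for b, c in Counter(bins).items()}


labels = [label_loss_ratio(r) for r in [0.5, 0.75, 1.0, 1.3]]
bins = discretize([0.0, 0.5, 1.0, 2.0], n_bins=2)
freqs = bin_frequencies(bins)
```

Equal-width segments are only one possible choice; quantile-based segments would give more balanced counts per segment when a feature is heavily skewed.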
S600, for each subinterval in the interval distribution, inputting the loss ratio of the historical dimension-reduced feature values into a Bayesian algorithm, and calculating the probability that the loss ratio of the current group insurance business falls into the subinterval.
[The equation image giving the Bayesian formula for P(R|f_1, f_2, ..., f_k) is not reproduced in this text.]
Where R denotes a subinterval, namely profit, low profit, loss, severe loss, etc.;
P(R) denotes the probability of the corresponding subinterval and is obtained from statistics on the historical group insurance business. f_j denotes one of the features of a given group policy after PCA dimensionality reduction, that is, one component of that policy's dimension-reduced feature vector.
P(R|f_1, f_2, ..., f_k) denotes the probability that the loss ratio of current group insurance business whose current dimension-reduced feature values are f_1, f_2, ..., f_k falls in the subinterval R.
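For orientation, the textbook naive Bayes form that is consistent with the symbols P(R) and P(R|f_1, f_2, ..., f_k) used here is given below. It is an assumption, not the source's formula: the class-conditional terms P(f_j | R), the sum over subintervals R' in the denominator, and the factorization over features (which relies on the independence that PCA is meant to approximate) are all introduced for illustration.

```latex
P(R \mid f_1, f_2, \dots, f_k)
  = \frac{P(R)\,\prod_{j=1}^{k} P(f_j \mid R)}
         {\sum_{R'} P(R')\,\prod_{j=1}^{k} P(f_j \mid R')}
```

Because the denominator is the same for every subinterval, comparing subintervals only requires the numerator.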
S700, selecting the subinterval corresponding to the maximum probability as the prediction result of the loss ratio of the current group insurance business, and outputting the prediction result.
For example, if P(profit | f_1, f_2, ..., f_k) is 20%, P(low profit | f_1, f_2, ..., f_k) is 40%, P(loss | f_1, f_2, ..., f_k) is [value garbled in this text] and P(severe loss | f_1, f_2, ..., f_k) is 10%, then the loss ratio of the current group insurance business falls in the "low profit" subinterval.
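Steps S600 and S700 together can be sketched as a small naive Bayes classifier over discretized features. This is a sketch under assumptions, not the patent's algorithm: it assumes the naive (feature-independent) factorization, adds Laplace smoothing with a hypothetical parameter `alpha`, and runs on made-up toy data.

```python
from collections import Counter


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count-based naive Bayes on discrete features.

    rows: list of tuples of discrete feature values (one tuple per policy)
    labels: subinterval name for each historical policy
    """
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    k = len(rows[0])
    counts = {c: [Counter() for _ in range(k)] for c in classes}
    values = [set() for _ in range(k)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)
    return classes, prior, counts, values, alpha


def posterior(model, row):
    """Return P(R | f_1..f_k) for every subinterval R, normalized to sum to 1."""
    classes, prior, counts, values, alpha = model
    scores = {}
    for c in classes:
        n_c = sum(counts[c][0].values())  # number of historical policies in class c
        p = prior[c]
        for j, v in enumerate(row):
            # Laplace-smoothed estimate of P(f_j = v | R = c)
            p *= (counts[c][j][v] + alpha) / (n_c + alpha * len(values[j]))
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Toy historical data (hypothetical): two discretized features per policy.
rows = [("lo", "a"), ("lo", "b"), ("hi", "b"), ("hi", "b"), ("lo", "a"), ("hi", "a")]
labels = ["profit", "profit", "loss", "loss", "low profit", "loss"]
model = fit_naive_bayes(rows, labels)
post = posterior(model, ("hi", "b"))
prediction = max(post, key=post.get)  # S700: subinterval with maximum probability
```

On this toy data the unnormalized scores are 0.24 for "loss", about 0.042 for "profit" and about 0.019 for "low profit", so the prediction is "loss". Smoothing keeps a feature value that was never seen with a given subinterval from forcing that subinterval's probability to zero.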
As an optional implementation, if the prediction result is loss or severe loss, as shown in FIG. 2, the method further includes:
S800, outputting a modification suggestion.
Optionally, as shown in FIG. 3, outputting the modification suggestion includes:
S801, constructing N sub-classifiers, wherein each sub-classifier outputs an initial suggestion and N is an integer greater than 2;
optionally, constructing a sub-classifier includes:
randomly selecting α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, wherein each scheme includes β current dimension-reduced feature values and the α schemes form a sub data set Y', which can be written as a matrix with one row per scheme;
based on the Gini index, finding the optimal split of the sub data set Y':
[The equation image giving the Gini index is not reproduced in this text.]
where θ denotes the categories, namely profit, low profit, loss and severe loss, each of which is a non-empty subset; p_i denotes the probability that a given current group insurance business is correctly classified based on the feature f; and (1 − p_i) denotes the probability that the scheme is misclassified.
The basis for selecting an attribute is measured by the sum, over the classification by feature f_j and the other feature classifications in the sample set Y', of each Gini index multiplied by its weight:
[The equation image giving the weighted Gini index is not reproduced in this text.]
The feature f_j with the smallest Gini index is selected as the classification node for this round of decision tree construction, thereby obtaining the constructed sub-classifier.
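The Gini-based choice of a split feature can be sketched as follows. The code uses the standard Gini impurity 1 − Σ p_i² (equivalently Σ p_i(1 − p_i)) and the size-weighted sum over the groups produced by a split; whether this matches the source's formulas term by term cannot be confirmed from this text, and the function names and toy data are hypothetical. Features are assumed to be already discretized.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of category labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(feature_values, labels):
    """Size-weighted Gini impurity after splitting on one discrete feature."""
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_feature(rows, labels):
    """Return the index of the feature with the smallest weighted Gini, plus all scores."""
    k = len(rows[0])
    scores = [weighted_gini([r[j] for r in rows], labels) for j in range(k)]
    return min(range(k), key=scores.__getitem__), scores


# Toy sub data set (hypothetical): feature 0 separates the categories perfectly.
rows = [("lo", "a"), ("lo", "b"), ("hi", "a"), ("hi", "b")]
labels = ["profit", "profit", "loss", "loss"]
best, scores = best_feature(rows, labels)
```

Here splitting on feature 0 yields two pure groups (weighted Gini 0.0), while feature 1 yields two mixed groups (weighted Gini 0.5), so feature 0 is chosen as the classification node.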
S802, outputting, based on a voting principle, the initial suggestion that receives the most votes among all the initial suggestions as the modification suggestion.
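The random sub data sets of S801 and the vote of S802 can be sketched as below. The sampling scheme (rows drawn with replacement, β distinct feature columns per sub data set) is one possible reading of "randomly selecting α × β values" and is an assumption, as are the function names and the suggestion strings.

```python
import random
from collections import Counter


def sample_sub_dataset(rows, alpha, beta, rng):
    """Draw alpha schemes (with replacement) and beta distinct feature columns."""
    k = len(rows[0])
    row_idx = [rng.randrange(len(rows)) for _ in range(alpha)]
    col_idx = sorted(rng.sample(range(k), beta))
    sub = [[rows[i][j] for j in col_idx] for i in row_idx]
    return sub, row_idx, col_idx


def majority_vote(suggestions):
    """Return the suggestion with the most votes (first seen wins a tie)."""
    return Counter(suggestions).most_common(1)[0][0]


rng = random.Random(0)
rows = [(0.1, 0.4, 0.9), (0.3, 0.2, 0.5), (0.8, 0.7, 0.1), (0.6, 0.6, 0.3)]
sub, row_idx, col_idx = sample_sub_dataset(rows, alpha=3, beta=2, rng=rng)

# Hypothetical initial suggestions from N = 3 sub-classifiers.
initial = ["raise premium", "lower discount rate", "raise premium"]
final = majority_vote(initial)
```

Training each sub-classifier on a different random sub data set and then voting is the same idea as a random forest, which reduces the influence of any single tree's errors on the final suggestion.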
The embodiment of the invention provides a method for predicting the loss ratio of group insurance business. It uses a Bayesian algorithm to obtain the probability of the loss ratio of the current group insurance business and recommends modification suggestions to operators according to the predicted loss ratio, which helps operators adjust the current group insurance scheme and reduces the economic loss that an insurance company incurs from unreasonable quotations for the current group insurance business.
It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Corresponding to the above method for predicting group insurance business, an embodiment of the present invention further provides a device for predicting group insurance business, as shown in FIG. 4, including:
an obtaining module 10, configured to obtain current group insurance business and historical group insurance business;
an extracting module 20, configured to respectively extract current feature values from the current group insurance business and historical feature values from the historical group insurance business;
an encoding module 30, configured to encode the extracted current feature values and the extracted historical feature values respectively by using TF-IDF;
a dimension reduction module 40, configured to perform dimensionality reduction on the encoded current feature values and the encoded historical feature values respectively by using PCA, to obtain current dimension-reduced feature values and historical dimension-reduced feature values respectively;
a first calculation module 50, configured to determine the interval distribution of the loss ratio according to the claims experience of the historical group insurance business, and to calculate the loss ratio of the historical dimension-reduced feature values;
a second calculation module 60, configured to input, for each subinterval in the interval distribution, the loss ratio of the historical dimension-reduced feature values into a Bayesian algorithm, and to calculate the probability that the loss ratio of the current group insurance business falls into the subinterval;
and an output module 70, configured to select the subinterval corresponding to the maximum probability as the prediction result of the loss ratio of the current group insurance business and to output the prediction result.
As an optional implementation, the subintervals include:
profit, low profit, loss and severe loss.
As an optional implementation, if the prediction result is loss or severe loss, the device further includes:
a suggestion module, configured to output a modification suggestion.
Optionally, the suggestion module includes:
a construction unit, configured to construct N sub-classifiers, each of which outputs an initial suggestion, wherein N is an integer greater than 2;
and a computing unit, configured to output, based on a voting principle, the initial suggestion that receives the most votes among all the initial suggestions as the modification suggestion.
Optionally, the construction unit is configured to:
randomly select α × β current dimension-reduced feature values from the data set formed by the current dimension-reduced feature values to form α schemes, wherein each scheme includes β current dimension-reduced feature values and the α schemes form a sub data set;
and find the optimal split of the sub data set based on the Gini index.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another.
For convenience of description, the above device is described as being divided into various modules by function, and the modules are described separately. Of course, when implementing one or more embodiments of this specification, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of this specification as described above, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for predicting group insurance business, comprising:
acquiring a current group insurance business and a historical group insurance business;
extracting a current characteristic value from the current group insurance business and a historical characteristic value from the historical group insurance business, respectively;
encoding the extracted current characteristic value and the extracted historical characteristic value, respectively, using TF-IDF;
performing dimension reduction on the encoded current characteristic value and the encoded historical characteristic value, respectively, using PCA, to obtain a current dimension reduction characteristic value and a historical dimension reduction characteristic value, respectively;
determining the interval distribution of the payout rate according to the payout rates of the historical group insurance business, and calculating the payout rate of the historical dimension reduction characteristic values;
for each subinterval in the interval distribution, inputting the payout rate of the historical dimension reduction characteristic values into a Bayesian algorithm, and calculating the probability that the payout rate of the current group insurance business falls into the subinterval;
and selecting the subinterval corresponding to the maximum probability as the prediction result of the payout rate of the current group insurance business, and outputting the prediction result.
2. The method for predicting group insurance business of claim 1, wherein the subintervals include:
profit, slight profit, loss, and severe loss.
3. The method for predicting group insurance business of claim 2, wherein if the prediction result is loss or severe loss, the method further comprises:
outputting a modification suggestion.
4. The method for predicting group insurance business of claim 3, wherein outputting the modification suggestion comprises:
constructing N sub-classifiers, each of which outputs an initial suggestion, wherein N is an integer greater than 2;
and outputting, based on a voting principle, the initial suggestion that receives the most votes among all the initial suggestions as the modification suggestion.
5. The method for predicting group insurance business of claim 4, wherein constructing the sub-classifiers comprises:
randomly selecting alpha × beta current dimension reduction characteristic values from the data set formed by the current dimension reduction characteristic values to form alpha schemes, wherein each scheme comprises beta current dimension reduction characteristic values and the alpha schemes form a sub data set;
and finding the optimal split of the sub data set based on the Gini index.
6. A device for predicting group insurance business, comprising:
an acquisition module configured to acquire a current group insurance business and a historical group insurance business;
an extraction module configured to extract a current characteristic value from the current group insurance business and a historical characteristic value from the historical group insurance business, respectively;
an encoding module configured to encode the extracted current characteristic value and the extracted historical characteristic value, respectively, using TF-IDF;
a dimension reduction module configured to perform dimension reduction on the encoded current characteristic value and the encoded historical characteristic value, respectively, using PCA (principal component analysis), to obtain a current dimension reduction characteristic value and a historical dimension reduction characteristic value, respectively;
a first calculation module configured to determine the interval distribution of the payout rate according to the payout rates of the historical group insurance business, and to calculate the payout rate of the historical dimension reduction characteristic values;
a second calculation module configured to, for each subinterval in the interval distribution, input the payout rate of the historical dimension reduction characteristic values into a Bayesian algorithm and calculate the probability that the payout rate of the current group insurance business falls into the subinterval;
and an output module configured to select the subinterval corresponding to the maximum probability as the prediction result of the payout rate of the current group insurance business, and to output the prediction result.
7. The device for predicting group insurance business of claim 6, wherein the subintervals include:
profit, slight profit, loss, and severe loss.
8. The device for predicting group insurance business of claim 7, wherein if the prediction result is loss or severe loss, the device further comprises:
a suggestion module configured to output a modification suggestion.
9. The device for predicting group insurance business of claim 8, wherein the suggestion module comprises:
a construction unit configured to construct N sub-classifiers, each of which outputs an initial suggestion, wherein N is an integer greater than 2;
and a computing unit configured to output, based on a voting principle, the initial suggestion that receives the most votes among all the initial suggestions as the modification suggestion.
10. The device for predicting group insurance business of claim 9, wherein the construction unit is configured to:
randomly select alpha × beta current dimension reduction characteristic values from the data set formed by the current dimension reduction characteristic values to form alpha schemes, wherein each scheme comprises beta current dimension reduction characteristic values and the alpha schemes form a sub data set;
and find the optimal split of the sub data set based on the Gini index.
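The encoding and dimension reduction steps recited in claims 1 and 6 can be sketched in plain Python. This is a minimal illustration, not the claimed implementation. It assumes that characteristic values are available as token lists, that current and historical records are encoded over one shared vocabulary, and that PCA is fitted on the historical rows and then applied to all rows. The smoothed IDF formula, the power-iteration PCA, and all names and sample tokens are hypothetical choices.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Encode tokenized documents as TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({tok for doc in documents for tok in doc})
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in documents for tok in set(doc))
    # Smoothed IDF keeps a non-zero weight for tokens present in every document.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pca(rows, k, iterations=200):
    """Return column means and the top-k principal directions (needs >= 2 rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Power iteration from a fixed start vector approximates the dominant
        # eigenvector of the (deflated) covariance matrix.
        v = [float(i + 1) for i in range(d)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflation removes the found direction before the next search.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return means, components

def project(row, means, components):
    """Project one encoded row onto the principal directions."""
    centered = [v - m for v, m in zip(row, means)]
    return [dot(centered, c) for c in components]

# Hypothetical token lists: three historical policies followed by one current policy.
docs = [
    ["manufacturing", "accident", "medical"],
    ["construction", "accident", "accident"],
    ["office", "medical", "medical"],
    ["construction", "medical", "accident"],
]
vocab, encoded = tf_idf(docs)
means, components = fit_pca(encoded[:3], k=2)
reduced = [project(row, means, components) for row in encoded]
print(reduced[-1])
```

Fitting the projection on historical rows only and reusing it for the current row keeps both sets of dimension reduction characteristic values in the same coordinate system, which a later comparison between them would require.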
CN202011365036.1A 2020-11-27 2020-11-27 Method and device for predicting group insurance business Pending CN112330476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011365036.1A CN112330476A (en) 2020-11-27 2020-11-27 Method and device for predicting group insurance business

Publications (1)

Publication Number Publication Date
CN112330476A true CN112330476A (en) 2021-02-05

Family

ID=74309321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011365036.1A Pending CN112330476A (en) 2020-11-27 2020-11-27 Method and device for predicting group insurance business

Country Status (1)

Country Link
CN (1) CN112330476A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8275707B1 (en) * 2005-10-14 2012-09-25 The Chubb Corporation Methods and systems for normalized identification and prediction of insurance policy profitability
CN106104615A (en) * 2013-12-11 2016-11-09 天巡有限公司 For providing method and the server of one group of price evaluation value, such as air fare price evaluation value
CN107423279A (en) * 2017-04-11 2017-12-01 美林数据技术股份有限公司 A kind of information extraction and analysis method of credit financing short message
CN107480895A (en) * 2017-08-19 2017-12-15 中国标准化研究院 A kind of reliable consumer goods methods of risk assessment based on Bayes enhancing study
CN108256691A (en) * 2018-02-08 2018-07-06 成都智宝大数据科技有限公司 Refund Probabilistic Prediction Model construction method and device
CN108492197A (en) * 2018-03-15 2018-09-04 北京百度网讯科技有限公司 Prediction technique of being in danger, device and the server of insurance
CN110084627A (en) * 2018-01-23 2019-08-02 北京京东金融科技控股有限公司 The method and apparatus for predicting target variable
CN110147925A (en) * 2019-04-10 2019-08-20 阿里巴巴集团控股有限公司 A kind of Application of risk decision method, device, equipment and system
WO2019218751A1 (en) * 2018-05-16 2019-11-21 阿里巴巴集团控股有限公司 Processing method, apparatus and device for risk prediction of insurance service
CN111401798A (en) * 2020-06-02 2020-07-10 南京百敖软件有限公司 Enterprise waste escaping and debt risk early warning system and construction method
CN111401431A (en) * 2020-03-12 2020-07-10 成都小步创想慧联科技有限公司 Group renting house identification method and system and storage medium
CN111813823A (en) * 2020-05-25 2020-10-23 泰康保险集团股份有限公司 Insurance service policy adjustment system, vehicle-mounted recording device and server

Similar Documents

Publication Publication Date Title
Agarwal et al. Fair regression: Quantitative definitions and reduction-based algorithms
CN112070125A (en) Prediction method of unbalanced data set based on isolated forest learning
CN109739844B (en) Data classification method based on attenuation weight
CN112990386B (en) User value clustering method and device, computer equipment and storage medium
CN113255908B (en) Method, neural network model and device for service prediction based on event sequence
CN112668822B (en) Scientific and technological achievement transformation platform sharing system, method, storage medium and mobile phone APP
Lin et al. Tourism demand forecasting: Econometric model based on multivariate adaptive regression splines, artificial neural network and support vector regression
Swishchuk et al. General semi-Markov model for limit order books
CN113962160A (en) Internet card user loss prediction method and system based on user portrait
CN112541635A (en) Service data statistical prediction method and device, computer equipment and storage medium
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN115271886A (en) Financial product recommendation method and device, storage medium and electronic equipment
Li et al. Billion-user customer lifetime value prediction: an industrial-scale solution from Kuaishou
CN114187125A (en) Claims case distribution method, device, equipment and storage medium
CN110213239B (en) Suspicious transaction message generation method and device and server
CN112330476A (en) Method and device for predicting group insurance business
CN116885697A (en) Load prediction method based on combination of cluster analysis and intelligent algorithm
CN117194966A (en) Training method and related device for object classification model
CN116029766A (en) User transaction decision recognition method, incentive strategy optimization method, device and equipment
CN115983982A (en) Credit risk identification method, credit risk identification device, credit risk identification equipment and computer readable storage medium
CN113421154B (en) Credit risk assessment method and system based on control chart
CN113850483A (en) Enterprise credit risk rating system
Oh et al. Developing time-based clustering neural networks to use change-point detection: Application to financial time series
Mammadzada et al. Application of bg/nbd and gamma-gamma models to predict customer lifetime value for financial institution
CN117670393A (en) Method and device for determining target intention clients, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination