CN115409630A

CN115409630A - Insurance product accurate recommendation method based on mixed recommendation algorithm

Info

Publication number: CN115409630A
Application number: CN202210880920.1A
Authority: CN
Inventors: 徐淑宏; 孙秋霞; 吕玉敏; 李勍
Original assignee: Qingdao Xiaobei Information Technology Co ltd
Current assignee: Qingdao Xiaobei Information Technology Co ltd
Priority date: 2022-07-26
Filing date: 2022-07-26
Publication date: 2022-11-29
Anticipated expiration: 2042-07-26
Also published as: CN115409630B

Abstract

The invention discloses an accurate insurance product recommendation method based on a hybrid recommendation algorithm. The method comprises: calculating the correlation M between a new customer's guarantee project requirements and insurance clauses with a TF-IDF model, to obtain an insurance product set A containing the highly correlated insurance clauses; searching an existing customer information database with a cosine similarity method for a set of similar customers and the insurance product set B selected by those similar customers; constructing a third-order score tensor of customer-guarantee project-insurance product insurance data and predicting the new customer's scores for insurance products with a Tucker tensor decomposition algorithm, to obtain an insurance product set C; and outputting, with a Personal Rank algorithm, an insurance product set V in which the new customer shows higher interest, from which an insurance product set Z that both interests the new customer and meets the new customer's requirements is obtained. The invention takes into account the limits that insurance clauses place on guarantee projects and uses a TF-IDF keyword extraction method to recommend insurance products that actually meet the customer's guarantee requirements, thereby reducing the probability that a customer selects the wrong insurance product based on subjective judgment.

Description

Accurate insurance product recommendation method based on mixed recommendation algorithm

Technical Field

The invention relates to the field of personalized recommendation, in particular to an insurance product accurate recommendation method based on a mixed recommendation algorithm.

Background

When a customer purchases an insurance product, the platform usually displays the product's scope of guarantee liability. For a given insurance product A, for example, the displayed scope may be critical illness medical expenses and hospitalization medical expenses, while the limits on specific diseases, hospital grade and ward grade within that scope are not stated directly. These details are set out in the insurance clauses of the product, but because the clauses are long and wordy, most users do not read them carefully before purchasing; they judge whether a product meets their needs only from the scope of guarantee liability shown by the platform. A customer who chooses a product in this way cannot be sure that it actually meets his or her guarantee requirements.

The main function of a recommendation system is to analyze information such as a customer's attribute characteristics and historical behavior and to provide the user with a personalized information service. It works by mining the degree of correlation between a user and a target and, through a series of recommendation algorithms, finding the information the user is most likely to be interested in, which reduces the time the user spends searching massive amounts of information and improves the user experience. Recommendation systems are widely applied in fields such as precision marketing of commodities and short-video recommendation. Mainstream recommendation algorithms currently include tag-based, user-based and collaborative-filtering-based recommendation algorithms. However, insurance products differ from general commodities in that they carry many applicable conditions and limitations, so traditional recommendation algorithms are not very accurate when applied to insurance sales. For example, two customers with the same attribute characteristics may have different guarantee project requirements, in which case a collaborative-filtering-based recommendation algorithm loses effectiveness.

The TF-IDF model is based on the assumption that the importance of a word increases in proportion to the number of times it appears in a document but decreases in inverse proportion to its frequency across the corpus. The weight of a word in a document is judged by the product of its term frequency (TF) and its inverse document frequency (IDF). The model is widely applied in data mining and information retrieval.

Tucker tensor decomposition is a higher-order form of principal component analysis and can be regarded as a higher-order extension of singular value decomposition. In essence, it decomposes an original tensor into a core tensor and factor matrices corresponding to the different dimensions, and it can be used to solve the problem of filling in a sparse tensor.

The Personal Rank algorithm is a graph-based recommendation algorithm (a random walk algorithm). It represents the relationship between users and products as a bipartite graph; recommending products to a user a then amounts to calculating user a's degree of interest in every product.

Existing recommendation algorithms have the following problems when applied to insurance product recommendation:

1. Existing insurance product recommendation methods do not consider how well the guarantee items defined by the insurance clauses match the customer's guarantee requirements; they judge whether the user's guarantee requirements are met only from the guarantee liability of the insurance product.

2. Tucker tensor decomposition can reduce the dimensionality of high-dimensional data and is generally used to complete sparse data, but its precision is low when it is applied directly to insurance product recommendation.

3. Because of the particularities of purchasing insurance, insurance products with different guarantee items may carry the same description of guarantee liability scope on a sales platform. Customers are confused when browsing and cannot tell which insurance product suits them best, which lowers the recommendation precision of the Personal Rank algorithm.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides the insurance product accurate recommendation method based on the mixed recommendation algorithm, which is reasonable in design, overcomes the defects of the prior art and has a good effect.

The invention adopts the following technical scheme:

An accurate insurance product recommendation method based on a hybrid recommendation algorithm comprises the following steps:

S1, constructing an insurance clause data set, an existing customer information database and a guarantee project requirement vocabulary library from the existing data of an insurance company;

S2, collecting the characteristic information and guarantee project requirement information of a new customer, and extracting the new customer's guarantee project requirement keywords by a supervised learning method;

S3, calculating the correlation M between the new customer's guarantee project requirements and the insurance clauses with a TF-IDF model, setting a correlation threshold a, outputting the set of insurance clauses with M greater than a, thereby obtaining an insurance product set A containing that set of insurance clauses, and calculating the new customer's guarantee project requirement scores with the TF-IDF algorithm;

S4, searching the existing customer information database with a cosine similarity method for a set of similar customers and the insurance product set B purchased by those similar customers, extracting the guarantee project requirements of the similar customers, and using them to supplement the guarantee project requirements of the new customer;

S5, constructing a third-order score tensor of customer-guarantee project-insurance product insurance data, completing the customer-guarantee project information with the guarantee project requirement scores calculated in S3 and the new customer guarantee project requirements supplemented in S4, filling in the third-order score tensor with a Tucker tensor decomposition algorithm, predicting the new customer's scores for the insurance products, and obtaining an insurance product set C that meets the new customer's requirements;

S6, merging the insurance product sets A, B and C output by S3, S4 and S5 to obtain an insurance product set D, where D = A ∪ B ∪ C;

S7, analyzing the data generated when the new customer browses insurance products with a Personal Rank algorithm, outputting an insurance product set V in which the new customer shows higher interest, and comparing the insurance product set V with the insurance product set D from S6 to obtain an insurance product set Z that both interests the new customer and meets the new customer's requirements.
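How the candidate sets from S3 to S7 fit together can be sketched in a few lines of Python. This is only an illustration: it reads the "merging" of S6 as a set union and the "comparing" of S7 as a set intersection, and the function name and product identifiers are hypothetical.

```python
def combine_recommendations(set_a, set_b, set_c, set_v):
    """Return (D, Z), reading S6 as D = A | B | C and S7 as Z = V & D."""
    candidates_d = set(set_a) | set(set_b) | set(set_c)
    final_z = set(set_v) & candidates_d
    return candidates_d, final_z


# Hypothetical product identifiers, for illustration only.
d, z = combine_recommendations({"p1", "p2"}, {"p2", "p3"}, {"p4"}, {"p3", "p4", "p9"})
print(sorted(d))  # ['p1', 'p2', 'p3', 'p4']
print(sorted(z))  # ['p3', 'p4']
```

Under this reading, a product such as "p9" that the customer browsed with interest but that none of the three requirement-based steps proposed is left out of Z.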

Further, in S2, new customer requirements collected through various channels, including text entry and voice entry, are converted into text information; the text information is compared with insurance industry professional vocabulary by a supervised keyword extraction method, and the new customer's guarantee project requirement keyword set $H = (h_1, h_2, \dots, h_i, \dots)$ is extracted.
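As a rough illustration of the comparison against the professional vocabulary, the sketch below uses a plain dictionary lookup. This is a simplification standing in for the supervised keyword extraction method, which the text does not specify further; the function name, the vocabulary and the sample request are all hypothetical.

```python
def extract_requirement_keywords(text, vocabulary):
    """Return the vocabulary terms found in text, ordered by first appearance."""
    lowered = text.lower()
    hits = []
    for term in vocabulary:
        position = lowered.find(term.lower())
        if position >= 0:
            hits.append((position, term))
    return [term for _, term in sorted(hits)]


# Hypothetical vocabulary and customer request (already converted to text).
vocabulary = ["critical illness", "hospitalization", "outpatient", "accident"]
request = "I want cover for hospitalization costs and any critical illness."
print(extract_requirement_keywords(request, vocabulary))
# ['hospitalization', 'critical illness']
```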

Further, in S3, the specific process is as follows:

S31, searching for related insurance clauses with the TF-IDF algorithm;

let the insurance clause data set be $N = (y_1, y_2, \dots, y_j, \dots)$;

the frequency $TF_{ij}$ with which keyword $h_i$ occurs in insurance clause $y_j$ is calculated as:

$$TF_{ij} = \frac{d_{ij}}{\sum_k d_{kj}} \tag{1}$$

where $d_{ij}$ is the number of times the customer guarantee project requirement keyword $h_i$ occurs in insurance clause $y_j$;

the inverse document frequency $IDF_i$ is calculated as:

$$IDF_i = \log \frac{|D|}{1 + |\{j : h_i \in y_j\}|} \tag{2}$$

where $|D|$ is the number of all insurance clauses in the data set and $|\{j : h_i \in y_j\}|$ is the number of insurance clauses containing the keyword $h_i$; 1 is added to prevent an arithmetic error when that number is 0;

the correlation M between insurance clause $y_j$ and the new customer's guarantee project requirements is then calculated as:

$$M = \sum_i TF_{ij} \cdot IDF_i \tag{3}$$

among the insurance clauses with M > a, the top b are output; if fewer than b insurance clauses satisfy M > a, the shortfall is made up from the insurance clauses below the threshold a in descending order of correlation, and the insurance product set A associated with these insurance clauses is output;

S32, calculating the new customer's guarantee project requirement scores with the TF-IDF algorithm;

among the b insurance clauses from S31, the TF-IDF weights of the same guarantee project requirement keyword in the different insurance clauses are summed to give the new customer's score for that guarantee project requirement.
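The clause search of S31 can be sketched as follows. The sketch makes several assumptions: clauses are already tokenized into word lists, the denominator of formula (1) is taken literally as the total count of requirement keywords in the clause (normalizing by clause length would be an equally plausible reading), and $IDF_i$ is taken as $\log(|D| / (1 + n_i))$ with $n_i$ the number of clauses containing keyword $h_i$, following the description of the added 1. All names and sample data are hypothetical.

```python
import math


def tfidf_relevance(keywords, clauses):
    """Return M_j = sum_i TF_ij * IDF_i for each clause (a list of tokens)."""
    n_clauses = len(clauses)
    # IDF_i = log(|D| / (1 + number of clauses containing keyword i))
    idf = {
        kw: math.log(n_clauses / (1 + sum(1 for c in clauses if kw in c)))
        for kw in keywords
    }
    relevance = []
    for clause in clauses:
        counts = {kw: clause.count(kw) for kw in keywords}  # d_ij
        total = sum(counts.values())                        # sum_k d_kj (assumed reading)
        m = sum(counts[kw] / total * idf[kw] for kw in keywords) if total else 0.0
        relevance.append(m)
    return relevance


def select_clauses(relevance, a, b):
    """Top b clause indices: those with M > a first, then fill-ins below a, by descending M."""
    ranked = sorted(range(len(relevance)), key=lambda j: relevance[j], reverse=True)
    above = [j for j in ranked if relevance[j] > a]
    below = [j for j in ranked if relevance[j] <= a]
    return (above + below)[:b]


# Hypothetical tokenized clauses and requirement keywords.
clauses = [
    ["hospitalization", "ward", "grade", "hospitalization"],
    ["accident", "disability", "benefit"],
    ["outpatient", "hospitalization", "fee", "cap"],
    ["travel", "delay"],
]
keywords = ["hospitalization", "outpatient"]
relevance = tfidf_relevance(keywords, clauses)
print(select_clauses(relevance, a=0.1, b=3))  # [2, 0, 1]
```

Note that, as the rule is worded, the selection always returns the b highest-scoring clauses; the threshold a only distinguishes clauses that qualify on their own from those used as fill-ins.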

Further, in S4, the specific process is as follows:

let the feature attribute vector describing the new customer be $P_s = (p_1, p_2, \dots, p_n)$, where $p_i = 1$ if the new customer has the i-th attribute and $p_i = 0$ otherwise;

the degree of similarity ω between the new customer s and an existing customer t, whose feature attribute vector $P_t$ is defined in the same way, is calculated by the cosine similarity theorem:

$$\omega = \cos(P_s, P_t) = \frac{P_s \cdot P_t}{\lVert P_s \rVert \, \lVert P_t \rVert} \tag{4}$$

if ω is greater than the set threshold c, customer t is a similar customer; the guarantee project requirements of similar customer t are extracted and added to the new customer's guarantee project requirements, and the insurance product set B purchased by similar customer t is obtained.
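A minimal sketch of the similarity test on binary attribute vectors is given below. The vector layout, customer identifiers and threshold value are hypothetical; only the cosine similarity itself and the comparison against a threshold c come from the text.

```python
import math


def cosine_similarity(p_s, p_t):
    """Cosine of the angle between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(p_s, p_t))
    norm = math.sqrt(sum(x * x for x in p_s)) * math.sqrt(sum(y * y for y in p_t))
    return dot / norm if norm else 0.0


# Hypothetical 0/1 attribute vectors (1 = the customer has the attribute).
new_customer = [1, 0, 1, 1]
existing = {"t1": [1, 0, 1, 0], "t2": [0, 1, 0, 0]}
c = 0.7  # hypothetical similarity threshold
similar = [t for t, vec in existing.items() if cosine_similarity(new_customer, vec) > c]
print(similar)  # ['t1']
```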

Further, in S5, the following steps are included:

S51, constructing the third-order score tensor of customer-guarantee project-insurance product insurance data;

S52, completing the third-order score tensor χ formed by customer-guarantee project-insurance product with the Tucker tensor decomposition algorithm;

let the size of χ be $n_1 \times n_2 \times n_3$ and denote its position index set by S; Tucker decomposition gives the expression:

$$\chi \approx g \times_1 U \times_2 V \times_3 W \tag{5}$$

where g is the core tensor, which represents the level of interaction between the different factor matrices; U is the $n_1 \times r_1$ customer factor matrix, V is the $n_2 \times r_2$ insurance product factor matrix, and W is the $n_3 \times r_3$ guarantee project factor matrix;

equation (5) is then equivalent to:

$$x_{ijk} \approx \sum_{m=1}^{r_1} \sum_{n=1}^{r_2} \sum_{l=1}^{r_3} g_{mnl}\, u_{im}\, v_{jn}\, w_{kl} \tag{6}$$

where $x_{ijk}$ is the rating given by user i to insurance product j containing guarantee item k in the third-order tensor of insurance data;

the tensor decomposition problem is converted into an optimization problem:

taking the partial derivatives of the objective function J with respect to $u_{im}$, $v_{jn}$, $w_{kl}$ and $g_{mnl}$ gives:

according to the gradient descent method, the update formulas for $u_{im}$, $v_{jn}$, $w_{kl}$ and $g_{mnl}$ in each iteration are:
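The displayed formulas for the objective function, its partial derivatives and the update rules are not reproduced in this text. One consistent reading is sketched below. It assumes a plain squared-error objective over the index set S (taken here to be the positions of the known ratings), no regularization term, and a learning rate η; all three are assumptions rather than details confirmed by the text.

```latex
% Residual on a known entry, using the element-wise Tucker form
e_{ijk} = x_{ijk} - \sum_{m,n,l} g_{mnl}\, u_{im}\, v_{jn}\, w_{kl}

% Assumed objective: squared error over the known entries only
J = \frac{1}{2} \sum_{(i,j,k) \in S} e_{ijk}^{2}

% Partial derivatives
\frac{\partial J}{\partial u_{im}}  = -\sum_{j,k\,:\,(i,j,k) \in S} e_{ijk} \sum_{n,l} g_{mnl}\, v_{jn}\, w_{kl}
\frac{\partial J}{\partial v_{jn}}  = -\sum_{i,k\,:\,(i,j,k) \in S} e_{ijk} \sum_{m,l} g_{mnl}\, u_{im}\, w_{kl}
\frac{\partial J}{\partial w_{kl}}  = -\sum_{i,j\,:\,(i,j,k) \in S} e_{ijk} \sum_{m,n} g_{mnl}\, u_{im}\, v_{jn}
\frac{\partial J}{\partial g_{mnl}} = -\sum_{(i,j,k) \in S} e_{ijk}\, u_{im}\, v_{jn}\, w_{kl}

% Gradient descent update, applied to each parameter
\theta \leftarrow \theta - \eta\, \frac{\partial J}{\partial \theta},
\qquad \theta \in \{u_{im},\, v_{jn},\, w_{kl},\, g_{mnl}\}
```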

S53, after the missing data in the third-order score tensor are filled in, the new customer's scores for the insurance products are predicted with formula (6), and the insurance product set C with higher scores is obtained.
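The completion step can be illustrated with a small dependency-free sketch. It is not the patented procedure itself: it assumes a squared-error loss on the known ratings with no regularization, updates the parameters one known rating at a time (a stochastic variant of gradient descent), and uses hypothetical ratings scaled to [0, 1]. The function names, learning rate and ranks are illustrative choices.

```python
import random


def predict(g, u, v, w, i, j, k):
    """Formula (6): x_ijk ~ sum over m, n, l of g[m][n][l] * u[i][m] * v[j][n] * w[k][l]."""
    return sum(
        g[m][n][l] * u[i][m] * v[j][n] * w[k][l]
        for m in range(len(g))
        for n in range(len(g[0]))
        for l in range(len(g[0][0]))
    )


def tucker_complete(observed, shape, ranks, lr=0.05, epochs=300, seed=0):
    """Fit core g and factors u, v, w to observed = {(i, j, k): rating}."""
    rng = random.Random(seed)
    (n1, n2, n3), (r1, r2, r3) = shape, ranks

    def init(rows, cols):
        return [[rng.uniform(0.1, 0.5) for _ in range(cols)] for _ in range(rows)]

    u, v, w = init(n1, r1), init(n2, r2), init(n3, r3)
    g = [init(r2, r3) for _ in range(r1)]
    cells = [(m, n, l) for m in range(r1) for n in range(r2) for l in range(r3)]
    for _ in range(epochs):
        for (i, j, k), x in observed.items():
            err = x - predict(g, u, v, w, i, j, k)
            # Gradients of the prediction, all evaluated before any parameter changes.
            du, dv, dw, dg = [0.0] * r1, [0.0] * r2, [0.0] * r3, {}
            for m, n, l in cells:
                du[m] += g[m][n][l] * v[j][n] * w[k][l]
                dv[n] += g[m][n][l] * u[i][m] * w[k][l]
                dw[l] += g[m][n][l] * u[i][m] * v[j][n]
                dg[m, n, l] = u[i][m] * v[j][n] * w[k][l]
            # Step against the gradient of the squared error for this rating.
            for m, n, l in cells:
                g[m][n][l] += lr * err * dg[m, n, l]
            for m in range(r1):
                u[i][m] += lr * err * du[m]
            for n in range(r2):
                v[j][n] += lr * err * dv[n]
            for l in range(r3):
                w[k][l] += lr * err * dw[l]
    return g, u, v, w


# Hypothetical known ratings: (customer, product, guarantee item) -> score in [0, 1].
observed = {(0, 0, 0): 0.9, (0, 1, 1): 0.2, (1, 0, 0): 0.8, (1, 1, 0): 0.3, (2, 0, 1): 0.6}
g, u, v, w = tucker_complete(observed, shape=(3, 2, 2), ranks=(2, 2, 2))
print(round(predict(g, u, v, w, 2, 1, 0), 3))  # predicted score for an unrated entry
```

Products whose predicted scores for the new customer are highest would then form the set C; how many to keep is not fixed by the text.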

Further, in S7, the specific process is as follows:

a bipartite graph of the relationship between customers and insurance products is generated according to whether the new customer has shown interest behavior toward each insurance product; the nodes of the bipartite graph comprise customers and insurance products;

the PR value of the target customer node is initialized to 1, indicating that the target customer node is the start node, and the PR values of all other nodes are initialized to 0; then:

$$PR(v) = \begin{cases} \alpha \sum_{v' \in in(v)} \dfrac{PR(v')}{|out(v')|}, & v \neq v_u \\ (1 - \alpha) + \alpha \sum_{v' \in in(v)} \dfrac{PR(v')}{|out(v')|}, & v = v_u \end{cases}$$

where α is the probability of continuing the walk from a node, (1 − α) is the probability of stopping the walk and restarting it from the target customer node, v is a node, $v_u$ is the walk start node (the target customer node), in(v) is the set of nodes connected to node v, v′ is a particular node in in(v), and |out(v′)| is the out-degree of node v′;

after repeated iterative computation, the PR values of the nodes gradually converge to fixed values, giving the importance of the different insurance products to the customer; the insurance product set V with larger PR values is obtained and compared with the insurance product set D output in S6, giving the final recommended insurance product set Z = V ∩ D.
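The iteration can be sketched as below, using the usual Personal Rank update: every node passes a fraction α of its PR value to its neighbours in equal shares, and the remaining (1 − α) is returned to the target customer node. The graph, the value of α and the iteration count are hypothetical and are not taken from FIG. 2.

```python
def personal_rank(graph, root, alpha=0.85, iterations=50):
    """graph maps each node to the list of nodes it is connected to (undirected)."""
    pr = {node: 0.0 for node in graph}
    pr[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            for nb in neighbours:
                # Continue the walk: share alpha * PR(node) equally among neighbours.
                nxt[nb] += alpha * pr[node] / len(neighbours)
        # Restart the walk from the target customer node with probability 1 - alpha.
        nxt[root] += 1.0 - alpha
        pr = nxt
    return pr


def top_products(pr, products, n):
    """Keep only product nodes and return the n with the largest PR values."""
    return sorted(products, key=lambda p: pr[p], reverse=True)[:n]


# Hypothetical bipartite graph: customers A, B, C and products a, b, c, d.
graph = {
    "A": ["a", "c"],
    "B": ["a", "b", "c", "d"],
    "C": ["c", "d"],
    "a": ["A", "B"],
    "b": ["B"],
    "c": ["A", "B", "C"],
    "d": ["B", "C"],
}
pr = personal_rank(graph, root="A")
set_v = top_products(pr, ["a", "b", "c", "d"], n=3)
print(set_v)
```

Customer nodes are dropped when forming V, so only the PR values of product nodes are ranked.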

The invention has the following beneficial effects:

The invention provides a recommendation algorithm for insurance products that takes into account the limits insurance clauses place on guarantee items and uses a TF-IDF keyword extraction method to recommend insurance products that actually meet the customer's guarantee requirements, thereby reducing the probability that a customer selects the wrong insurance product based on subjective judgment. A third-order tensor of customer-guarantee requirement-insurance product insurance data is constructed, its dimensionality is reduced by Tucker tensor decomposition, the missing data are filled in, and the customer's scores for insurance products are predicted, which effectively addresses problems such as high dimensionality and computational difficulty caused by the many attributes and types of insurance products. The customer's browsing data are processed with the Personal Rank algorithm to find products of interest based on the customer's subjective preference; these are combined with the insurance products recommended by the other algorithms, and the insurance products that both interest the customer and meet the customer's guarantee requirements are finally output.

Drawings

FIG. 1 is a flow chart of an insurance precision recommendation method based on a hybrid recommendation algorithm in the present invention;

FIG. 2 is a bipartite graph of the relationship between customers and insurance products in accordance with the present invention;

Detailed Description

The following description of the embodiments of the present invention will be made with reference to the accompanying drawings:

an insurance product accurate recommendation method based on a hybrid recommendation algorithm is shown in fig. 1 and comprises the following steps:

Specifically, in S2, new customer requirements collected through various channels, including text entry and voice entry, are converted into text information; the text information is compared with insurance industry professional vocabulary by a supervised keyword extraction method, and the new customer's guarantee project requirement keyword set $H = (h_1, h_2, \dots, h_i, \dots)$ is extracted.

S3, calculating the correlation M between the new customer guarantee project requirements and insurance clauses by using a TF-IDF model, setting a correlation threshold a, outputting an insurance clause set with M being larger than a, further obtaining an insurance product set A containing the insurance clause sets, and finally calculating new customer guarantee project requirement scores by using a TF-IDF algorithm;

the specific process is as follows:

S31, searching for related insurance clauses with the TF-IDF algorithm;

let the insurance clause data set be $N = (y_1, y_2, \dots, y_j, \dots)$; the frequency $TF_{ij}$ with which keyword $h_i$ occurs in insurance clause $y_j$ is calculated as:

$$TF_{ij} = \frac{d_{ij}}{\sum_k d_{kj}} \tag{1}$$

where $d_{ij}$ is the number of times the customer guarantee project requirement keyword $h_i$ occurs in insurance clause $y_j$;

the inverse document frequency $IDF_i$ is calculated as:

$$IDF_i = \log \frac{|D|}{1 + |\{j : h_i \in y_j\}|} \tag{2}$$

where $|D|$ is the number of all insurance clauses in the data set and $|\{j : h_i \in y_j\}|$ is the number of insurance clauses containing the keyword $h_i$;

the more insurance clauses a guarantee requirement keyword appears in, the less it contributes to characterizing the current insurance clause, and the less important it is as a feature of that clause; conversely, even when a guarantee item keyword appears only a few times in a given insurance clause, it may still obtain a high TF-IDF value if it appears in few other insurance clauses of the set, because the keyword then highlights the distinctive features of that document;

the correlation M between insurance clause $y_j$ and the new customer's guarantee project requirements is then calculated as:

$$M = \sum_i TF_{ij} \cdot IDF_i \tag{3}$$

among the insurance clauses with M > a, b insurance clauses are selected in descending order of M; if fewer than b insurance clauses satisfy M > a, the shortfall is made up from the insurance clauses below the threshold a in descending order of the correlation M, and the insurance product set A associated with these insurance clauses is output;

among the b insurance clauses from S31, the TF-IDF weights of the same guarantee project requirement keyword in the different insurance clauses are summed to give the new customer's score for that guarantee project requirement.
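The scoring rule in the last paragraph amounts to a per-keyword sum over the selected clauses, which can be sketched as follows. The weight table and the selected clause indices are hypothetical; each weight stands for the product $TF_{ij} \cdot IDF_i$ of one keyword in one clause.

```python
def requirement_scores(tfidf_weights, selected):
    """Sum each keyword's TF-IDF weight over the selected clauses.

    tfidf_weights[j][keyword] holds TF_ij * IDF_i for clause j;
    selected lists the indices of the b clauses chosen in S31.
    """
    scores = {}
    for j in selected:
        for keyword, weight in tfidf_weights[j].items():
            scores[keyword] = scores.get(keyword, 0.0) + weight
    return scores


# Hypothetical TF-IDF weights for three clauses; clauses 0 and 1 were selected.
weights = [
    {"hospitalization": 0.14, "outpatient": 0.0},
    {"hospitalization": 0.07, "outpatient": 0.17},
    {"hospitalization": 0.30, "outpatient": 0.0},
]
scores = requirement_scores(weights, selected=[0, 1])
print({k: round(s, 2) for k, s in scores.items()})
# {'hospitalization': 0.21, 'outpatient': 0.17}
```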

S4, because the extracted new customer information may express the new customer's requirements incompletely, the requirements are completed by means of similar customers: a set of similar customers and the insurance product set B selected by those similar customers are found in the existing customer information database with a cosine similarity method, and the guarantee project requirements of the similar customers are extracted to supplement the guarantee project requirements of the new customer;

the specific process is as follows:

let the feature attribute vector describing the new customer be $P_s = (p_1, p_2, \dots, p_n)$, where $p_i = 1$ if the new customer has the i-th attribute and $p_i = 0$ otherwise;

the degree of similarity ω between the new customer s and an existing customer t, whose feature attribute vector $P_t$ is defined in the same way, is calculated by the cosine similarity theorem:

$$\omega = \cos(P_s, P_t) = \frac{P_s \cdot P_t}{\lVert P_s \rVert \, \lVert P_t \rVert} \tag{4}$$

if ω is greater than the set threshold c, customer t is a similar customer; the guarantee project requirements of similar customer t are extracted and added to the new customer's guarantee project requirements, and the insurance product set B selected by similar customer t is obtained. The invention takes into account that when a new customer cannot accurately express his or her own guarantee requirements, those requirements can be supplemented with the guarantee requirements of similar users.
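Once the similarity ω to each existing customer is known, the supplementing step reduces to a threshold filter followed by two set unions. The sketch below assumes a hypothetical record layout for the customer database (a "requirements" list and a "products" list per customer) and precomputed similarity values.

```python
def supplement_from_similar(new_keywords, similarities, customer_db, c):
    """Return (supplemented requirement keywords, product set B) from customers with omega > c."""
    keywords = set(new_keywords)
    products_b = set()
    for customer_id, omega in similarities.items():
        if omega > c:
            keywords |= set(customer_db[customer_id]["requirements"])
            products_b |= set(customer_db[customer_id]["products"])
    return keywords, products_b


# Hypothetical database records and precomputed cosine similarities.
customer_db = {
    "t1": {"requirements": ["outpatient"], "products": ["p2", "p3"]},
    "t2": {"requirements": ["accident"], "products": ["p7"]},
}
similarities = {"t1": 0.82, "t2": 0.10}
keywords, set_b = supplement_from_similar(["hospitalization"], similarities, customer_db, c=0.7)
print(sorted(keywords), sorted(set_b))
# ['hospitalization', 'outpatient'] ['p2', 'p3']
```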

S5, constructing a third-order score tensor of customer-guarantee project-insurance product insurance data, completing the customer-guarantee project-insurance product information with the guarantee project requirement scores calculated in S3 and the new customer guarantee project requirements supplemented in S4, filling in the third-order score tensor with a Tucker tensor decomposition algorithm, predicting the new customer's scores for the insurance products, and obtaining an insurance product set C that meets the new customer's requirements;

the method comprises the following steps:

let the size of χ be $n_1 \times n_2 \times n_3$ and denote its position index set by S; Tucker decomposition gives the expression:

$$\chi \approx g \times_1 U \times_2 V \times_3 W \tag{5}$$

where g is the core tensor, which represents the level of interaction between the different factor matrices; U is the $n_1 \times r_1$ customer factor matrix, V is the $n_2 \times r_2$ insurance product factor matrix, and W is the $n_3 \times r_3$ guarantee project factor matrix;

equation (5) is then equivalent to:

$$x_{ijk} \approx \sum_{m=1}^{r_1} \sum_{n=1}^{r_2} \sum_{l=1}^{r_3} g_{mnl}\, u_{im}\, v_{jn}\, w_{kl} \tag{6}$$

where $x_{ijk}$ is the rating given by user i to insurance product j containing guarantee item k in the third-order tensor of insurance data;

the tensor decomposition problem is converted into an optimization problem:

taking the partial derivatives of the objective function J with respect to $u_{im}$, $v_{jn}$, $w_{kl}$ and $g_{mnl}$ gives:

according to the gradient descent method, the update formulas for $u_{im}$, $v_{jn}$, $w_{kl}$ and $g_{mnl}$ in each iteration are:

S53, after the missing data in the tensor are filled in, the new customer's scores for the insurance products are predicted with formula (6), and the insurance product set C with higher scores is obtained.

S6, merging the insurance product sets A, B and C output by S3, S4 and S5 to obtain an insurance product set D, where D = A ∪ B ∪ C;

S7, because the functional descriptions of insurance products on the platform are incomplete, the insurance product set D may be mixed with insurance products that do not meet the new customer's guarantee project requirements; the data generated when the new customer browses insurance products are therefore analyzed with the Personal Rank algorithm, an insurance product set V in which the new customer shows higher interest is output, and the insurance product set V is compared with the insurance product set D from S6 to obtain an insurance product set Z that both interests the new customer and meets the new customer's requirements;

the specific process is as follows:

a bipartite graph of the relationship between customers and insurance products is generated according to whether the new customer has shown interest behavior toward each insurance product, as shown in FIG. 2, where A, B and C represent customers, a, b, c and d represent insurance products, and a line connecting a customer and an insurance product indicates that the customer has shown interest behavior (such as liking or bookmarking) toward that insurance product;

the Personal Rank algorithm does not distinguish between users and commodities; calculating the target customer A's degree of interest in every commodity becomes calculating the importance of every other node relative to the node of target customer A (the other customer nodes are discarded at the end);

the PR value of the target customer node A is initialized to 1, indicating that node A is the start node, and the PR values of all other nodes are initialized to 0, that is: PR(A) = 1, PR(B) = 0, …, PR(d) = 0;

the PR values of the nodes are calculated as:

$$PR(v) = \begin{cases} \alpha \sum_{v' \in in(v)} \dfrac{PR(v')}{|out(v')|}, & v \neq v_u \\ (1 - \alpha) + \alpha \sum_{v' \in in(v)} \dfrac{PR(v')}{|out(v')|}, & v = v_u \end{cases}$$

where v is a node, α is the probability of continuing the walk from a node, (1 − α) is the probability of stopping the walk and restarting it from the target customer node, $v_u$ is the walk start node (the target customer node), in(v) is the set of nodes connected to node v, v′ is a particular node in in(v), and |out(v′)| is the out-degree of node v′;

if $v \neq v_u$, the walk continues from the current node to the next node; if $v = v_u$, the walk restarts from the node corresponding to the target customer;

It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.

Claims

1. An insurance product accurate recommendation method based on a hybrid recommendation algorithm is characterized by comprising the following steps:

s3, calculating the correlation M between the new customer guarantee project requirements and insurance clauses by using a TF-IDF model, setting a correlation threshold value a, outputting an insurance clause set with M being larger than a, further obtaining an insurance product set A containing the insurance clause set, and calculating new customer guarantee project requirement scores by using a TF-IDF algorithm;

2. The method for accurately recommending insurance products based on the hybrid recommendation algorithm according to claim 1, wherein in S2, new customer requirements collected through various channels, including text entry and voice entry, are converted into text information, the text information is compared with insurance industry professional vocabulary by a supervised keyword extraction method, and the new customer's guarantee project requirement keyword set $H = (h_1, h_2, \dots, h_i, \dots)$ is extracted.

3. The method for accurately recommending insurance products based on a hybrid recommendation algorithm according to claim 1, wherein in S3, the specific process is as follows:

S31, searching for related insurance clauses with the TF-IDF algorithm:

let the insurance clause data set be $N = (y_1, y_2, \dots, y_j, \dots)$;

$$TF_{ij} = \frac{d_{ij}}{\sum_k d_{kj}} \tag{1}$$

where $d_{ij}$ is the number of times the customer guarantee project requirement keyword $h_i$ occurs in insurance clause $y_j$, and k ranges over the keywords;

the inverse document frequency $IDF_i$ is calculated as:

$$IDF_i = \log \frac{|D|}{1 + |\{j : h_i \in y_j\}|} \tag{2}$$

the correlation M between insurance clause $y_j$ and the new customer's guarantee project requirements is then calculated as:

$$M = \sum_i TF_{ij} \cdot IDF_i \tag{3}$$

4. The method for accurately recommending insurance products based on a hybrid recommendation algorithm according to claim 1, wherein in S4, the specific process is as follows:

let the feature attribute vector describing the new customer be

5. The method for accurately recommending insurance products based on a hybrid recommendation algorithm according to claim 1, wherein in said S5, the following steps are included:

S51, constructing the third-order score tensor of customer-guarantee project-insurance product insurance data;

S52, completing the third-order score tensor χ formed by customer-guarantee project-insurance product with the Tucker tensor decomposition algorithm;

$$\chi \approx g \times_1 U \times_2 V \times_3 W \tag{5}$$

where g is the core tensor, which represents the level of interaction between the different factor matrices; U is the $n_1 \times r_1$ customer factor matrix, V is the $n_2 \times r_2$ insurance product factor matrix, and W is the $n_3 \times r_3$ guarantee project factor matrix;

equation (5) is then equivalent to:

$$x_{ijk} \approx \sum_{m=1}^{r_1} \sum_{n=1}^{r_2} \sum_{l=1}^{r_3} g_{mnl}\, u_{im}\, v_{jn}\, w_{kl} \tag{6}$$

the tensor decomposition problem is converted into an optimization problem:

taking the partial derivatives of the objective function J with respect to $u_{im}$, $v_{jn}$, $w_{kl}$ and $g_{mnl}$ gives:

S53, after the missing data in the third-order score tensor are filled in, the new customer's scores for the insurance products are predicted with formula (6), and the insurance product set C with higher scores is obtained.

6. The method for accurately recommending insurance products based on a hybrid recommendation algorithm according to claim 1, wherein in S7, the specific process is as follows:

where α is the probability of continuing the walk from a node, (1 − α) is the probability of stopping the walk and restarting it from the target customer node, v is a node, $v_u$ is the walk start node, namely the target customer node, in(v) is the set of nodes connected to node v, v′ is a particular node in in(v), and |out(v′)| is the out-degree of node v′;