CN114612132A

CN114612132A - Client renewal prediction method based on machine learning and related equipment

Info

Publication number: CN114612132A
Application number: CN202210169141.0A
Authority: CN
Inventors: 裴合兴; 赵堃宇; 于高升
Original assignee: China Life Insurance Co ltd
Current assignee: China Life Insurance Co ltd
Priority date: 2022-02-23
Filing date: 2022-02-23
Publication date: 2022-06-10

Abstract

The present application provides a machine learning-based customer insurance renewal prediction method and related equipment. The method includes: acquiring renewal policy data for model training; extracting multi-dimensional feature data from the renewal policy data; training a preset machine learning model on the multi-dimensional feature data to obtain a renewal prediction model; obtaining historical policy data of a target customer; and inputting the historical policy data into the renewal prediction model to output a renewal prediction result for the target customer's policy. The method can predict a customer's willingness to renew quickly and stably, which helps salespersons develop business according to the prediction result, provide differentiated services to customers, and improve the renewal rate.

Description

Machine Learning-Based Customer Renewal Prediction Method and Related Equipment

Technical Field

The present application relates to the field of computer technology, and in particular to a machine learning-based method for predicting customer insurance renewal and related equipment.

Background

In recent years, as the insurance business has developed, the number of policies sold over the Internet has grown year by year. For an insurance company, how well policy renewals are managed directly affects premium volume. When an insurance product purchased by an existing customer expires, the company hopes that the customer will renew. During this process, however, the salesperson has no way of knowing the customer's willingness to renew, and therefore cannot develop business in a differentiated, efficient way.

At present, a customer's willingness to renew is generally predicted from the salesperson's personal experience. Because that experience differs from one salesperson to another, such human judgment is neither accurate nor stable. A scheme that predicts customer renewal willingness effectively and accurately is therefore urgently needed.

Summary of the Invention

In view of this, the purpose of the present application is to propose a machine learning-based customer renewal prediction method and related equipment that solve the above problems.

To that end, a first aspect of the present application provides a machine learning-based customer renewal prediction method, including:

acquiring renewal policy data for model training;

extracting multi-dimensional feature data from the renewal policy data;

training a preset machine learning model according to the multi-dimensional feature data to obtain a renewal prediction model;

acquiring historical policy data of a target customer; and

inputting the historical policy data into the renewal prediction model, and outputting a renewal prediction result for the target customer's corresponding policy.

Further, training the preset machine learning model according to the multi-dimensional feature data to obtain the renewal prediction model includes:

preprocessing the multi-dimensional feature data;

dividing the preprocessed multi-dimensional feature data into a training data set, a validation data set, and a test data set;

training the machine learning model on the training data set and, each time an iteration is completed, obtaining the accuracy of the machine learning model on the validation set; and

optimizing the hyperparameters of the machine learning model with a hyperparameter optimization algorithm;

repeating the above operations until a predetermined number of iterations is reached or the accuracy of the machine learning model meets a preset condition, to obtain a trained renewal prediction model; and

verifying the accuracy of the trained renewal prediction model on the test set to obtain a renewal prediction result.

Further, after inputting the historical policy data into the renewal prediction model and outputting the renewal prediction result for the target customer's corresponding policy, the method also includes:

determining a predicted renewal probability of the target customer according to the renewal prediction result; and

in response to determining that the predicted renewal probability is greater than or equal to a preset renewal probability threshold, marking the target customer as a prospective renewal customer and displaying the customer.

Further, the preprocessing includes data cleaning and data dimensionality reduction.

Further, the hyperparameter optimization algorithm includes at least one of the following: a grid search algorithm, a random search algorithm, and a Bayesian optimization algorithm.

Further, the multi-dimensional feature data includes: basic customer information, historical insurance application records, historical claim records, policy information, and basic salesperson information.

Further, the algorithm used by the machine learning model is the random forest algorithm, the XGBoost algorithm, or the Wide & Deep algorithm.

Based on the same inventive concept, a second aspect of the present application provides a machine learning-based customer renewal prediction device, including:

a first acquisition module, configured to acquire renewal policy data for model training;

a feature extraction module, configured to extract multi-dimensional feature data from the renewal policy data;

a model training module, configured to train a preset machine learning model according to the multi-dimensional feature data to obtain a renewal prediction model;

a second acquisition module, configured to acquire historical policy data of a target customer; and

a renewal prediction module, configured to input the historical policy data into the renewal prediction model and output a renewal prediction result for the target customer's corresponding policy.

Based on the same inventive concept, a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of the first aspect when executing the program.

Based on the same inventive concept, a fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to execute the method of the first aspect.

As can be seen from the above, the machine learning-based customer renewal prediction method and related equipment provided in the present application train a machine learning model on renewed policies to obtain a trained renewal prediction model, and use that model to predict quickly and stably the renewal willingness of the customers to be assessed. This helps salespersons develop business according to the renewal prediction results, provide differentiated services to customers, and improve the renewal rate.

Brief Description of the Drawings

To explain the technical solutions of the present application or the related art more clearly, the drawings used in describing the embodiments or the related art are briefly introduced below. The drawings described below are merely embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a machine learning-based customer renewal prediction method according to an embodiment of the present application;

FIG. 2 is a flowchart of a method for training a renewal prediction model according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for displaying renewal prediction results according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a machine learning-based customer renewal prediction device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present application have the ordinary meaning understood by a person of ordinary skill in the field to which the application belongs. Words such as "first" and "second" used in the embodiments do not indicate any order, quantity, or importance; they only distinguish different components. Words such as "comprise" or "include" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like only indicate relative positions; when the absolute position of the described object changes, the relative position may change accordingly.

As described in the Background section, related-art schemes for predicting renewal willingness still fall short: a customer's willingness to renew is generally judged from the salesperson's personal experience. In the course of developing the present application, the applicant found that prior-art schemes have at least the following problems. Judging renewal willingness solely from a salesperson's personal experience is uncertain. Salespersons who have worked longer have more experience than those who have worked for a shorter time, so their judgments tend to be more accurate. In addition, when renewal willingness must be predicted for a large number of customers, the accuracy of manual judgment cannot be guaranteed and work efficiency drops sharply, which hinders the salesperson's business development.

In view of this, an embodiment of the present application provides a machine learning-based customer renewal prediction method that extracts feature data from historical renewal policies and uses the feature data to train a machine learning model, yielding a renewal prediction model. The trained model can predict a customer's willingness to renew quickly and stably.

The technical solutions of the present application are described in detail below through specific embodiments.

Referring to FIG. 1, a machine learning-based customer renewal prediction method provided by an embodiment of the present application includes the following steps:

Step S101: acquire renewal policy data for model training.

In this step, renewal refers to the insured applying to the insurer, when an insurance contract is about to expire, to extend the term of the contract or to go through the insurance procedures again. When handling a renewal, either the insurer or the insured may increase or decrease the insured amount, or make other changes, according to current circumstances or needs. For an insurance company, how renewals are managed has a large effect on premium volume. Keeping reasonable control over the renewal status of policies that are about to expire helps raise the renewal rate and keeps generating premium income for the company.

Insurance falls into two main categories: property insurance and personal insurance. Property insurance may include enterprise property insurance, engineering insurance, auto insurance, liability insurance, ship insurance, cargo insurance, home property insurance, credit insurance, guarantee insurance, and agricultural insurance. Personal insurance may include accident insurance, medical insurance, critical illness insurance, life insurance, children's education insurance, endowment insurance, annuity insurance, and group insurance. Customers' willingness to renew differs by insurance type. For example, truck drivers drive year-round, so their willingness to renew accident insurance is relatively high; people in poor health are more willing to renew critical illness and medical insurance.

When acquiring renewal policy data, renewal policies can be drawn from every insurance category to improve the prediction accuracy of the subsequently trained machine learning model. The policy data includes at least: the category of insurance purchased, the name of the insurance purchased, the regular premium, the salesperson, the total premium, customer asset information, customer age information, and the number of claims made on the customer's purchased insurance.

Step S102: extract multi-dimensional feature data from the renewal policy data.

In this step, because a renewal policy contains a large amount of data, the policy data that helps improve renewal prediction accuracy must be selected. Specifically, basic customer information, historical insurance application records, historical claim records, policy information, and basic salesperson information can be selected to form the multi-dimensional feature data. The multi-dimensional feature data can also be extended or redefined according to the actual situation; no specific limitation is imposed here.
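As a rough illustration of this selection step, the sketch below keeps a handful of fields from a raw policy record and groups them under the five feature dimensions named above. The field names (`customer_age`, `claim_count`, and so on) and the helper `extract_features` are hypothetical; the application does not specify a schema.

```python
from typing import Any, Dict, List

# Hypothetical field names, grouped by the five feature dimensions in the text.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "customer": ["customer_age", "customer_assets"],
    "history": ["past_policy_count"],
    "claims": ["claim_count"],
    "policy": ["insurance_category", "regular_premium", "total_premium"],
    "salesperson": ["salesperson_tenure_years"],
}


def extract_features(policy_record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the selected fields; absent fields become None for later cleaning."""
    features: Dict[str, Any] = {}
    for group, fields in FEATURE_GROUPS.items():
        for field in fields:
            features[f"{group}.{field}"] = policy_record.get(field)
    return features


raw_record = {
    "customer_age": 42,
    "claim_count": 1,
    "insurance_category": "accident",
    "regular_premium": 1200.0,
    "internal_note": "not used as a feature",
}
features = extract_features(raw_record)
```

Fields that are absent from a record are kept as `None` rather than dropped, so that the later cleaning step can decide how to fill them.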

Step S103: train a preset machine learning model according to the multi-dimensional feature data to obtain a renewal prediction model.

In this step, the algorithm used by the machine learning model may be the random forest algorithm, the XGBoost algorithm, or the Wide & Deep algorithm.

In the random forest algorithm, each decision tree is trained on its own training set, which is drawn from the training samples at random with replacement. The final classification is the class predicted by the majority of the trees.
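The two ideas in this description, sampling with replacement per tree and majority voting across trees, can be sketched in a few lines. This is a toy under stated assumptions: each "tree" is a one-split stump on a single numeric feature, and `train_stump`, `train_forest`, and `forest_predict` are illustrative names. Practical random forests also randomize the features considered at each split, which is omitted here.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (single feature value, class label)


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def bootstrap_sample(samples: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Draw len(samples) items at random with replacement."""
    return [rng.choice(samples) for _ in samples]


def train_stump(samples: Sequence[Sample]) -> Callable[[float], int]:
    """Toy one-split tree: threshold at the mean, majority label on each side."""
    threshold = sum(x for x, _ in samples) / len(samples)
    overall = majority_vote([y for _, y in samples])
    left = [y for x, y in samples if x <= threshold]
    right = [y for x, y in samples if x > threshold]
    left_label = majority_vote(left) if left else overall
    right_label = majority_vote(right) if right else overall
    return lambda x: left_label if x <= threshold else right_label


def train_forest(samples: Sequence[Sample], n_trees: int, seed: int = 0):
    """Train each stump on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(samples, rng)) for _ in range(n_trees)]


def forest_predict(forest, x: float) -> int:
    """The final class is the one predicted by the majority of the trees."""
    return majority_vote([tree(x) for tree in forest])


data = [(1.0, 0), (2.0, 0), (3.0, 0), (10.0, 1), (11.0, 1), (12.0, 1)]
forest = train_forest(data, n_trees=25)
```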

The XGBoost algorithm is an ensemble machine learning algorithm based on decision trees. It uses the gradient boosting framework and can perform classification or regression on the input feature data.

The Wide & Deep algorithm aims to give the trained model both memorization and generalization capabilities.

Of these three algorithms, XGBoost is the preferred choice: it performs well in classification accuracy, in model training, and in model running speed.

Step S104: acquire historical policy data of the target customer.

In this step, the target customer is a policyholder whose insurance contract is about to expire, and the historical policy data acquired is the data of that policyholder's expiring policy. For the insurance company, this policyholder is a prospective customer, and predicting the policyholder's willingness to renew helps increase premium volume.

Step S105: input the historical policy data into the renewal prediction model, and output the renewal prediction result for the target customer's corresponding policy.

It can be seen that the machine learning-based customer renewal prediction method of this embodiment trains a machine learning model on renewed policies to obtain a trained renewal prediction model, and uses that model to predict quickly and stably the renewal willingness of the customers to be assessed. This helps salespersons develop business according to the renewal prediction results, provide differentiated services to customers, and improve the renewal rate.

In some embodiments, with reference to FIG. 2, step S103 of the foregoing embodiment may further include the following steps:

Step S1031: preprocess the multi-dimensional feature data.

In this step, preprocessing includes data cleaning and data dimensionality reduction, and its outcome directly affects the performance of the subsequent model. Data cleaning covers the handling of outliers, null values, missing values, and duplicate values. Duplicate values can be deleted directly; outliers can be either deleted or replaced with a substitute value; null and missing values can be filled by value substitution.
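The cleaning rules above can be sketched as follows. This is a minimal illustration under assumptions that are not in the application: outliers are defined by a fixed plausible range `[lower, upper]`, the substitute value is the median of the valid values, and the function name `clean_records` is hypothetical.

```python
from statistics import median
from typing import Any, Dict, List


def clean_records(records: List[Dict[str, Any]], field: str,
                  lower: float, upper: float) -> List[Dict[str, Any]]:
    """Drop duplicate records, then repair one numeric field in place."""
    # 1. Duplicates: keep the first copy of each identical record.
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))

    # 2. Collect values that are present and inside the plausible range.
    #    (Assumes at least one such value exists; median() raises otherwise.)
    valid = [r[field] for r in unique
             if r.get(field) is not None and lower <= r[field] <= upper]
    fill_value = median(valid)

    # 3. Outliers and missing values: replace both with the median of valid values.
    for record in unique:
        value = record.get(field)
        if value is None or not (lower <= value <= upper):
            record[field] = fill_value
    return unique


rows = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # exact duplicate
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 400},   # outlier
    {"id": 4, "age": 50},
]
cleaned = clean_records(rows, "age", lower=0, upper=120)
```

Deleting outlier rows instead of replacing them, which the text also allows, would only change step 3 into a filter.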

When the volume of data to be cleaned is large, the data can be divided by fixed time periods, with one fixed batch of data cleaned per period. This avoids the sharp increase in the number of cleaning passes that real-time cleaning would incur as data keeps arriving, and prevents the same data from being cleaned repeatedly.

The specific steps of data dimensionality reduction are: standardize the multi-dimensional feature data to zero mean and unit variance; compute the covariance matrix together with its eigenvalues and the corresponding eigenvectors; sort the eigenvalues by magnitude, select the k largest, and stack their eigenvectors as row vectors to form an eigenvector matrix; and project the multi-dimensional feature data into the new space spanned by those k eigenvectors.
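These four steps match what is usually called principal component analysis (PCA). The sketch below follows them one by one with NumPy; the function name `pca_reduce` and the handling of constant columns are assumptions added for the example.

```python
import numpy as np


def pca_reduce(features, k):
    """Project rows of `features` onto the k leading principal directions."""
    X = np.asarray(features, dtype=float)

    # Step 1: standardize each column to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # assumption: constant columns stay at zero, avoiding division by 0
    Z = (X - X.mean(axis=0)) / std

    # Step 2: covariance matrix and its eigen-decomposition (eigh: symmetric input).
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 3: keep the k largest eigenvalues; their eigenvectors become the rows of P.
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order].T

    # Step 4: map the data into the space spanned by those k eigenvectors.
    return Z @ P.T, eigvals[order]
```

The second return value holds the selected eigenvalues, which equal the variance of the data along each retained direction and can guide the choice of k.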

Preprocessing the multi-dimensional feature data by cleaning and dimensionality reduction greatly reduces the amount of training data, which shortens training time and improves prediction accuracy. Reducing the dimensionality produces new feature data that largely retains the information in the original data, which greatly shortens computation time and improves the model's predictive ability.

In addition, samples with an imbalanced class distribution can be handled as follows. Over-sampling methods from the imbalanced-learn package, such as random over-sampling, SMOTE, Borderline-SMOTE (bSMOTE 1 and 2), SVM SMOTE, and ADASYN, can be applied to the under-represented class; under-sampling methods such as random under-sampling, Tomek links, and NearMiss can be applied to the over-represented class. When the machine learning model uses the XGBoost algorithm, XGBoost provides a dedicated parameter, scale_pos_weight. For imbalanced samples, it adjusts the model's error function according to a set ratio and increases the weight given to minority-class samples during learning. This strengthens the effect that misclassifying under-represented samples has on the model error, and so achieves the effect of adjusting the sample ratio.
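For illustration, the simplest of these options can be written without any third-party package. The sketch below shows random over-sampling of the minority class and computes the negative-to-positive ratio that the XGBoost documentation suggests as a typical starting value for `scale_pos_weight`. Both function names are hypothetical, labels are assumed to be 0 or 1 with renewals as the positive class, and each class is assumed to have at least one sample.

```python
import random
from typing import List, Sequence, Tuple


def random_oversample(samples: Sequence, labels: Sequence[int],
                      seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate minority-class samples at random until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    if len(positives) < len(negatives):
        minority, majority, minority_label = positives, negatives, 1
    else:
        minority, majority, minority_label = negatives, positives, 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)


def suggested_scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive samples, a common starting value for scale_pos_weight."""
    n_positive = sum(1 for y in labels if y == 1)
    n_negative = len(labels) - n_positive
    return n_negative / n_positive


labels = [1, 1] + [0] * 8              # 2 renewals, 8 non-renewals
samples = [[float(i)] for i in range(10)]
balanced_samples, balanced_labels = random_oversample(samples, labels)
weight = suggested_scale_pos_weight(labels)
```

Re-weighting through `scale_pos_weight` leaves the data untouched, whereas over-sampling changes the training set itself; the two are usually treated as alternatives rather than combined.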

Step S1032: divide the preprocessed multi-dimensional feature data into a training data set, a validation data set, and a test data set.

In this step, the ratio of the training, validation, and test data sets may be 7:1:2 or 8:1:1. The proportions can also be adjusted according to the actual amount of data; no specific limitation is imposed here.
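A minimal sketch of such a split, assuming the rows are shuffled first and that any rounding remainder goes to the test set (neither detail is specified in the application; `split_dataset` is an illustrative name):

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(rows: Sequence, ratios: Tuple[int, int, int] = (7, 1, 2),
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle the rows, then cut them into train/validation/test by the given ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, including any rounding leftovers
    return train, validation, test


train, validation, test = split_dataset(range(100), ratios=(7, 1, 2))
```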

Step S1033: train the machine learning model on the training data set; each time an iteration is completed, obtain the accuracy of the machine learning model on the validation set, and optimize the hyperparameters of the machine learning model with a hyperparameter optimization algorithm.

In this step, the hyperparameters of the machine learning model can be optimized by grid search, random search, and/or Bayesian optimization. Grid search is a hyperparameter search method: within a hyperparameter space, it traverses the parameters at every grid point using a fixed step size and selects the best hyperparameter point; it then reduces the step size and continues traversing within the neighborhood of that point until the configured minimum step size is reached, and the resulting point is taken as the optimal hyperparameter point. Random search instead samples random combinations of hyperparameters; compared with grid search, its randomness gives it some chance of reducing the amount of computation. Bayesian optimization uses Gaussian process regression to compute the posterior distribution from the first n evaluated points, yielding the expected mean and variance for each candidate set of hyperparameters. The mean represents the effect the model is expected to achieve with the hyperparameters at that point, and a larger mean indicates a better expected final result; the variance represents the uncertainty of that effect, and a larger variance indicates greater uncertainty at that point.
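The coarse-to-fine grid search and the random search described here can be sketched as below. The `score` callable stands in for "train with these hyperparameters and return validation accuracy"; in this example it is a toy quadratic with a known optimum, and the names `refine_grid_search`, `random_search`, and the `shrink` factor are assumptions. Bayesian optimization is left out because it needs a Gaussian process implementation.

```python
import itertools
import random
from typing import Callable, Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]
Point = Dict[str, float]


def refine_grid_search(score: Callable[[Point], float], bounds: Bounds,
                       step: float, min_step: float, shrink: float = 0.5):
    """Scan a grid, then rescan around the best point with a smaller step."""
    best_point, best_score = None, float("-inf")
    while True:
        axes = {
            name: [low + i * step for i in range(int(round((high - low) / step)) + 1)]
            for name, (low, high) in bounds.items()
        }
        for values in itertools.product(*axes.values()):
            point = dict(zip(axes, values))
            current = score(point)
            if current > best_score:
                best_point, best_score = point, current
        if step <= min_step:
            return best_point, best_score
        # Shrink the search box to the neighborhood of the best point so far.
        bounds = {name: (best_point[name] - step, best_point[name] + step)
                  for name in bounds}
        step *= shrink


def random_search(score: Callable[[Point], float], bounds: Bounds,
                  n_trials: int, seed: int = 0):
    """Evaluate randomly sampled hyperparameter combinations and keep the best."""
    rng = random.Random(seed)
    best_point, best_score = None, float("-inf")
    for _ in range(n_trials):
        point = {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        current = score(point)
        if current > best_score:
            best_point, best_score = point, current
    return best_point, best_score


def toy_score(point: Point) -> float:
    """Stand-in for validation accuracy; peaks at x = 0.3, y = 0.7."""
    return -(point["x"] - 0.3) ** 2 - (point["y"] - 0.7) ** 2


search_space = {"x": (0.0, 1.0), "y": (0.0, 1.0)}
grid_best, grid_score = refine_grid_search(toy_score, search_space, step=0.5, min_step=0.01)
random_best, random_score = random_search(toy_score, search_space, n_trials=200)
```

Each refinement round costs only a small fixed grid around the current best point, which is why shrinking the step is cheaper than scanning the full space at the finest resolution.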

Step S1034: repeat step S1033 until a predetermined number of iterations is reached or the accuracy of the machine learning model meets a preset condition, to obtain a trained renewal prediction model.

In this step, the preset condition is an accuracy threshold. The number of iterations and the accuracy threshold can be set according to the actual situation; no specific limitation is imposed here.
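The two stopping conditions can be expressed as a small loop. In this sketch `train_step` and `validate` are hypothetical callables standing in for one training iteration and one validation pass, and `ToyModel` only simulates a learner whose validation accuracy rises with each step.

```python
from typing import Callable, Tuple


def train_until(train_step: Callable[[], None], validate: Callable[[], float],
                max_iterations: int, accuracy_threshold: float) -> Tuple[int, float]:
    """Iterate until the iteration budget is spent or validation accuracy is high enough."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    accuracy = 0.0
    for iteration in range(1, max_iterations + 1):
        train_step()            # one training iteration (plus any hyperparameter update)
        accuracy = validate()   # accuracy on the validation set after this iteration
        if accuracy >= accuracy_threshold:
            break
    return iteration, accuracy


class ToyModel:
    """Stand-in for a real learner: validation accuracy rises by 10 points per step."""

    def __init__(self) -> None:
        self.steps = 0

    def train_step(self) -> None:
        self.steps += 1

    def validation_accuracy(self) -> float:
        return min(100, 50 + 10 * self.steps) / 100


model = ToyModel()
iterations_used, final_accuracy = train_until(
    model.train_step, model.validation_accuracy,
    max_iterations=10, accuracy_threshold=0.9,
)
```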

Step S1035: verify the accuracy of the trained renewal prediction model on the test set to obtain a renewal prediction result.

In this step, the test set is input into the renewal prediction model; if the corresponding renewal prediction results meet the preset prediction accuracy, the renewal prediction model is considered to perform well.

In some embodiments, referring to FIG. 3, the following steps may be performed after step S105 of the foregoing embodiment:

Step S301: determine the predicted renewal probability of the target customer according to the renewal prediction result.

Step S302: in response to determining that the predicted renewal probability is greater than or equal to a preset renewal probability threshold, mark the target customer as a prospective renewal customer and display the customer.

In this embodiment, setting a renewal probability threshold selects the customers with a higher willingness to renew and displays them to the corresponding salesperson, so that the salesperson can prioritize a business development strategy for them. Multiple renewal probability thresholds can also be set to divide customers' renewal willingness into several levels, which further improves the salesperson's efficiency.
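Both the single-threshold and the multi-level variants reduce to simple comparisons. The threshold values, tier names, and customer IDs below are illustrative assumptions, not values taken from the application.

```python
from typing import Dict, List, Sequence, Tuple


def flag_renewal_prospects(probabilities: Dict[str, float],
                           threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return customers at or above the threshold, most likely renewers first."""
    flagged = [(cid, p) for cid, p in probabilities.items() if p >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def willingness_tier(probability: float,
                     tiers: Sequence[Tuple[float, str]] = ((0.8, "high"), (0.5, "medium"))) -> str:
    """Map a probability to a level using several thresholds, checked from highest to lowest."""
    for lower_bound, name in tiers:
        if probability >= lower_bound:
            return name
    return "low"


predicted = {"C001": 0.91, "C002": 0.35, "C003": 0.60, "C004": 0.72}
prospects = flag_renewal_prospects(predicted)
levels = {cid: willingness_tier(p) for cid, p in predicted.items()}
```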

It should be noted that the method of the embodiments of the present application may be executed by a single device, such as a computer or a server. The method of this embodiment can also be applied in a distributed scenario and completed by multiple devices cooperating with one another. In such a scenario, one of the devices may execute only one or more steps of the method, and the devices interact with each other to complete the method.

It should be noted that some embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that of the embodiments above and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides a machine learning-based customer renewal prediction apparatus.

Referring to FIG. 4, the machine learning-based customer renewal prediction apparatus includes:

A first acquisition module configured to acquire renewal policy data for model training.

A feature extraction module configured to extract multi-dimensional feature data from the renewal policy data.

A model training module configured to train a preset machine learning model according to the multi-dimensional feature data to obtain a renewal prediction model.

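A second acquisition module configured to acquire historical policy data of a target customer.

A renewal prediction module configured to input the historical policy data into the renewal prediction model and output a renewal prediction result for the policy corresponding to the target customer.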

As an optional embodiment, the model training module is specifically configured to: preprocess the multi-dimensional feature data; divide the preprocessed multi-dimensional feature data into a training data set, a validation data set, and a test data set;

train the machine learning model using the training data set and, each time an iteration is completed, obtain the accuracy of the machine learning model on the validation data set; optimize the hyperparameters of the machine learning model by a hyperparameter optimization algorithm; repeat the above operations until a predetermined number of iterations is reached or the accuracy of the machine learning model meets a preset condition, to obtain a trained renewal prediction model; and verify the accuracy of the trained renewal prediction model using the test data set to obtain a renewal prediction result.
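The training procedure above can be sketched in plain Python as follows. This is an illustrative reading, not the applicant's implementation: a small logistic regression stands in for the preset machine learning model, each iteration is taken to be one random-search trial of a single hyperparameter (the learning rate), and the 60/20/20 split, the synthetic data, and all function names are assumptions made for the example.

```python
import math
import random


def make_synthetic_rows(n, rng):
    """Synthetic stand-in data: label is 1 when the two features sum past 1."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows


def split_dataset(rows, rng, ratios=(0.6, 0.2, 0.2)):
    """Shuffle a copy of rows and split it into training, validation, test."""
    rows = rows[:]
    rng.shuffle(rows)
    n_train = int(len(rows) * ratios[0])
    n_val = int(len(rows) * ratios[1])
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def predict_proba(weights, features):
    """Logistic model: sigmoid of the bias plus the weighted feature sum."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, rows):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, x) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)


def train_one_epoch(weights, rows, learning_rate):
    """One pass of stochastic gradient descent on the log loss (in place)."""
    for x, y in rows:
        error = predict_proba(weights, x) - y
        weights[0] -= learning_rate * error
        for i, xi in enumerate(x, start=1):
            weights[i] -= learning_rate * error * xi


def fit(rows, rng, max_iterations=20, target_accuracy=0.95, epochs=5):
    """Train, validate and tune until an iteration or accuracy limit is hit."""
    train, val, test = split_dataset(rows, rng)
    n_features = len(rows[0][0])
    best_weights, best_val_acc = None, -1.0
    for _ in range(max_iterations):
        # Random search: sample a learning rate on a log scale.
        learning_rate = 10 ** rng.uniform(-3, 0)
        weights = [0.0] * (n_features + 1)
        for _ in range(epochs):
            train_one_epoch(weights, train, learning_rate)
        val_acc = accuracy(weights, val)
        if val_acc > best_val_acc:
            best_weights, best_val_acc = weights, val_acc
        if best_val_acc >= target_accuracy:
            break  # the preset accuracy condition is met
    # The test set is used only once, on the finally selected model.
    return best_weights, best_val_acc, accuracy(best_weights, test)
```

Calling `fit(make_synthetic_rows(200, random.Random(0)), random.Random(1))` returns the selected weights together with the validation and test accuracy. Keeping the test set out of the tuning loop is what lets the final figure serve as an independent check of the trained model.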

As an optional embodiment, the apparatus further includes a prediction result display module (not shown in the figure), which is configured to determine the renewal prediction probability of the target customer according to the renewal prediction result and, in response to determining that the renewal prediction probability is greater than or equal to a preset renewal probability threshold, mark the target customer as a renewal-intent customer and display the customer.

As an optional embodiment, the preprocessing includes data cleaning and data dimensionality reduction.
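A minimal sketch of the two preprocessing stages follows. The application does not say which cleaning rules or which dimensionality reduction technique is used, so the choices here are assumptions: cleaning drops records with missing values and exact duplicates, and dimensionality reduction is done by a simple variance threshold that discards near-constant feature columns (a stand-in for whatever method, such as principal component analysis, an implementation might actually use).

```python
from statistics import pvariance


def clean(rows):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(row)
        if any(value is None for value in row) or key in seen:
            continue
        seen.add(key)
        cleaned.append(list(row))
    return cleaned


def reduce_dimensions(rows, min_variance=1e-9):
    """Drop feature columns whose population variance is at most min_variance.

    Returns the reduced rows and the indices of the columns that were kept.
    """
    columns = list(zip(*rows))
    kept = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    return [[row[i] for i in kept] for row in rows], kept
```

A column that takes the same value for every policy carries no information for the model, which is why a variance threshold is a reasonable first reduction step before training.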

As an optional embodiment, the hyperparameter optimization algorithm includes at least one of the following: a grid search algorithm, a random search algorithm, and a Bayesian optimization algorithm.
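The difference between the first two options can be sketched as follows. The search space, the hyperparameter names `max_depth` and `n_estimators`, and the scoring callback are hypothetical; in practice the score would be the validation accuracy of a model trained with the given hyperparameters. Bayesian optimization is not shown because it needs a surrogate model that does not fit in a short example.

```python
import itertools
import random


def grid_search(space, score):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(space)
    best = None
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        result = score(params)
        if best is None or result > best[1]:
            best = (params, result)
    return best


def random_search(space, score, n_trials, rng):
    """Evaluate n_trials randomly sampled combinations and keep the best one."""
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        result = score(params)
        if best is None or result > best[1]:
            best = (params, result)
    return best
```

Grid search is exhaustive, so its cost grows with the product of the candidate counts per hyperparameter, whereas random search caps the cost at `n_trials` evaluations and may miss the best grid point.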

As an optional embodiment, the multi-dimensional feature data includes: basic customer information, historical insurance application record information, historical claim record information, policy information, and basic marketer information.
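One way to carry these five categories through a pipeline is a single record type that can be flattened into a numeric vector. The sketch below uses one field per category, and every field name is a hypothetical example; the application lists the categories but not the individual features.

```python
from dataclasses import dataclass, fields


@dataclass
class RenewalFeatures:
    # Every field below is a hypothetical example of its category.
    customer_age: int              # basic customer information
    past_policy_count: int         # historical insurance application records
    past_claim_count: int          # historical claim records
    annual_premium: float          # policy information
    marketer_tenure_years: float   # basic marketer information

    def to_vector(self) -> list[float]:
        """Flatten the record into a numeric feature vector in field order."""
        return [float(getattr(self, f.name)) for f in fields(self)]
```

For example, `RenewalFeatures(35, 2, 0, 1200.0, 4.5).to_vector()` yields a five-element vector in declaration order, which keeps the column order stable between training and prediction.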

As an optional embodiment, the algorithm used by the machine learning model includes a random forest algorithm, an XGBoost algorithm, or a Wide & Deep algorithm.

For convenience of description, the above apparatus is described in terms of functionally separate modules. Of course, when implementing the present application, the functions of the modules may be implemented in the same piece, or in multiple pieces, of software and/or hardware.
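As a rough illustration of implementing the modules in a single piece of software, the sketch below wires them together as injected callables. The class name, method names, and callable signatures are assumptions made for the example, and the trained model is assumed to be a callable that maps historical policy data to a renewal probability.

```python
from typing import Any, Callable, Optional


class RenewalPredictionApparatus:
    """Wires the modules together; each module is an injected callable."""

    def __init__(
        self,
        first_acquisition: Callable[[], list],        # returns renewal policy data
        feature_extraction: Callable[[list], list],   # policy data -> feature rows
        model_training: Callable[[list], Callable[[Any], float]],  # rows -> model
        second_acquisition: Callable[[str], Any],     # customer ID -> history
    ) -> None:
        self.first_acquisition = first_acquisition
        self.feature_extraction = feature_extraction
        self.model_training = model_training
        self.second_acquisition = second_acquisition
        self.model: Optional[Callable[[Any], float]] = None

    def train(self) -> None:
        """Acquire training data, extract features, and train the model."""
        policies = self.first_acquisition()
        features = self.feature_extraction(policies)
        self.model = self.model_training(features)

    def predict(self, customer_id: str) -> float:
        """Renewal prediction: feed the customer's history to the model."""
        if self.model is None:
            raise RuntimeError("call train() before predict()")
        history = self.second_acquisition(customer_id)
        return self.model(history)
```

Because each module is passed in rather than hard-coded, the same wiring works whether a module runs in-process or forwards the call to another device.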

The apparatus of the above embodiment is used to implement the corresponding machine learning-based customer renewal prediction method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the machine learning-based customer renewal prediction method of any of the above embodiments.

FIG. 5 shows a more specific schematic diagram of the hardware structure of an electronic device provided in this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device through the bus 1050.

The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs so as to implement the technical solutions provided by the embodiments of this specification.

The memory 1020 may be implemented as a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.

The input/output interface 1030 is used to connect an input/output module to realize information input and output. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

The communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication between this device and other devices. The communication module may communicate in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, Wi-Fi, or Bluetooth).

The bus 1050 includes a path that transfers information between the components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).

It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.

The electronic device of the above embodiment is used to implement the corresponding machine learning-based customer renewal prediction method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to execute the machine learning-based customer renewal prediction method of any of the above embodiments.

The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to execute the machine learning-based customer renewal prediction method of any of the above embodiments, and have the beneficial effects of the corresponding method embodiment, which are not repeated here.

Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present application (including the claims) is limited to these examples. Within the spirit of the present application, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.

In addition, to simplify the description and discussion, and so as not to obscure the embodiments of the present application, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, apparatuses may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and this also takes into account the fact that details regarding the implementation of these block diagram apparatuses are highly dependent on the platform on which the embodiments of the present application are to be implemented (that is, these details should be well within the understanding of those skilled in the art). Where specific details (for example, circuits) are set forth to describe exemplary embodiments of the present application, it will be apparent to those skilled in the art that the embodiments of the present application can be practiced without these specific details or with variations of them. Accordingly, these descriptions are to be regarded as illustrative rather than restrictive.

Although the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (for example, dynamic RAM (DRAM)) may use the discussed embodiments.

The embodiments of the present application are intended to cover all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of the present application shall fall within the scope of protection of the present application.

Claims

1. A machine learning-based customer renewal prediction method, characterized by comprising:

acquiring renewal policy data for model training;

extracting multi-dimensional feature data from the renewal policy data;

training a preset machine learning model according to the multi-dimensional feature data to obtain a renewal prediction model;

acquiring historical policy data of a target customer; and

inputting the historical policy data into the renewal prediction model, and outputting a renewal prediction result for the policy corresponding to the target customer.

2. The prediction method according to claim 1, wherein training the preset machine learning model according to the multi-dimensional feature data to obtain the renewal prediction model comprises:

preprocessing the multi-dimensional feature data;

dividing the preprocessed multi-dimensional feature data into a training data set, a validation data set, and a test data set;

training the machine learning model using the training data set, and obtaining the accuracy of the machine learning model on the validation data set each time an iteration is completed;

optimizing the hyperparameters of the machine learning model by a hyperparameter optimization algorithm;

repeating the above operations until a preset number of iterations is reached or the accuracy of the machine learning model meets a preset condition, to obtain a trained renewal prediction model; and

verifying the accuracy of the trained renewal prediction model using the test data set to obtain a renewal prediction result.

3. The prediction method according to claim 1, wherein the step of inputting the historical policy data into the renewal prediction model and outputting the renewal prediction result for the policy corresponding to the target customer further comprises:

determining a renewal prediction probability of the target customer according to the renewal prediction result; and

in response to determining that the renewal prediction probability is greater than or equal to a preset renewal probability threshold, marking the target customer as a renewal-intent customer and displaying the target customer.

4. The prediction method of claim 2, wherein the preprocessing comprises data cleansing and data dimensionality reduction.

5. The prediction method of claim 2, wherein the hyperparameter optimization algorithm comprises at least one of: a grid search algorithm, a random search algorithm, and a Bayesian optimization algorithm.

6. The prediction method according to any one of claims 1 to 5, wherein the multi-dimensional feature data comprises: basic customer information, historical insurance application record information, historical claim record information, policy information, and basic marketer information.

7. The prediction method according to any one of claims 1 to 5, wherein the algorithm used by the machine learning model comprises: a random forest algorithm, an XGBoost algorithm, or a Wide & Deep algorithm.

8. A machine learning-based customer renewal prediction apparatus, comprising:

a first acquisition module configured to acquire renewal policy data for model training;

a feature extraction module configured to extract multi-dimensional feature data from the renewal policy data;

a model training module configured to train a preset machine learning model according to the multi-dimensional feature data to obtain a renewal prediction model;

a second acquisition module configured to acquire historical policy data of a target customer; and

a renewal prediction module configured to input the historical policy data into the renewal prediction model and output a renewal prediction result for the policy corresponding to the target customer.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.

10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.