TW201947510A - Insurance service risk prediction processing method, device and processing equipment

Info

Publication number: TW201947510A
Application number: TW108105617A
Authority: TW
Inventors: 吳龍鳳; 陳龑; 石秋慧; 張泰瑋; 陳詩奕
Original assignee: 香港商阿里巴巴集團服務有限公司
Priority date: 2018-05-16
Filing date: 2019-02-20
Publication date: 2019-12-16
Also published as: WO2019218748A1; CN108694673A

Abstract

An insurance service risk prediction processing method, device, and processing equipment. A lambda gradient boosting decision tree (LambdaMART) can be introduced into insurance service risk prediction, which not only supports risk prediction on insurance service data with non-linear relationships but can also output the relative risk relationship after prediction. The sorted risk prediction results indicate the relative magnitude of risk between different users, providing another, more reliable way of predicting insurance service risk.

Description

Processing method, device and processing equipment for insurance business risk prediction

The solutions in the embodiments of this specification belong to the technical field of computer data processing for insurance business risk prediction, and in particular relate to a processing method, device, and processing equipment for insurance business risk prediction.

Motor vehicle insurance, also called automobile insurance or auto insurance for short, is a type of commercial insurance that covers liability for personal injury or property loss caused to a motor vehicle by natural disasters or accidents. With economic development, the number of motor vehicles keeps growing, and auto insurance has become one of the largest lines in China's property insurance business.
When a user insures a vehicle, the insurance company usually performs a risk assessment of the user, and the result directly affects the user's insured amount, preferential treatment, and so on. By assessing users' risk, insurance companies can handle insurance business more accurately and reasonably and effectively avoid or reduce business risk. At present, risk prediction based on the generalized linear model (GLM) is the mainstream technical approach in auto insurance risk prediction. A GLM mainly handles linearly related data. For example, if age increases by 1 year for every 1 percentage point decrease in time spent online, a GLM can be built on the linear relationship between the time-online data and the age data.
However, as auto insurance business keeps growing, Internet data has become diverse and massive, and the traditional GLM approach is increasingly constrained. For example, "age" may not vary with time spent online alone but may also be related to a population's shopping behavior and habits, with different consumption habits influencing the age distribution non-linearly. A GLM can bin non-linear variables into segments, but this loses much of the variables' precision and makes it hard to meet today's big-data, multi-dimensional risk prediction requirements. The industry therefore needs a way of predicting auto insurance risk more effectively and efficiently on multi-dimensional data.

The purpose of the embodiments of this specification is to provide a processing method, device, and processing equipment for insurance business risk prediction. By introducing a lambda gradient boosting decision tree (LambdaMART) into insurance business risk prediction, they not only handle insurance business data with non-linear relationships but can also output the relative risk relationship after prediction. The sorted predictions express the relative magnitude of risk between different users, providing another, more reliable way of implementing risk prediction for insurance business.
The insurance business risk prediction processing method, device, and processing equipment provided by the embodiments of this specification are implemented as follows:
An insurance business risk prediction processing method, the method including:
obtaining target risk-related data of a user to be predicted;
processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data.
An insurance business risk prediction processing method, including:
obtaining target risk-related data of a user to be predicted;
processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the relative risk value characterizing the relative risk relationship among users in a specified user set.
An insurance business risk prediction processing device, including:
a prediction data acquisition module, used to obtain target risk-related data of a user to be predicted;
a risk prediction module, used to process the target risk-related data with a constructed risk ranking model and output a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data.
An insurance business risk prediction processing equipment, including a processor and a memory for storing processor-executable instructions, the processor, when executing the instructions, implementing:
obtaining target risk-related data of a user to be predicted;
processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data.
An insurance business risk prediction processing equipment, including a processor and a memory for storing processor-executable instructions, the processor, when executing the instructions, implementing:
obtaining target risk-related data of a user to be predicted;
processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the relative risk value characterizing the relative risk relationship among users in a specified user set.
The processing method, device, and processing equipment provided by the embodiments obtain target risk-related data of a user to be predicted, process it with a constructed risk ranking model, and output the user's relative risk value, where the risk ranking model includes a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data. By introducing LambdaMART into insurance business risk prediction, the method not only handles insurance business data with non-linear relationships but can also output the relative risk relationship after prediction. The sorted predictions express the relative magnitude of risk between different users, providing another, more reliable way of implementing risk prediction for insurance business.

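To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments in this specification. All other embodiments obtained by a person having ordinary skill in the art on the basis of one or more embodiments in this specification, without creative effort, shall fall within the scope of protection of the embodiments of this specification.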
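With the development of computer and Internet technology, data volumes have grown rapidly, and the data features used in insurance business risk prediction are classified along ever more, ever finer dimensions. Many variables influence screening and classification non-linearly. For example, time spent online is correlated with age, but that correlation can take many forms. It may be a simple linear relationship, such as age increasing by 1 year for every 1 percentage point decrease in time spent online; or a more complex one, such as an exponential relationship in which age increases by 2 years when time spent online decreases by 4 percentage points. Relationships that can be converted to linear form by some mathematical transformation can all be handled with a generalized linear model. In practice, besides variables with basically linear relationships, there are also many non-linear variables. For example, when predicting age, "age" may not vary with time spent online alone but may also be related to a population's shopping behavior and habits, with different consumption habits influencing the age distribution non-linearly. If predicting "user age" is one of the goals, a linear prediction model that cannot recognize non-linear relationships will perform substantially worse. Existing approaches can bin the variables into segments, but this loses much of the variables' precision and degrades the prediction results. The embodiments of this specification provide another way of implementing risk prediction in insurance business, different from existing conventional implementations. It introduces LambdaMART (Lambda Multiple Additive Regression Tree, a λ gradient boosting decision tree), which can reasonably and effectively use non-linear variables in risk prediction to build a risk ranking model. The model handles both linear and non-linear variables well, takes the overall risk relationships into account, and directly outputs the relative risk ordering of users in the same user set, bringing it as close as possible to the real situation and significantly improving the reliability of the prediction results. Note that, for ease of description, this specification may refer to LambdaMART as the lambda gradient boosting decision tree.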
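The relative risk value described here may express how a user's risk compares with that of the other users in a user set; the users can subsequently be sorted by these numerical values. For example, in some embodiments the LambdaMART model predicts a relative risk value of 0.6 for user A, where relative risk values lie in the range [0, 1]. This can indicate that user A, at 0.6, carries more risk than user B, at 0.58. The values 0.6 and 0.58 do not represent a specific loss ratio or a predicted absolute risk; they express the relative magnitude of risk among the users in a given user set. In some application scenarios, although A's risk is greater than B's, the actual loss ratios or predicted absolute risks of both may be very low, so both A and B may still satisfy the risk assessment requirements on the basis of the actual loss ratio.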
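The user set mentioned above can be defined according to the data being processed or the prediction needs of the actual application scenario. For example, the users of one insurance company can form a user set, or the users of one insurance type can form a user set, or all users served by the constructed risk ranking model can form a user set, or several designated users can form a user set. In general, a user set during model training may consist of the users of a single insurance company or insurance service provider, while online prediction need not be restricted in this way; for example, the relative risk relationship among users of different insurance companies can be output.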
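The implementation is described below using a specific application scenario, risk prediction for auto insurance business, as an example. FIG. 1 is a schematic flowchart of an embodiment of the insurance business risk prediction processing method provided in this specification. Although this specification provides method steps or device structures as shown in the following embodiments or drawings, the method or device may, on the basis of routine work requiring no creative effort, include more steps or module units, or fewer after some are merged. For steps or structures with no logically necessary causal relationship, the execution order of the steps or the module structure of the device is not limited to that shown in the embodiments or drawings. When the method or module structure is applied in an actual device, server, or terminal product, it can be executed sequentially or in parallel according to the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even a distributed-processing or server-cluster environment).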
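Of course, the following description of the auto insurance embodiment does not limit other technical solutions to which this specification can be extended. For example, the implementations provided here can equally be applied to scenarios such as fund risk ranking and medical insurance risk ranking; for those scenarios, refer to the description of the auto insurance embodiment, which is not repeated. In a specific embodiment, as shown in FIG. 1, an insurance business risk prediction processing method provided in this specification may include:
S0: obtaining target risk-related data of a user to be predicted;
S2: processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data.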
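MART is another name for GBDT (Gradient Boosting Decision Tree), an iterative decision tree algorithm. It consists of multiple decision trees, and the conclusions of all the trees are summed to give the final answer. The trees in GBDT are all regression trees and can be used for regression prediction. In the processing method provided in this specification, the underlying training model can be a GBDT model, which handles non-linear relationships. A decision tree model can be built in advance from labeled risk-related data, and the parameters of the decision trees are adjusted and optimized step by step through regression-based machine learning (stepwise iteration). Once the model's predictions meet the accuracy requirements of insurance business risk prediction, the model can be used online to predict the relative risk value of a user to be predicted.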
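Lambda is the gradient used in the MART solution process. Its physical meaning is the direction (up or down) and strength with which a document to be ranked should move in the next iteration. The core idea of GBDT is that, over successive iterations, the regression tree produced in each new round fits the gradient of the loss function, and all the regression trees are finally summed to give the final model. LambdaMART replaces that gradient with a special Lambda value; in other words, it combines the LambdaRank algorithm with the MART algorithm. This combination of MART and Lambda is the LambdaMART used in some embodiments of this specification. MART works by solving for the function directly in function space; the resulting model can consist of many trees, each fitted to the gradient of the loss function. The framework (underlying model) of LambdaMART is MART; the main difference is that the gradient computed in the intermediate steps is Lambda. In LambdaMART, the parameters set for MART can include the number of trees M, the number of leaf nodes L, and the learning rate v; the best values can be found by tuning on a validation set.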
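MART supports a "warm start": training can continue from an already trained model, which is simply loaded at initialization. The typical execution steps of LambdaMART are briefly as follows:
1. Training each tree first traverses all the training data (pairs of documents with different labels) and computes, for each pair, the change in the ranking metric caused by swapping the two documents' positions, together with the pair's Lambda. It then computes the Lambda λ_i of each document, and then the derivative w_i of each λ_i, which is used later in the Newton step that solves for the leaf node values.
2. A regression tree is created to fit the λ_i produced in step 1. Tree nodes are split using MSE (mean squared error) as the criterion, producing a regression tree with L leaf nodes.
3. For the regression tree produced in step 2, the value of each leaf node is computed with a Newton step; that is, the output value of the leaf node is computed from the set of documents that fall into it.
4. The model is updated: the newly learned regression tree is added to the existing model, with the learning rate v (also called the shrinkage coefficient) acting as regularization.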
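Step 1 can be illustrated with a minimal sketch. The source does not name the ranking metric or give the formulas, so this sketch assumes the commonly published LambdaMART formulation with NDCG as the metric; the function names `dcg_gain` and `compute_lambdas`, the parameter `sigma`, and the use of integer relevance labels (for example, graded loss-ratio levels of users) are illustrative assumptions, not part of the specification.

```python
import math


def dcg_gain(label):
    """Gain of one document in DCG (assumed metric): 2^label - 1."""
    return 2 ** label - 1


def compute_lambdas(scores, labels, sigma=1.0):
    """Return (lambdas, weights) for one user set.

    lambdas[i] > 0 means document i should move up in the next iteration;
    weights[i] is the second-order term w_i used later by the Newton step.
    """
    n = len(scores)
    # Current rank positions by score (0 = top of the list).
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {doc: pos for pos, doc in enumerate(order)}
    # Ideal DCG, used to normalise the metric change.
    ideal = sorted(labels, reverse=True)
    max_dcg = sum(dcg_gain(l) / math.log2(p + 2) for p, l in enumerate(ideal)) or 1.0

    lambdas = [0.0] * n
    weights = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # keep only pairs where i should rank above j
            disc_i = 1.0 / math.log2(rank[i] + 2)
            disc_j = 1.0 / math.log2(rank[j] + 2)
            # |change in NDCG| if documents i and j swapped positions.
            delta = abs((dcg_gain(labels[i]) - dcg_gain(labels[j])) * (disc_i - disc_j)) / max_dcg
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * delta * rho
            lambdas[i] += lam   # push the higher-labelled document up
            lambdas[j] -= lam   # push the lower-labelled document down
            w = sigma * sigma * delta * rho * (1.0 - rho)
            weights[i] += w
            weights[j] += w
    return lambdas, weights


lambdas, weights = compute_lambdas([0.0, 0.0, 0.0], [2, 0, 1])
```

Because every pair contributes equal and opposite amounts, the lambdas within one user set sum to zero: the values only redistribute rank, which is consistent with the model learning relative rather than absolute risk.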
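In a specific example, the LambdaMART algorithm can proceed as shown in FIG. 2 and mainly includes the following steps:
1) Determine the initial values. The model can be updated iteratively from an initialized underlying model; if there is no underlying model, all initial values are 0.
2) Traverse the training set and compute the lambda gradient (λ) of each document, for use in the subsequent Newton's-method solution of the leaf nodes.
3) Use this gradient information to build a decision tree. Tree nodes can be split using MSE as the criterion, producing a regression tree R with L leaf nodes.
4) Compute the leaf node values using Newton's method.
5) Update the model, updating each document's score according to the learning rate η (the shrinkage coefficient written v above).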
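Steps 4) and 5) can be sketched as follows. The source omits the leaf-value formula, so this sketch assumes the usual Newton step, in which a leaf's value is the sum of the lambdas of the documents in that leaf divided by the sum of their second-order weights. The function names, the `eps` guard against division by zero, and the default learning rate are hypothetical.

```python
def newton_leaf_values(leaf_of_doc, lambdas, weights, eps=1e-12):
    """Newton step per leaf: sum(lambda_i) / sum(w_i) over documents in the leaf."""
    sums = {}
    for leaf, lam, w in zip(leaf_of_doc, lambdas, weights):
        num, den = sums.get(leaf, (0.0, 0.0))
        sums[leaf] = (num + lam, den + w)
    return {leaf: num / (den + eps) for leaf, (num, den) in sums.items()}


def update_scores(scores, leaf_of_doc, leaf_values, learning_rate=0.1):
    """Add the shrunken leaf value of the new tree to each document's score."""
    return [s + learning_rate * leaf_values[leaf]
            for s, leaf in zip(scores, leaf_of_doc)]


# Three documents: the first two fall into leaf 0, the third into leaf 1.
leaf_of_doc = [0, 0, 1]
leaf_values = newton_leaf_values(leaf_of_doc, [0.4, 0.2, -0.6], [0.1, 0.2, 0.3])
new_scores = update_scores([0.0, 0.0, 0.0], leaf_of_doc, leaf_values)
```

In this toy input, leaf 0 gets a value close to 2.0 and leaf 1 a value close to -2.0, so with a learning rate of 0.1 the first two documents move up by about 0.2 and the third moves down by about 0.2.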
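The LambdaMART used in the embodiments differs from the per-user absolute relevance of a traditional regression model: under the given conditions it considers the overall risk relationships among all users and solves for them directly, so the result is more comprehensive. Pricing generally differs somewhat between insurance companies, so when LambdaMART training samples are constructed, order relationships are constructed among users of the same insurance company. Training is driven by the relative relationships between data points rather than by absolute values, and is therefore insensitive to imbalance between positive and negative samples. Compared with the absolute values of a regression model, this order relationship can characterize a user's risk level more accurately. In addition, the LambdaMART of the embodiments uses the GBDT regression model, so it can perform risk prediction on user feature data that has non-linear relationships. It is therefore well suited to risk prediction for auto insurance users, and its predictions are more accurate and reliable.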
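In a specific implementation, in one or more embodiments, the LambdaMART-based risk ranking model can be built in advance. The structure and parameters of the underlying GBDT model can be set according to the needs and data of the actual business scenario. For example, single trees can be trained one at a time, with the residual of one tree used as the input for training the next; or several trees can be connected in multiple levels and trained together, with the training residual used as the input to another multi-level group of trees. Other embodiments can also use variants, transformations, or improvements of LambdaMART based on the GBDT algorithm to perform risk prediction on insurance business data with non-linear relationships; this specification does not describe each way of building a LambdaMART model.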
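In this embodiment, the training data for the risk ranking model can be collected in advance from historical auto insurance policy data and labeled according to the risk classification or configuration requirements. In this implementation scenario, the training data can be called risk-related data; such data is usually associated with the insurance business and is used as training samples for the risk ranking model. For example, the risk-related data may be user feature data covering multiple dimensions: the user feature data associated with one user forms one group of training data, and each group can be labeled with a corresponding risk score. Specifically, in one embodiment of the method, the risk-related data may include user feature data of at least one category, and the user feature data includes data that has a non-linear relationship with the insurance business. In one example, the risk-related data of user A may include user feature data in nine dimensions (A1, A2, A3, ..., A9). The dimensions can be chosen to suit the needs of auto insurance prediction; the nine dimensions in this example may be age, gender, occupation, annual income, number of past claims, average monthly spending, credit rating, marital status, and liabilities and assets. Alternatively, user feature data in ten or more dimensions can be collected in advance, and the dimensions needed for model training are selected from them when the risk-related data is determined. For example, the risk-related data may be as shown in Table 1 below: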

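Of course, in other embodiments the risk-related data may also include artificial data generated according to predetermined rules. For example, an operator can define custom risk-related data for model training based on the situations that the expected risks may involve, or a computer can generate the required risk-related data automatically once data generation rules have been set. Artificial data generated this way matches the expected risk prediction situations more closely, whereas historical auto insurance case data is closer to the real risk situation. In some application scenarios, either one can be used, or artificial data and historical case data can be combined to train the risk ranking model and improve the accuracy of the predictions.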
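The risk-related data obtained can be used as training data for the GBDT model. After training, the thresholds of the decision features used at the branches of the decision trees in the risk ranking model (all of the thresholds, or some of them) should reproduce the labeled risk relationships among the users (stable, continuous output can usually also be required). The GBDT used in the embodiments is an iterative decision tree algorithm and has two main parts: the decision tree (Regression Decision Tree, DT) and gradient boosting (GB). Decision trees fall into two main classes, classification trees and regression trees. Classification trees are typically used for classification problems, such as a user's gender, whether a web page is spam, or whether a user is cheating. Regression trees are generally used to predict real values, such as a user's age, the probability that a user clicks, or the relevance of a web page. The former predict class labels, the latter real values. It should be stressed that adding and subtracting the outputs of regression trees is meaningful, for example 10 years + 5 years - 3 years = 12 years, whereas the outputs of classification trees cannot be summed, or the sum is meaningless, for example male + male + female. The procedure for a regression tree is broadly similar to that for a classification tree. The difference is that every node of a regression tree yields a predicted value; taking age as an example, the predicted value is the mean age of all the people belonging to that node. When branching, every feature is searched exhaustively for the best split variable and the best split point. The criterion in this embodiment is no longer the Gini coefficient used in classification trees but minimization of the squared error: the more people are mispredicted, the larger the squared error, so minimizing the squared error finds the most reliable basis for branching. Branching continues until the ages of the people at each leaf node are all identical or a preset stopping condition is reached (such as an upper limit on the number of leaves). If the ages at a final leaf node are not all identical, the mean age of everyone at that node is used as that leaf node's prediction.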
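The exhaustive split search described above can be sketched in a few lines. This is only an illustration under simple assumptions: numeric features, candidate thresholds at midpoints between adjacent distinct values, and the sum of squared deviations from each side's mean as the error. The function names and the toy data are hypothetical.

```python
def squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Try every feature and every midpoint threshold; return the split
    (feature_index, threshold, error) with the smallest total squared error."""
    best = None
    for f in range(len(rows[0])):
        candidates = sorted(set(r[f] for r in rows))
        for lo, hi in zip(candidates, candidates[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            err = squared_error(left) + squared_error(right)
            if best is None or err < best[2]:
                best = (f, thr, err)
    return best


# Toy data: two features per person, target = age to be predicted.
rows = [[1, 5], [2, 3], [10, 4], [11, 6]]
ages = [10, 12, 30, 32]
feature, threshold, error = best_split(rows, ages)
```

Here the search selects feature 0 with threshold 6.0, which separates the ages {10, 12} from {30, 32}; each resulting node would predict the mean age of its members (11 and 31).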
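Gradient boosting is a machine learning technique for regression, classification, and ranking tasks and belongs to the Boosting family of algorithms. Boosting is a family of algorithms that turn weak learners into a strong learner and falls under ensemble learning. Boosting methods rest on the idea that, for a complex task, a suitable combination of the judgments of several experts is better than the judgment of any one of them alone; in the words of the proverb, "three cobblers with their wits combined surpass Zhuge Liang." Like other boosting methods, gradient boosting builds the final prediction model by combining multiple weak learners, usually decision trees, into an ensemble. Boosting methods build the model stage-wise: the weak learner built at each iteration is intended to make up for the shortcomings of the model so far.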
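For example, in a specific process, the number of trees can be fixed for training, and training can stop once that number is reached (for example, eighty trees); or training can stop when the residual is very small (meeting the stopping condition). Training stops as soon as either condition is met.
If the residuals at the N-th tree are not all 0, or the stopping condition is not met, the residuals at the nodes of the N-th tree replace the corresponding original values and are passed to the (N+1)-th tree for learning,
until, at the (N+K)-th tree, the leaf-node residual matches the predicted value or falls below a threshold, at which point the prediction for the current leaf node is output. Specifically, the values fitted by all the trees (the first prediction and the successive residuals) can be summed to give the predicted value.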
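In this embodiment, the number of decision trees used for training can be fixed in advance, and the thresholds of the decision features used when a tree branches are optimized gradually by gradient iteration. For example, 80 decision trees can be used, each of which learns the residual of the sum of the conclusions of all previous trees. The initial thresholds of the trees can be set from experience. Suppose A's true score (the labeled score) is 80 points, but the first tree, whose decision feature is age, predicts 60 points; the difference, the residual, is 20 points. In the second tree (decision feature: the user's occupation), A's target is then set to 20 points. If the second tree does place A in a leaf node worth 20 points, the sum of the two trees' conclusions is A's true score (predicted 60 points + residual 20 points). If the second tree's conclusion is 18 points, A still has a residual of 2 points, so in the third tree (decision feature: annual income) A's target becomes 2 points and learning continues. Computing the residual at each step effectively increases the weight of the misclassified cases, while the residuals of cases already classified correctly tend toward 0. For example, suppose risk is higher when age is very high or very low, and lower when income is higher. If a 60-year-old user is placed in a lower-risk branch L1 where the average age is between 20 and 40, the resulting residual is correspondingly larger, and subsequent features such as income, marital status, and years of driving experience can gradually move that user toward the leaf node that matches the actual risk.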
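The residual arithmetic in this example (label 80, trees outputting 60, then 18, then 2) can be checked with a short sketch. The function name `boost_prediction` is hypothetical, and the tree outputs are simply given as numbers rather than produced by trained trees.

```python
def boost_prediction(label, tree_outputs):
    """Accumulate tree outputs and record the residual left after each tree."""
    prediction = 0.0
    residuals = []
    for out in tree_outputs:
        prediction += out                      # trees are summed
        residuals.append(label - prediction)   # target for the next tree
    return prediction, residuals


# User A: labeled score 80; trees predict 60, then 18, then 2.
prediction, residuals = boost_prediction(80, [60, 18, 2])
print(prediction, residuals)  # 80.0 [20.0, 2.0, 0.0]
```

Each residual is exactly the target handed to the next tree, and the final prediction is the sum of all tree outputs.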
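Once the number of trained decision trees reaches the preset value, for example when all 10 trees from root node to leaf nodes have been trained once, or once the parameters of the current tree meet the stopping condition, for example a residual of 0 or some other residual stopping threshold, training on that group of data can stop. When the best split point, or a split point that meets the training requirements, has been found for each threshold, the thresholds of the decision features of the decision trees are determined; the risk ranking model is determined when the adjusted thresholds meet the output requirements for the model's predictions. For example, the initial threshold separating risk scores of 60 and 80 might be whether age is greater than 20. After optimization on a large amount of training data, the decision feature that assesses risk by age may end up adjusted to whether age is greater than 24, so as to match the true outcome in most cases.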
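In another embodiment, the number of decision trees used by the risk ranking model can be determined from the number of categories of user feature data. For example, if user feature data in 80 dimensions is selected, each dimension can serve as the decision feature of one tree, so 80 decision trees can be used to build the non-linear risk ranking model. Of course, in other embodiments the total number of decision trees can be determined from the collected data, the number of branches per tree, the parent-child connections between trees, and so on.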
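In one embodiment, the risk-related data used to train the risk ranking model is user feature data belonging to the same user set and includes user feature data of at least one category; the user feature data includes data that has a non-linear relationship with the insurance business.
The ranking model used in this embodiment is trained by using the loss ratios of users of the same insurance company to construct the relative risk relationships among those users. Because the data of users within one insurance company is fully comparable, the training data is real and reliable. Compared with a traditional regression model, this embodiment therefore provides another way of identifying risk, based on a ranking model, whose results are more credible. The method can also make reasonable and effective use of the multi-dimensional non-linear variables in insurance business: a risk ranking model based on gradient boosting decision trees handles both linear and non-linear variables well, predicts significantly more accurately than a traditional linear model, compensates for that model's shortcomings, and improves the insurance service experience.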
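In another embodiment, the risk-related data used in training the risk ranking model includes:
S20: user feature data with the same user attribution label is constructed into one sample training set, the user feature data including data that has a non-linear relationship with the insurance business.
Correspondingly, the training includes: inputting the feature data of a first user in a first sample training set and outputting the first user's relative risk value within a first user set, the first user set being the users included in the first sample training set.
The user attribution label can indicate the distribution or grouping a user belongs to. For example, the users of one insurance company share a user attribution label, so a user can be tagged with the label of insurance company A.
The terms first sample training set, first user set, and first user serve mainly to distinguish the training set currently being processed from other training sets; they do not denote any particular set. By analogy, the training set formed from the feature data of the users of another insurance company B can be called the second sample training set, and the set of users of insurance company B the second user set. For model training, the training sets constructed separately for each insurance company can be input together, and the model outputs the relative risk of the user to be predicted.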
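One way to picture the grouping by user attribution label is to build ordered pairs only among users that share a label. The sketch below is an assumption-laden illustration: the field names `id`, `insurer`, and `loss_ratio`, the function name `build_pairs`, and the choice to skip ties are all hypothetical and are not specified in the source.

```python
from collections import defaultdict
from itertools import combinations


def build_pairs(users):
    """Group users by attribution label (here: insurer) and emit
    (insurer, higher_risk_id, lower_risk_id) pairs within each group only."""
    groups = defaultdict(list)
    for u in users:
        groups[u["insurer"]].append(u)

    pairs = []
    for insurer, members in groups.items():
        for a, b in combinations(members, 2):
            if a["loss_ratio"] == b["loss_ratio"]:
                continue  # no order relation between equally risky users
            hi, lo = (a, b) if a["loss_ratio"] > b["loss_ratio"] else (b, a)
            pairs.append((insurer, hi["id"], lo["id"]))
    return pairs


users = [
    {"id": "u1", "insurer": "A", "loss_ratio": 0.9},
    {"id": "u2", "insurer": "A", "loss_ratio": 0.3},
    {"id": "u3", "insurer": "B", "loss_ratio": 0.5},
    {"id": "u4", "insurer": "B", "loss_ratio": 0.5},
    {"id": "u5", "insurer": "B", "loss_ratio": 0.1},
]
pairs = build_pairs(users)
```

No pair ever crosses insurers, which reflects the point made above that loss ratios are only directly comparable within one insurance company.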
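The risk ranking model can predict the relative risk value of each user individually, and can directly optimize the relative risk ordering that the model outputs for users in the same user set. Therefore, another embodiment of the method may further include:
determining, based on the relative risk values, the relative risk relationship among users in a specified user set;
outputting data describing the relative risk relationship.
In one example, the constructed risk ranking model processes the feature data of users A, B, C, and D and yields relative risk values of 0.48, 0.56, 0.81, and 0.62 respectively, where B, C, and D belong to the same insurance company. In some embodiments, during training of the risk ranking model, the feature data of the users of one insurance company forms one sample training set; learning from such samples allows the model to output more accurate and reliable risk predictions. In this example the output could be:
C(0.81) > D(0.62) > B(0.56) > A(0.48).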
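Producing this ordering from the model's outputs is a plain descending sort. A minimal sketch, with the hypothetical function name `rank_by_relative_risk` and the four values taken from the example:

```python
def rank_by_relative_risk(risk_values):
    """Sort users from highest to lowest relative risk value."""
    return sorted(risk_values.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_by_relative_risk({"A": 0.48, "B": 0.56, "C": 0.81, "D": 0.62})
print(" > ".join(f"{user}({value})" for user, value in ranking))
# C(0.81) > D(0.62) > B(0.56) > A(0.48)
```

Only the order carries meaning here; the numbers themselves are not loss ratios or absolute risk estimates.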
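In the method provided by the embodiments, the LambdaMART ranking model takes a user's feature data as input and can output the user's position in the order relationship obtained by sorting on loss ratio. This addresses the poor reliability of existing regression models built from the loss ratios of different insurance companies. What the LambdaMART ranking model learns is how users' risks compare; the model's numerical output has no physical meaning in itself and serves only as the basis for comparing order.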
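As noted above, the embodiments can be used not only for auto insurance risk prediction but also for scenarios such as fund risk ranking and medical insurance risk ranking. In the auto insurance risk prediction scenario specifically:
S22: the risk ranking model is an auto insurance risk ranking model trained on risk-related data associated with auto insurance business;
the relative risk value includes the relative risk corresponding to the loss ratio of the user to be predicted.
Of course, the loss ratio and the auto insurance risk score mentioned above are only some of the ways in which one or more embodiments express the output of the non-linear risk ranking model. Other embodiments may use other representations, or representations obtained by transforming the loss ratio or the auto insurance risk score. For example, a linear transformation of the loss ratio can give an auto insurance score, where a higher score means lower risk (the reverse of the auto insurance risk score, where a higher score means higher risk).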
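Note that a linear relationship usually means that two variables are related by a first-degree function. The linear relationships between insurance or auto insurance variables described in the embodiments can take the form y = ax + b, where x is the independent variable and y the dependent variable. In a specific insurance or auto insurance application, a linear relationship can be understood broadly as one in which the relationship between two variables is definite and fixed, and can in some cases be expressed as a straight line or converted to a linear relationship by some mathematical transformation (with the information lost in the conversion kept within a certain range). A non-linear relationship mainly means that the relationship between the variables keeps changing and cannot be described by a formula; in some cases it can only be represented by a curve, a surface, or an irregular line, as with risk score versus occupation or risk score versus gender.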
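In one or more embodiments, the risk ranking model can be built offline in advance: training data containing non-linear relationships is selected, the GBDT decision trees are trained on it, and the model is used online once training is complete. This specification does not exclude building or updating and maintaining the risk ranking model online. For example, given sufficient computing capacity, the risk ranking model can be built online and used online at the same time to process the target risk-related data of users to be predicted.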
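Although the above embodiments implement the risk ranking model with LambdaMART, this specification does not exclude other embodiments that use other models to output the relative risk relationship among users, such as LambdaRank (a ranking algorithm), ListNet (a ranking algorithm), and other listwise (document-list) methods. Specifically, as shown in FIG. 3, this specification also provides another insurance business risk prediction processing method, the method including:
S30: obtaining target risk-related data of a user to be predicted;
S32: processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the relative risk value characterizing the relative risk relationship among users in a specified user set.
The relative risk value output can be the value for one user or the values for several users. When values for several users are output, they can be output sorted by relative risk or unsorted.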
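The insurance business risk prediction processing method provided in the embodiments obtains target risk-related data of a user to be predicted, processes it with a constructed risk ranking model, and outputs the user's relative risk value, where the risk ranking model includes a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data. By introducing LambdaMART into insurance business risk prediction, the method not only handles insurance business data with non-linear relationships but can also output the relative risk relationship after prediction. The sorted predictions express the relative magnitude of risk between different users, providing another, more reliable way of implementing risk prediction for insurance business.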
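The method can be used for risk identification on the client side, for example risk assessment for insurance business offered in a payment application on a mobile terminal. The client can be a PC (personal computer), a server, an industrial control computer, a smartphone, a tablet, a portable computer (such as a notebook computer), a personal digital assistant (PDA), a desktop computer, a smart wearable device, or another mobile communication terminal, handheld device, in-vehicle device, wearable device, television, or computing device. The method can also be applied in the system servers of an insurance company or a third-party insurance service provider. Such a system server can be a single server, a server cluster, a distributed system server, or a combination of a server that handles device requests with other system servers that perform related data processing. For example, one implementation can be built on Alibaba Cloud's Open Data Processing Service (ODPS) platform, which provides a unified programming interface and user interface for data processing tasks arising from different users' needs. With ODPS supporting system performance, a system implementing the method can process massive data in parallel and achieve optimal computing performance.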
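As noted above, the method embodiments can be executed on a mobile terminal, computer terminal, server, or similar computing device. Taking execution on a server as an example, FIG. 4 is a block diagram of the hardware structure of a server that applies the insurance business risk prediction processing method. As shown in FIG. 4, the server 10 may include one or more processors 102 (only one is shown; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication. A person having ordinary skill in the art will understand that the structure shown in FIG. 4 is only illustrative and does not limit the structure of the electronic device. For example, the server 10 may include more or fewer components than shown in FIG. 4, such as additional processing hardware like a database or multi-level cache, or may have a different configuration from that shown in FIG. 4.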
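The memory 104 can store the software programs and modules of application software, such as the program instructions or modules corresponding to the insurance business risk prediction processing method in the embodiments. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the processing method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, connected to the server 10 over a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations of these.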
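The transmission module 106 receives or sends data over a network. A specific example of the network is a wireless network provided by the communications provider of the server 10. In one example, the transmission module 106 includes a network interface controller (NIC) that can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission module 106 is a radio frequency (RF) module that communicates with the Internet wirelessly.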
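Based on the insurance business risk prediction processing method described above, this specification also provides an insurance business risk prediction processing device. The device may include systems (including distributed systems), software (applications), modules, components, servers, clients, and so on that use the method of the embodiments, combined with the necessary hardware. Based on the same inventive concept, the processing device in one embodiment is described below. Because the device solves the problem in a way similar to the method, its implementation can refer to the implementation of the method, and repeated details are omitted. Although the device described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated. Specifically, FIG. 5 is a schematic diagram of the module structure of an embodiment of the insurance business risk prediction processing device, which may include:
a prediction data acquisition module 201, which can be used to obtain target risk-related data of a user to be predicted;
a risk prediction module 202, which can be used to process the target risk-related data with a constructed risk ranking model and output a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data.
Note that, in line with the description of the related method embodiments, the device described above may also include other implementations. For the specific implementations, refer to the description of the method embodiments; they are not repeated here.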
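The server or client provided in the embodiments can be implemented by a processor executing the corresponding program instructions in a computer, for example in C++ on a Windows PC or server, or in the application programming languages of other systems such as Linux combined with the necessary hardware, or by processing logic based on a quantum computer. The processing equipment can specifically be an insurance server, or a server with which a third-party service provider offers risk prediction; the server can be a single server, a server cluster, a distributed system server, or a combination of a server that handles device requests with other system servers that perform related data processing. This specification also provides an insurance business risk prediction processing equipment, which may specifically include a processor and a memory for storing processor-executable instructions, the processor, when executing the instructions, implementing:
obtaining target risk-related data of a user to be predicted;
processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree (LambdaMART) on labeled risk-related data.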
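Based on the foregoing method embodiments, in another embodiment of the processing equipment, user feature data with the same user attribution label is constructed into one sample training set, the user feature data including data that has a non-linear relationship with the insurance business;
correspondingly, the training includes: inputting the feature data of a first user in a first sample training set and outputting the first user's relative risk value within a first user set, the first user set being the users included in the first sample training set.
Based on the foregoing method embodiments, in another embodiment of the processing equipment, the processor, when executing the instructions, further implements:
determining, based on the relative risk values, the relative risk relationship among users in a specified user set;
outputting data describing the relative risk relationship.
Based on the foregoing method embodiments, in another embodiment of the processing equipment, the risk ranking model is an auto insurance risk ranking model trained on risk-related data associated with auto insurance business;
the relative risk value includes the relative risk corresponding to the loss ratio of the user to be predicted.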
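Of course, the ranking model used in the risk prediction processing equipment is not limited to LambdaMART; other algorithm models whose output expresses the relative magnitude of user risk are equally applicable. This specification therefore also provides another insurance business risk prediction processing equipment, which may specifically include a processor and a memory for storing processor-executable instructions, the processor, when executing the instructions, implementing:
obtaining target risk-related data of a user to be predicted;
processing the target risk-related data with a constructed risk ranking model and outputting a relative risk value of the user to be predicted, the relative risk value characterizing the relative risk relationship among users in a specified user set.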
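The instructions can be stored in a variety of computer-readable storage media. A computer-readable storage medium may include a physical device for storing information, typically by digitizing the information and then storing it on a medium that works electrically, magnetically, or optically. The computer-readable storage media of this embodiment may include: devices that store information electrically, such as the various kinds of memory, for example RAM, ROM, and USB flash drives; devices that store information magnetically, such as hard disks, floppy disks, magnetic tape, magnetic-core memory, and bubble memory; and devices that store information optically, such as CDs or DVDs. There are also other kinds of readable storage media, such as quantum memory and graphene memory. The instructions involved in the device, server, client, or processing equipment described above are as described here.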
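Note that, in line with the description of the related method embodiments, the device and processing equipment described above may also include other implementations. For the specific implementations, refer to the description of the method embodiments; they are not repeated here.
The embodiments in this specification are described progressively; for parts that are the same or similar, the embodiments can be referred to one another, and each embodiment focuses on what distinguishes it from the others. In particular, the hardware-plus-program embodiments are described relatively briefly because they are essentially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.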
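Although this application provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included on the basis of routine work requiring no creative effort. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When an actual device or system server product executes the method, the steps can be performed sequentially or in parallel according to the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment).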
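Although the embodiments mention operations and data descriptions such as the definition of linear and non-linear relationships, the construction of the underlying GBDT model in LambdaMART, and the processing of the GBDT algorithm, including data acquisition, storage, interaction, computation, and judgment, the embodiments are not limited to cases that conform to industry communication standards, standard GBDT algorithm processing, communication protocols, and standard data models or templates, or to the cases described in the embodiments. Implementations slightly modified from certain industry standards, or from custom or described implementations, can also achieve effects that are the same as, equivalent or similar to, or predictable variations of those of the above embodiments. Embodiments obtained by applying such modified or varied ways of acquiring, storing, judging, and processing data still fall within the scope of the optional implementations of this specification.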
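In the 1990s, an improvement to a technology could be clearly classified as a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). As technology has developed, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can easily be obtained simply by describing the method flow in one of these hardware description languages and programming it into an integrated circuit.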
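A controller can be implemented in any suitable way. For example, a controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means it contains for implementing various functions can be regarded as structures within the hardware component. Or the means for implementing various functions can even be regarded both as software modules implementing a method and as structures within a hardware component.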
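The processing equipment, devices, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer can be, for example, a personal computer, a notebook computer, an in-vehicle human-machine interaction device, a mobile phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.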
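Although the embodiments provide method steps as described in the embodiments or flowcharts, more or fewer steps may be included using routine means requiring no creative effort. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When an actual device or terminal product executes the method, the steps can be performed sequentially or in parallel according to the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even a distributed data processing environment). The terms "include", "comprise", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, product, or device. Without further limitation, the presence of additional identical or equivalent elements in a process, method, product, or device that includes the stated elements is not excluded.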
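For convenience of description, the above device is described in terms of modules divided by function. Of course, when the embodiments are implemented, the functions of the modules can be implemented in one or more pieces of software and/or hardware, and a module implementing one function can be implemented as a combination of several sub-modules or sub-units. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components can be combined or integrated into another system, or some features can be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed can be indirect couplings or communication connections through interfaces, devices, or units, and can be electrical, mechanical, or of other forms.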
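The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular way, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing, and the instructions executed on it thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.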
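In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.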
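Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.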
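Those skilled in the art will understand that the embodiments of this specification can be provided as a method, a system, or a computer program product. The embodiments can therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments can be described in the general context of computer-executable instructions executed by a computer, such as program modules. In general, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. The embodiments can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.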
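The embodiments in this specification are described progressively; for parts that are the same or similar, the embodiments can be referred to one another, and each embodiment focuses on what distinguishes it from the others. In particular, the system embodiments are described relatively briefly because they are essentially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments. In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this specification. Such wording does not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined in a suitable way in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art can combine the different embodiments or examples described in this specification and the features of those embodiments or examples.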
The above are merely embodiments of this specification and are not intended to limit it. Those skilled in the art can make various modifications and changes to the embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of this specification shall fall within the scope of the claims.
With the development of computer Internet technology, the amount of data has grown rapidly. The classification of data characteristics in insurance business risk prediction is also more and more dimensional and detailed. The influence of many variables on screening classification is non-linear, such as the correlation between the length of time spent on the Internet and age, but the correlation can be diverse. For example, it can be a simple linear relationship, such as reducing the length of time spent online by 1 percentage point, and increasing age by 1 year; or a more complex relationship, such as an exponential relationship, reducing the length of time spent online by 4 percentage points and increasing age by 2 When it can be transformed into linearity through certain mathematical changes, it can be solved by generalized linear model. In real life, in addition to some variables with a basic linear relationship, there are also a large number of non-linear variables. For example, when predicting age, if "age" does not simply change with the length of time spent on the Internet, but is also related to the shopping and habits of the crowd, different consumption habits change the age distribution with its own changes in a non-linear manner. Because predicting the "user age" is one of the goals, if some linear relationship prediction models cannot identify non-linear relationships, it will greatly reduce the prediction performance of the model. In the existing solution, the variables can be summarized by sub-boxes, but the accuracy of many variables will be lost and the prediction result will be reduced. 
The embodiment of the present specification provides another method for realizing risk prediction in insurance business that is different from the existing conventional implementation, and introduces LambdaMART (Lambda Multiple Additive Regression Tree, λ-calculus gradient promotion decision tree, or λ-gradient promotion decision tree), Non-linear variables can be reasonably and effectively used in risk prediction to build a risk ranking model. The model is well compatible with linear and non-linear variables, and it can integrate risk relationships, and directly model the output of the relative risk of users in the same user set. It is as close to the real situation as possible, and the reliability of prediction results has been significantly improved. It should be noted that, for the convenience of description, LambdaMART may be referred to as a calculus gradient promotion decision tree in this specification.
The relative risk value described herein may include outputting a relative risk level relationship between a user in a user set and other users in a user set, and subsequent ones may be sorted according to specific values of the risk relative value. For example, in some embodiments of this specification, the relative risk of user A predicted by the LambdaMART model is 0.6, and the range of the relative risk value here is [0,1], which can indicate that the risk of the user A = 0.6 is greater than The risk of user B = 0.58. At this time, the value of 0.6 or 0.8 does not represent the specific payout rate or the absolute value of the predicted risk. It represents the relative risk of each user in some user sets. For example, in some application scenarios, although the risk of A is greater than the risk of B, but the actual payout rate of A and B is very low or the absolute value of the predicted risk is very low, both A and B may meet the risk assessment requirements based on the actual payout rate.
The user set described above can be divided according to the data being processed or the prediction needs of the actual application scenario. For example, users belonging to the same insurance company can be regarded as one user set, or users with the same insurance type can be regarded as one user set, or all users corresponding to the constructed risk ranking model can be one user set, or several specified users can be one user set. Generally, during model training a user set consists of users of the same insurance company or insurance service provider, while online prediction is not subject to this restriction. For example, the relative risk relationships between multiple users of different insurance companies can be output.
In the following, a specific application scenario, risk prediction processing for auto insurance business, is taken as an example to describe the implementation of this specification. Specifically, FIG. 1 is a schematic flowchart of an embodiment of a method for processing insurance business risk prediction provided in this specification. Although this specification provides method operation steps or device structures as shown in the following embodiments or drawings, the method or device may, based on conventional practice or without creative effort, include more operation steps or module units, or fewer after partial merging. For steps or structures that have no logically necessary causal relationship, the execution order of the steps or the module structure of the device is not limited to that shown in the embodiments or drawings of this specification. When the described method or module structure is applied in an actual device, server, or end product, it may be executed sequentially or in parallel according to the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even in a distributed processing or server cluster environment).
Of course, the following description of the auto insurance risk prediction embodiment does not limit other technical solutions that can be extended from this specification. For example, the implementation solutions provided in this specification can also be applied to scenarios such as fund risk ranking and medical insurance risk ranking. Applications in other scenarios can refer to the auto insurance embodiments in this specification and are not described again. A specific embodiment is shown in FIG. 1. An insurance business risk prediction processing method provided in this specification may include:
S0: Obtain target risk related data of users to be predicted;
S2: Use the constructed risk ranking model to process the target risk-related data, and output the relative risk value of the user to be predicted, where the risk ranking model includes: a ranking model determined by training a calculus gradient boosting decision tree with labeled risk-related data.
MART is another name for GBDT (Gradient Boosting Decision Tree), an iterative decision tree algorithm. The algorithm consists of multiple decision trees, and the conclusions of all trees are added up to form the final answer. The trees in GBDT are regression trees, which can be used for regression prediction. In the insurance business risk prediction processing method provided in this specification, the underlying training model can use the GBDT non-linear algorithm model. A decision tree model can be constructed in advance, and the labeled risk-related data can then be used to adjust and optimize the parameters in the decision trees step by step. When the model's prediction results meet the accuracy requirements of insurance business risk prediction, the model can be used online to predict the relative risk of the user to be predicted.
Lambda is the gradient used in the MART solution process. Its physical meaning is the direction (up or down) and strength of the movement of a document to be ranked in the next iteration. The core idea of GBDT is that, over successive iterations, the regression decision tree generated in each new iteration fits the gradient of the loss function, and all the regression decision trees are finally superimposed to obtain the final model. LambdaMART replaces this gradient with a special Lambda value and is thus the combination of the LambdaRank algorithm and the MART algorithm; this combination is the LambdaMART used in some embodiments of this specification. The principle of MART is to solve for the function directly in function space: the resulting model is composed of many trees, and the fitting target of each tree is the gradient of the loss function. LambdaMART's framework (the underlying model) is MART, with Lambda used in place of the gradient computed at each intermediate step. In LambdaMART, the parameters inherited from MART include the number of trees M, the number of leaf nodes L, and the learning rate v; their optimal values can be obtained by tuning on a validation set.
MART supports "hot start", that is, training can continue on the basis of an already trained model, which is loaded as the initial model at the start. The following is a brief introduction to the general implementation of LambdaMART:
1. The training of each tree first traverses all the training data (pairs of documents with different labels), calculates the change in the ranking metric caused by swapping the positions of each pair and the corresponding Lambda, then calculates the Lambda of each document, and then calculates each document's derivative w_i, which is used later to solve for the leaf node values in the Newton step.
2. A regression tree is created to fit the Lambda values from the first step. The criterion for splitting tree nodes is MSE (mean squared error), and a regression tree with L leaf nodes is generated.
3. For the regression tree generated in the second step, the value of each leaf node is calculated by a Newton step, that is, the output value of the leaf node is calculated by formula from the set of documents that fall into that leaf node.
4. The model is updated: the currently learned regression tree is added to the existing model, with the learning rate v (also called the shrinkage coefficient) used for regularization.
In a specific example, the algorithm process of LambdaMART can be shown in Figure 2, which mainly includes the following steps:
1) Determination of initial value. The model can be updated iteratively based on the initialized underlying model; if there is no underlying model, all initial values are 0;
2) Traverse the training set and calculate the lambda gradient (λ) of each document, together with the partial derivative used later to solve for the leaf node values by Newton's method;
3) Use the above gradient information to generate a decision tree. The criterion for splitting tree nodes can be MSE, generating a regression tree R with L leaf nodes;
4) Calculate the values of the leaf nodes by Newton's method;
5) Update the model, and update the score of each document according to the learning rate η.
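Steps 2) and 4) above can be illustrated with a minimal sketch. The formulas are not reproduced in this text, so the sketch assumes the pairwise logistic form commonly described for LambdaRank and LambdaMART, with NDCG as the ranking metric. The names `dcg_gain`, `compute_lambdas`, `leaf_value`, the parameter `sigma`, and the sign convention (a positive lambda pushes a score up) are illustrative assumptions, not taken from the specification.

```python
import math


def dcg_gain(label, rank):
    """DCG contribution of an item with this label at a 0-based rank."""
    return (2 ** label - 1) / math.log2(rank + 2)


def compute_lambdas(scores, labels, sigma=1.0):
    """Return (lambdas, weights) for the items of one user set.

    Assumed convention: a positive lambda means "raise this score".
    Labels are assumed to be non-negative relevance (risk) grades.
    """
    n = len(scores)
    # Rank position of each item under the current scores (0 = top).
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {item: pos for pos, item in enumerate(order)}
    ideal = sum(dcg_gain(l, r)
                for r, l in enumerate(sorted(labels, reverse=True)))
    lambdas = [0.0] * n
    weights = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # keep only pairs where i should rank above j
            # Absolute NDCG change if i and j swapped rank positions.
            delta = abs(
                dcg_gain(labels[i], rank[i]) + dcg_gain(labels[j], rank[j])
                - dcg_gain(labels[i], rank[j]) - dcg_gain(labels[j], rank[i])
            ) / ideal
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * delta * rho
            lambdas[i] += lam   # push the higher-labeled item up
            lambdas[j] -= lam   # push the lower-labeled item down
            w = sigma * sigma * delta * rho * (1.0 - rho)
            weights[i] += w     # second-order term for the Newton step
            weights[j] += w
    return lambdas, weights


def leaf_value(indices, lambdas, weights):
    """Newton-step leaf output: sum of lambdas over sum of weights."""
    denom = sum(weights[i] for i in indices)
    return sum(lambdas[i] for i in indices) / denom if denom else 0.0


lambdas, weights = compute_lambdas([0.0, 0.0, 0.0], [2, 1, 0])
```

In a full training loop, a regression tree would be fitted to `lambdas`, each leaf would take the value returned by `leaf_value` for the items falling into it, and the scores would be updated by the learning rate times that value.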
The LambdaMART used in the embodiments of this specification differs from a traditional regression model, which models an absolute value for each single user. LambdaMART considers the overall risk relationship among all users under a given condition and solves for it directly, so the result is more comprehensive. Pricing generally differs somewhat between insurance companies, so when constructing training samples, LambdaMART builds order relationships only among users of the same insurance company. Model training is based on the relative relationships in the data rather than on absolute values, and is therefore insensitive to unbalanced sample proportions. Compared with the absolute values of a regression model, this order relationship represents the user's risk level more accurately. In addition, the LambdaMART in the embodiments of this specification uses GBDT regression trees, which can perform risk prediction on model inputs that are non-linearly related user characteristic data in the insurance business. It is highly applicable to risk prediction for auto insurance users, and its results are more accurate and reliable.
In a specific implementation, in one or more embodiments of the present specification, a risk ranking model based on LambdaMART may be constructed in advance. For the training and construction of the GBDT model used at the bottom layer, the model structure and parameters can be set according to the actual business scenario and data. For example, a single tree can be trained individually and its training residual used as the input of the next tree for continued training; or several trees with multi-level connections can be trained together, with their training residuals used as the input of another group of multi-level connected trees. Of course, other embodiments may apply variations, transformations, or improvements of the GBDT algorithm to the LambdaMART-based risk prediction processing of non-linear insurance business data. This specification does not describe each implementation of LambdaMART model construction one by one.
In this embodiment, the training data of the risk ranking model may be determined in advance from collected historical auto insurance policy data, and the training data may be labeled according to the risk classification or setting requirements. In the insurance business risk prediction scenario of this embodiment, the training data may be referred to as risk-related data; these data are usually associated with the insurance business and serve as samples for training the risk ranking model. For example, the risk-related data may be user characteristic data covering multiple dimensions: the user characteristic data associated with one user forms one group of training data, and each group of risk-related data may be labeled with a corresponding risk score. Specifically, in an embodiment of the method described in this specification, the risk-related data may include user characteristic data of at least one category, and the user characteristic data includes data that has a non-linear relationship with the insurance business. For example, the risk-related data of user A may include user characteristic data in 9 dimensions (A1, A2, A3, ..., A9). The dimensions can be selected according to the needs of auto insurance prediction. For example, the 9 dimensions above can be age, gender, occupation, annual income, historical claims, average monthly consumption, credit rating, marital status, and liabilities and assets. Alternatively, user characteristic data of 10 or more dimensions may be collected in advance, and the user characteristic data used for model training may be selected from these dimensions when determining the risk-related data. For example, specific risk-related data can include the items in Table 1 below:

Of course, in other embodiments, the risk-related data may further include artificial data generated according to predetermined rules. For example, an operator may customize risk-related data for model training according to the situations the expected risks may cover, or a computer may automatically generate the required risk-related data after the data generation rules are set. Artificial data matches the expected risk prediction situations more closely, while historical auto insurance case data is closer to real risk situations. In some application scenarios, either one can be used alone, or artificial data and historical auto insurance case data can be combined to train the risk ranking model and improve the accuracy of the prediction results.
The obtained risk-related data can be used as training data for the GBDT model. After learning and training, the thresholds of the decision features at the branches of the decision trees in the risk ranking model (all of the thresholds or part of them) enable the model's final output to match the actual risk relationship among the labeled users (a continuous and stable output can usually also be required). The GBDT used in the embodiments of this specification is an iterative decision tree algorithm that consists of two main parts: the regression decision tree (Regression Decision Tree) and gradient boosting (GB). Decision trees fall into two main categories, classification trees and regression trees. Classification trees are commonly used for classification problems, such as a user's gender, whether a web page is spam, or whether a user cheats. Regression trees are generally used to predict real values, such as a user's age, the probability that a user clicks, or the relevance of a web page. The former classifies labels, and the latter predicts real values. It should be emphasized that adding and subtracting the results of regression trees is meaningful, for example 10 years + 5 years - 3 years = 12 years, whereas the results of classification trees cannot be accumulated, or accumulating them is meaningless, for example male + male + female: is the result male or female? The general process of a regression tree is similar to that of a classification tree, except that each node of a regression tree yields a predicted value. Taking age as an example, the predicted value equals the average age of all the people belonging to the node. When branching, each feature is searched exhaustively to find the optimal split variable and the optimal split point. The criterion is no longer the Gini coefficient used in classification trees but the minimization of the squared error.
That is, the more people whose predictions are wrong, the greater the squared error. The most reliable branching basis is found by minimizing the squared error. Branching continues until the ages of the people on each leaf node are all the same or a preset termination condition is reached (such as an upper limit on the number of leaves). If the ages on a leaf node are still not all the same, the average age of everyone on that node is used as the prediction of the leaf node.
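The split search described above can be sketched as follows. This is a minimal illustration for a single feature, assuming candidate thresholds at the midpoints between consecutive distinct feature values; the function name `best_split` and the sample data (hours online per week as the feature, age as the target) are hypothetical.

```python
def best_split(values, targets):
    """Return (threshold, sse) of the single-feature split that minimises
    the summed squared error of the two resulting nodes, or None if the
    feature has fewer than two distinct values."""

    def sse(ys):
        # Squared error of a node that predicts the mean of its targets.
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = None
    candidates = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [t for v, t in zip(values, targets) if v <= threshold]
        right = [t for v, t in zip(values, targets) if v > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


hours_online = [1, 2, 3, 10, 11, 12]   # hypothetical feature values
ages = [50, 52, 54, 20, 22, 24]        # hypothetical regression targets
split = best_split(hours_online, ages)
```

Each side of the chosen split would then predict the mean age of the people assigned to it, and the search would be repeated over every feature and recursively on each child node.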
Gradient boosting is a machine learning technique used for regression, classification, and ranking tasks, and is part of the Boosting algorithm family. Boosting is a family of algorithms that can promote weak learners to strong learners, and belongs to the category of ensemble learning. The Boosting method is based on the idea that, for a complex task, the judgment obtained by appropriately combining the judgments of multiple experts is better than the judgment of any one of the experts alone. In layman's terms, it is the principle that "three cobblers together outmatch one Zhuge Liang". Like other boosting methods, gradient boosting builds the final prediction model as an ensemble of multiple weak learners, usually decision trees. Boosting builds the model in a stage-wise manner: the weak learner built at each iteration compensates for the shortcomings of the existing model.
For example, in a specific process, the number of trees can be set for training, and training can stop when the number of trees reaches a specified value (such as eighty), or when the residual is small enough (meeting the condition for stopping training). Training can stop when either of these two conditions is met.
If the residuals of the Nth tree are not all 0 or the stopping condition is not met, the residuals at the nodes of the Nth tree replace the corresponding original values and are passed to the (N+1)th tree for learning;
This continues until the residuals at the leaf nodes of the (N+K)th tree are equal to or less than the threshold, at which point the predicted value corresponding to the current leaf node is output. Specifically, the fitted residuals of all trees can be accumulated as the predicted value.
In this embodiment, the number of decision trees used for training can be determined in advance, and the thresholds of the decision features at the branches of the decision trees are optimized step by step through gradient iteration. For example, 80 decision trees can be used, with each tree learning the residual of the sum of the conclusions of all previous trees. The initial thresholds can be set from empirical values. Suppose A's true score (the labeled score) is 80, but the first tree, whose decision feature is age, predicts 60: the difference is 20 points, so the residual is 20. In the second tree (whose decision feature is the user's occupation), A's score is then set to 20 for learning. If the second tree does assign A to a leaf node with a value of 20, adding the conclusions of the two trees gives A's true score (predicted score 60 + residual 20). If the second tree's conclusion is 18, A still has a residual of 2; in the third tree (whose decision feature is annual income), A's score becomes 2 and learning continues. Computing the residual at each step is in effect a way of increasing the weight of misclassified cases, while the weight of correctly classified cases tends to 0. For example, suppose that risk is greater when age is too high or too low, and that risk is smaller when income is higher. If a user who is too old is assigned to the lower-risk branch L1, whose members are mostly between 20 and 40, the residual increases accordingly, and the user can gradually be assigned to a leaf node close to the actual risk through subsequent features such as income, marital status, and driving experience.
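The residual arithmetic in the example above can be traced with a short sketch. The tree outputs 60 and 18 come from the example; the third output of 2 (the third tree fitting the remaining residual exactly) and the names `accumulate` and `tol` are assumptions for illustration.

```python
def accumulate(true_score, tree_outputs, tol=0.0):
    """Add tree outputs one at a time and record (prediction, residual)
    after each tree; stop early once |residual| <= tol."""
    prediction = 0.0
    history = []
    for output in tree_outputs:
        prediction += output
        residual = true_score - prediction
        history.append((prediction, residual))
        if abs(residual) <= tol:
            break  # residual stopping condition reached
    return history


# Tree 1 (age) predicts 60, tree 2 (occupation) fits 18 of the residual 20,
# tree 3 (annual income) is assumed to fit the remaining 2.
history = accumulate(80, [60, 18, 2])
# history == [(60.0, 20.0), (78.0, 2.0), (80.0, 0.0)]
```

Each later tree is trained on the residual left by the trees before it, so the sum of all tree outputs moves toward the labeled score.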
If the number of trained decision trees reaches a predetermined value, for example when all 10 trees have been trained once from root node to leaf nodes, or the current parameters meet the condition for stopping training, such as a residual of 0 or another residual stopping threshold, training on this group of data can stop. When every threshold has found its best split point, or a split point that meets the training requirements, the thresholds of the decision features of the decision trees are determined. Adjustment continues until the thresholds meet the output requirements for the prediction results of the risk ranking model, at which point the risk ranking model is obtained. For example, the decision feature that separates risk scores of 60 and 80 may initially be set to whether age is greater than 20. After training and optimization on a large amount of data, the decision feature for risk assessment in the age dimension may finally be adjusted to whether age is greater than 24, which matches the true results in most cases.
In another embodiment, the number of decision trees used by the risk ranking model may be determined based on the number of categories corresponding to the user characteristic data. For example, user characteristics data of 80 dimensions are selected, and each dimension can represent the decision characteristics of a tree. In this way, 80 decision trees can be used to build a non-linear risk ranking model. Of course, in other embodiments of the present specification, the total number of specific decision trees may be determined according to the collected data, the number of branches of the tree, the upper-lower connection relationship of the tree, and the like.
In an embodiment provided in this specification, the risk-related data used in training the risk ranking model is user characteristic data belonging to the same user set and includes user characteristic data of at least one category, and the user characteristic data includes data that has a non-linear relationship with the insurance business.
The ranking model used in this embodiment uses the compensation rates of users of the same insurance company to construct the relative risk relationships between these users during training. Because user information within the same insurance company is fully comparable, the training data is authentic and reliable. Therefore, compared with the traditional regression model, this embodiment provides another risk prediction processing method, based on a ranking model, whose results are more authentic and reliable. At the same time, the method provided by the embodiments of this specification can reasonably and effectively apply multi-dimensional non-linear variables in the insurance business: the non-linear risk ranking model based on the gradient boosting decision tree is compatible with both linear and non-linear variables. Compared with the traditional linear model, the accuracy of the prediction results is significantly improved, which makes up for the shortcomings of the traditional linear model and improves the insurance service experience. In another embodiment of this specification, specifically, the risk-related data used by the risk ranking model in the training process includes:
S20: Construct user feature data with the same user attribution label as a sample training set, and the user feature data includes data information of a non-linear relationship associated with the insurance service;
Correspondingly, performing the training includes: inputting the characteristic data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, where the first user set consists of the users included in the first sample training set.
The user attribution label may represent the distribution or classification of users. For example, a user of an insurance company may use a corresponding user attribution label to mark the user as belonging to the user attribution label of insurance company A.
The terms first sample training set, first user set, and first user above mainly distinguish the training set currently being processed from other training sets, and do not refer to one particular set. By analogy, a training set composed of the characteristic data of users of another insurance company B may be referred to as a second sample training set, and the set of users belonging to insurance company B may be referred to as a second user set. In model training, the training sets constructed for each insurance company can be input, and the relative risk of the user to be predicted can be output.
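One way to picture the per-company sample training sets is the sketch below, which groups users by their attribution label and builds order relationships only within each group. It assumes the order is derived from each user's payout rate, as in the ranking model described earlier; the function `build_pairs`, the record layout, and the sample values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_pairs(records):
    """records: iterable of (user_id, insurer_label, payout_rate).

    Returns (higher_risk_user, lower_risk_user) pairs, built only between
    users that share the same insurer label. Ties produce no pair.
    """
    groups = defaultdict(list)
    for user_id, insurer, payout in records:
        groups[insurer].append((user_id, payout))

    pairs = []
    for members in groups.values():
        for (u1, p1), (u2, p2) in combinations(members, 2):
            if p1 > p2:
                pairs.append((u1, u2))
            elif p2 > p1:
                pairs.append((u2, u1))
    return pairs


records = [
    ("a1", "A", 0.3), ("a2", "A", 0.1),                      # first sample training set
    ("b1", "B", 0.9), ("b2", "B", 0.9), ("b3", "B", 0.2),    # second sample training set
]
pairs = build_pairs(records)
# pairs == [("a1", "a2"), ("b1", "b3"), ("b2", "b3")]
```

No pair crosses company boundaries, which reflects the point that pricing differs between insurers and only users of the same insurer are directly comparable during training.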
The risk ranking model can individually predict the relative risk value of each user, and can directly optimize the relationship between the relative risk of users belonging to the same user set that is output by the model. Therefore, in another embodiment of the method, the method may further include:
Determine the relative risk relationship between users in the specified user set based on the relative value of the risks;
Output data information of the relative risk magnitude relationship.
For example, the constructed risk ranking model can be used to process the characteristic data of users A, B, C, and D, and the relative risk values obtained are 0.48, 0.56, 0.81, and 0.62 respectively, where B, C, and D belong to the same insurance company. In some embodiments of the present specification, the sample training sets used in the training phase of the risk ranking model are each constructed from the characteristic data of users of the same insurance company. Model learning and training on such samples can output more accurate and reliable risk prediction results. Specifically, the output in this example can be:
C (0.81) > D (0.62) > B (0.56) > A (0.48).
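The sorted output above can be produced from the four relative risk values with a few lines. This is only an illustrative sketch: the function name `rank_users` and the output formatting are assumptions, while the values are those of the example.

```python
def rank_users(scores):
    """Return (user, value) pairs sorted from highest to lowest relative risk."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


scores = {"A": 0.48, "B": 0.56, "C": 0.81, "D": 0.62}
ranking = rank_users(scores)
line = " > ".join(f"{user} ({value:.2f})" for user, value in ranking)
# line == "C (0.81) > D (0.62) > B (0.56) > A (0.48)"
```

Only the order carries meaning here; the numeric values are used for comparison and are not payout rates.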
The method provided in the embodiments of the present specification uses a LambdaMART ranking model, which addresses the poor reliability of the output of a traditional regression model that takes the user's characteristic data as input and outputs the numerical value of the user's payout rate. LambdaMART's ranking model learns the relative level of risk between users. The values output by the ranking model have no actual physical meaning and serve only as a basis for comparing order relationships.
As mentioned above, the embodiments provided in this specification can be used not only in the implementation scenarios of risk prediction of auto insurance business, but also in the implementation scenarios of risk ranking of funds, ranking of medical insurance risks, and the like. Specifically in the application scenario of risk prediction for auto insurance business,
S22: The risk ranking model is a car insurance risk ranking model obtained by training based on risk-related data associated with the car insurance business;
The relative risk value includes a relative risk magnitude of a compensation rate corresponding to the user to be predicted.
Of course, the above-mentioned compensation rate and car insurance risk score are only one way of characterizing the output of the non-linear risk ranking model in one or more embodiments. This specification does not exclude other characterizations in other embodiments, or characterizations obtained by transforming the payout rate or the car insurance risk score. For example, a car insurance score can be obtained by linearly transforming the payout rate, where a higher car insurance score means a smaller risk (the opposite of the car insurance risk score, where a greater risk score means a higher risk).
It should be noted that a linear relationship generally means that two variables are related by a first-degree function. The linear relationships among variables in insurance or auto insurance described in the embodiments of this specification may take the form y = ax + b, where x is the independent variable and y is the dependent variable. In the specific application scenarios of the insurance or auto insurance business in the embodiments of this specification, a linear relationship in the broad sense may mean that the relationship between two variables is clear and fixed: in some cases it can be represented by a straight line, or converted into a linear relationship by a mathematical transformation (with the information loss of the conversion within a certain range). A non-linear relationship mainly means that the relationship between variables keeps changing and cannot be described by a formula; in some cases it can only be represented by a curve, a surface, or an irregular line, as with risk score and occupation, or risk score and gender.
In one or more embodiments of the present specification, the risk ranking model may be built offline in advance: training data containing non-linear relationships can be selected in advance for training the GBDT decision trees, and the model is put to use online after training is completed. This specification does not exclude constructing or updating/maintaining the risk ranking model online. For example, if computing capacity is sufficient, the risk ranking model can be constructed online and used online at the same time to process the target risk-related data.
Although the above embodiments provide an implementation in which the risk ranking model is realized with a LambdaMART model, this specification does not exclude other embodiments that use other data models to output the relative risk magnitude relationships between users, such as LambdaRank (a ranking algorithm), ListNet (a ranking algorithm), and other listwise methods (document-list methods). Specifically, as shown in FIG. 3, this specification also provides another method for processing insurance business risk prediction. The method includes:
S30: Obtain target risk related data of users to be predicted;
S32: Use the constructed risk ranking model to process the target risk-related data, and output a relative risk value of the user to be predicted, where the relative risk value represents a relative risk magnitude relationship between users in a specified user set.
The relative risk output can be the value of one user or the value of multiple users. When outputting multiple user values, it can be the output result after sorting that represents the relative risk between users, or it can be the unsorted output result.
An embodiment of this specification provides a method for processing insurance business risk prediction, which can obtain target risk-related data of a user to be predicted and then use the constructed risk ranking model to process the target risk-related data and output the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a calculus gradient boosting decision tree with labeled risk-related data. With the method provided by the embodiments of this specification, the calculus gradient boosting decision tree can be introduced into insurance business risk prediction, which is not only compatible with risk prediction processing of insurance business data with non-linear relationships, but can also output the relative risk magnitude relationships after risk prediction. The sorted risk prediction results represent the relative magnitude of risk between different users, which provides another, more reliable implementation of risk prediction for insurance business.
The method described above can be used for risk identification on the client side, such as the insurance business risk assessment provided in a payment application on a mobile terminal. The client can be a personal computer (PC), a server, an industrial control computer, a smartphone, a tablet, a portable computer (such as a notebook computer), a personal digital assistant (PDA), a desktop computer, a smart wearable device, or the like, including mobile communication terminals, handheld devices, vehicle-mounted devices, wearable devices, TV devices, and computing devices. The method can also be applied on the system server of an insurance company or a third-party insurance service agency. The system server may be a single server, a server cluster, a distributed system server, or a combination of a server that processes data requested by devices with other associated system servers that perform the data processing. For example, one implementation may be built on the Alibaba Cloud Open Data Processing Service (ODPS) platform, which can provide a unified programming interface for a variety of data processing tasks arising from different user needs. With system performance guaranteed by ODPS, a system implementing the method of the embodiments of this specification can process massive data in parallel and achieve the best computing performance.
As mentioned above, the method embodiments provided in this specification may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, FIG. 4 is a block diagram of the hardware structure of a server to which the insurance business risk prediction processing method provided in this specification is applied. As shown in FIG. 4, the server 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. Those skilled in the art can understand that the structure shown in FIG. 4 is only schematic and does not limit the structure of the electronic device. For example, the server 10 may include more or fewer components than those shown in FIG. 4, such as other processing hardware like a database or a multi-level cache, or may have a configuration different from that shown in FIG. 4.
The memory 104 may be used to store software programs and modules of application software, such as the program instructions / modules corresponding to the insurance business risk prediction processing method in the embodiments of this specification. The processor 102 runs the software programs and modules stored in the memory 104 so as to execute various functional applications and data processing, that is, to implement the insurance business risk prediction processing method described above. The memory 104 may include high-speed random access memory, and may further include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory disposed remotely from the processor 102, and such remote memory may be connected to the server 10 through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission module 106 is used to receive or send data through a network. Specific examples of the network may include a wireless network provided by the communication provider of the server 10. In one example, the transmission module 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission module 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
Based on the insurance business risk prediction processing method described above, this specification also provides an insurance business risk prediction processing device. The device may include a system (including a distributed system), software (an application), a module, a component, a server, a client, and the like that use the method described in the embodiments of this specification, combined with the hardware necessary for implementation. Based on the same inventive concept, the processing device in one embodiment provided in this specification is described in the following embodiments. Because the way the device solves the problem is similar to that of the method, the implementation of the specific processing device in the embodiments of this specification may refer to the implementation of the foregoing method, and duplicated details are not described again. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Specifically, FIG. 5 is a schematic diagram of the module structure of an embodiment of an insurance business risk prediction processing device provided in this specification. As shown in FIG. 5, the device may include:
The prediction data acquisition module 201 may be used to obtain target risk-related data of a user to be predicted;
The risk prediction module 202 may be used to process the target risk-related data by using the constructed risk ranking model, and output the relative risk value of the user to be predicted. The risk ranking model includes: a ranking model determined by training a gradient boosting decision tree using identified risk-related data.
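The division into modules 201 and 202 can be illustrated with a minimal Python sketch. The class names, the in-memory `feature_store`, the two example features, and `toy_ranking_model` are all hypothetical and are not taken from this specification; in particular, the toy scoring function only stands in for a trained risk ranking model and is not a gradient boosting decision tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

FeatureVector = Sequence[float]


@dataclass
class PredictionDataAcquisitionModule:
    """Counterpart of module 201: returns the target risk-related data of a user."""

    feature_store: Dict[str, FeatureVector]  # hypothetical in-memory data source

    def acquire(self, user_id: str) -> FeatureVector:
        return self.feature_store[user_id]


@dataclass
class RiskPredictionModule:
    """Counterpart of module 202: applies an already constructed ranking model."""

    ranking_model: Callable[[FeatureVector], float]

    def predict(self, features: FeatureVector) -> float:
        # The returned score is only meaningful relative to other users' scores.
        return self.ranking_model(features)


def toy_ranking_model(features: FeatureVector) -> float:
    # Hypothetical stand-in for a trained model; the weights are arbitrary.
    claims_last_year, annual_mileage = features
    return 0.8 * claims_last_year + 0.1 * annual_mileage


store = {"user_a": (2.0, 1.5), "user_b": (0.0, 3.0)}
acquisition = PredictionDataAcquisitionModule(store)
prediction = RiskPredictionModule(toy_ranking_model)
scores = {uid: prediction.predict(acquisition.acquire(uid)) for uid in store}
```

Because module 202 only depends on a callable that maps features to a score, any ranking model whose output indicates relative risk could be plugged in without changing module 201.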
It should be noted that, in the device described in the embodiments of this specification, according to the description of the related method embodiments, it may also include other implementations. For specific implementation manners, reference may be made to the description of the method embodiments, and details are not described herein.
The server or client provided in the embodiments of this specification can be implemented by a processor executing corresponding program instructions in a computer, for example implemented on a PC or server in the C++ language under the Windows operating system, or in a corresponding application programming language under another system such as Linux, together with the necessary hardware, or implemented as processing logic based on a quantum computer. The above processing equipment may specifically be a server of an insurance company or of a third-party service agency that provides risk prediction. The server may be a single server, a server cluster, a distributed system server, or a combination of a server that processes data requested by devices with other associated system servers for data processing. This specification also provides an insurance business risk prediction processing device, which may specifically include a processor and a memory for storing processor-executable instructions. When the processor executes the instructions, it implements:
Obtain target risk related data of users to be predicted;
Use the constructed risk ranking model to process the target risk-related data, and output the relative risk value of the user to be predicted, where the risk ranking model includes: a ranking model determined by training a gradient boosting decision tree using identified risk-related data.
Based on the foregoing embodiment, in another embodiment of the processing device provided in this specification, the risk-related data used in the training process of the risk ranking model includes: user characteristic data having the same user attribution label, constructed as a sample training set, where the user characteristic data includes data information exhibiting non-linear relationships associated with the insurance business;
Correspondingly, performing the training includes: inputting the characteristic data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, where the first user set is the users included in the first sample training set.
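The grouping of training samples by user attribution label, and the fact that relative risk is learned only within a group, can be sketched as follows. This is an illustrative sketch under stated assumptions: the attribution labels (`city_A`, `city_B`), the feature values, and the integer risk grades are hypothetical. The pseudo-residual function shows the pairwise logistic (RankNet-style) gradient on which LambdaMART-type training is based; a full LambdaMART trainer would additionally weight each pair by the change in a ranking metric such as NDCG and fit a regression tree to these values in every boosting round, which is omitted here.

```python
import math
from collections import defaultdict


def build_groups(samples):
    """Group (attribution_label, user_id, features, risk_grade) rows by label."""
    groups = defaultdict(list)
    for attribution_label, user_id, features, risk_grade in samples:
        groups[attribution_label].append((user_id, features, risk_grade))
    return dict(groups)


def pairwise_pseudo_residuals(scores, grades, sigma=1.0):
    """Negative gradient of the pairwise logistic loss within one group.

    For every pair in which user i has a higher observed risk grade than
    user j, the loss log(1 + exp(-sigma * (s_i - s_j))) pushes s_i up and
    s_j down by the same amount.
    """
    residuals = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if grades[i] > grades[j]:
                push = sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                residuals[i] += push
                residuals[j] -= push
    return residuals


# Hypothetical training rows: (attribution label, user id, features, risk grade).
samples = [
    ("city_A", "u1", (1.0, 0.2), 2),
    ("city_A", "u2", (0.3, 0.9), 0),
    ("city_A", "u3", (0.5, 0.5), 1),
    ("city_B", "u4", (0.7, 0.1), 1),
    ("city_B", "u5", (0.2, 0.4), 0),
]
groups = build_groups(samples)

# Before the first boosting round every user in the group has the same score.
city_a_grades = [grade for _, _, grade in groups["city_A"]]
residuals = pairwise_pseudo_residuals([0.0, 0.0, 0.0], city_a_grades)
```

Users from different groups are never compared with each other in this sketch, which mirrors the statement that the relative risk value of the first user is output within the first user set only.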
Based on the foregoing embodiment, in another embodiment of the processing device provided in this specification, when the processor executes the instruction, it further implements:
Determine the relative risk magnitude relationship between users in the specified user set based on the relative risk values;
Output data information of the relative risk magnitude relationship.
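As a minimal sketch of these two steps, the relative risk values of a specified user set can be sorted and turned into explicit "higher risk than" relations. The user identifiers, the score values, and the function names are hypothetical; the scores are treated purely as ordinal quantities.

```python
def rank_users_by_relative_risk(relative_risk):
    """Return user ids ordered from highest to lowest relative risk value."""
    # sorted() is stable, so users with equal scores keep their input order.
    return sorted(relative_risk, key=relative_risk.get, reverse=True)


def pairwise_relations(ranked_ids):
    """List (higher_risk_user, lower_risk_user) pairs implied by the ranking."""
    return [
        (ranked_ids[i], ranked_ids[j])
        for i in range(len(ranked_ids))
        for j in range(i + 1, len(ranked_ids))
    ]


# Hypothetical relative risk values output by a risk ranking model.
relative_risk = {"u1": 0.42, "u2": -1.30, "u3": 0.05}
ranking = rank_users_by_relative_risk(relative_risk)
relations = pairwise_relations(ranking)
```

Only the ordering is used here; the absolute size of a score is not interpreted as a probability or an amount.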
Based on the embodiment of the foregoing manner, in another embodiment of the processing device provided in the present specification, the risk ranking model is an automobile insurance risk ranking model obtained by training based on risk-related data associated with a car insurance business;
The relative risk value includes a relative risk magnitude of a compensation rate corresponding to the user to be predicted.
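One way to check whether relative risk values reflect the relative magnitude of compensation rates is to compare the predicted ordering with observed compensation rates pair by pair. The sketch below is an assumption-laden illustration: it assumes the compensation rate is computed as claims paid divided by premium (this specification does not fix a formula in this passage), and the policy figures, predicted scores, and function names are hypothetical.

```python
from itertools import combinations


def compensation_rate(claims_paid, premium):
    """Assumed definition: claims paid divided by premium for the same period."""
    if premium <= 0:
        raise ValueError("premium must be positive")
    return claims_paid / premium


def concordant_fraction(predicted, observed):
    """Fraction of comparable user pairs that the model orders correctly."""
    concordant = 0
    comparable = 0
    for a, b in combinations(predicted, 2):
        if observed[a] == observed[b]:
            continue  # pairs with equal observed rates carry no ordering
        comparable += 1
        if (predicted[a] - predicted[b]) * (observed[a] - observed[b]) > 0:
            concordant += 1
    return concordant / comparable if comparable else float("nan")


# Hypothetical (claims paid, premium) per user and hypothetical model scores.
policies = {"u1": (1200.0, 1000.0), "u2": (0.0, 800.0), "u3": (300.0, 1500.0)}
observed = {uid: compensation_rate(c, p) for uid, (c, p) in policies.items()}
predicted = {"u1": 0.42, "u2": -1.30, "u3": 0.05}
agreement = concordant_fraction(predicted, observed)
```

A value of 1.0 means every comparable pair of users is ordered the same way by the model and by the observed compensation rates; the check does not depend on the scale of the relative risk values.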
Of course, the ranking model used in the risk prediction processing device provided in this specification is not limited to the LambdaMART model, and other algorithm models whose output indicates the relative magnitude of the user's risk can also be applied. Therefore, this specification also provides another insurance business risk prediction processing device, which may specifically include a processor and a memory for storing processor-executable instructions. When the processor executes the instructions, it implements:
Obtain target risk related data of users to be predicted;
The target risk related data is processed by using the constructed risk ranking model, and the relative risk value of the user to be predicted is output, and the relative risk value represents a relative risk magnitude relationship between users in a specified user set.
The above instructions can be stored in a variety of computer-readable storage media. A computer-readable storage medium may include a physical device for storing information, in which the information is typically digitized and then stored using electrical, magnetic, or optical media. The computer-readable storage medium described in this embodiment may include: devices that store information using electrical energy, such as various types of memory (for example, RAM and ROM); devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, core memory, bubble memory, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are also other kinds of readable storage media, such as quantum memory and graphene memory. The instructions involved in the device, server, client, or processing equipment described above are the same as described above.
It should be noted that the apparatus and processing equipment described in the embodiments of this specification may also include other implementation manners according to the description of the related method embodiments. For specific implementation manners, reference may be made to the description of the method embodiments, and details are not described herein.
Each embodiment in this specification is described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on the differences from other embodiments. In particular, for the hardware + programming embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts may refer to the description of the method embodiment.
The specific embodiments of the present specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve the desired result. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The insurance business risk prediction processing method, device, and processing equipment provided in the embodiments of this specification can obtain target risk-related data of a user to be predicted, then process the target risk-related data by using the constructed risk ranking model and output the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a gradient boosting decision tree using identified risk-related data. With the method provided by the embodiments of this specification, gradient boosting decision tree computation can be introduced into insurance business risk prediction. This is not only compatible with risk prediction processing for insurance business data that exhibits non-linear relationships, but can also output the relative risk magnitude relationship after risk prediction. The sorted risk prediction result represents the relative magnitude of risk between different users, which can provide another, more reliable implementation for insurance business risk prediction.
Although the present application provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-creative labor. The sequence of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual device or system server product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded processing environment).
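The remark about sequential or parallel execution can be illustrated with a short sketch that scores several user groups either one after another or concurrently in a multi-threaded environment. The scoring function, group contents, and worker count are hypothetical, and a thread pool from the Python standard library stands in for whatever parallel environment an actual product would use.

```python
from concurrent.futures import ThreadPoolExecutor


def score_user(features):
    # Hypothetical stand-in for a trained risk ranking model.
    return 0.8 * features[0] + 0.1 * features[1]


def score_group(group):
    """Score every user of one group; groups are independent of each other."""
    return {user_id: score_user(features) for user_id, features in group.items()}


user_groups = [
    {"u1": (2.0, 1.5), "u2": (0.0, 3.0)},
    {"u3": (1.0, 0.5)},
    {"u4": (0.5, 2.0), "u5": (3.0, 0.0)},
]

# Sequential execution.
sequential_scores = [score_group(group) for group in user_groups]

# Parallel execution; Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel_scores = list(pool.map(score_group, user_groups))
```

Because each group is scored independently, both execution orders give the same result, which is the sense in which the listed step sequence is not the only possible one.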
Although the embodiments of this specification mention operations and data descriptions such as the definition of linear / non-linear relationships, the construction of the underlying GBDT model in LambdaMART, the processing of the GBDT model algorithm, and data acquisition, storage, interaction, calculation, and judgment, the embodiments of this specification are not limited to cases that must conform to industry communication standards, standard GBDT model algorithm processing, communication protocols, and standard data models / templates, or to the cases described in the embodiments of this specification. Implementations slightly modified on the basis of certain industry standards, of custom approaches, or of the implementations described in the embodiments can also achieve effects that are the same as, equivalent or similar to, or predictable variations of those of the above embodiments. Embodiments obtained by applying such modified or varied data acquisition, storage, judgment, and processing methods may still fall within the scope of the optional implementations of this specification.
In the 1990s, a technical improvement could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system to be "integrated" on a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). At present, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logic method flow can easily be obtained by programming the method flow into an integrated circuit with one of the hardware description languages above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, such a controller can be regarded as a hardware component, and the devices included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the devices for implementing various functions can even be regarded as both software modules implementing the method and structures within the hardware component.
The processing equipment, devices, modules, or units described in the above embodiments may be specifically implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a notebook computer, a vehicle-mounted human-computer interaction device, a mobile phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Although the embodiments of the present specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-creative means. The sequence of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual device or terminal product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded processing environment, or even a distributed data processing environment). The terms "comprising" and "including," or any other variation thereof, are intended to cover non-exclusive inclusion, such that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, the presence of other identical or equivalent elements in the process, method, product, or device that includes the stated elements is not excluded.
For convenience of description, the above device is described with its functions divided into various modules. Of course, when implementing the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and / or hardware, or a module that implements a given function may be implemented as a combination of multiple submodules or subunits. The device embodiments described above are only schematic. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection displayed or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The present invention is described with reference to flowcharts and / or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and / or block in the flowcharts and / or block diagrams, and combinations of flows and / or blocks in the flowcharts and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and / or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information can be stored by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Those skilled in the art should understand that the embodiments of the present specification may be provided as a method, a system, or a computer program product. Therefore, the embodiments of the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, optical memory, etc.) containing computer-usable program code.
The embodiments of this specification can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. The embodiments of the present specification can also be practiced in a decentralized computing environment. In these decentralized computing environments, tasks are performed by a remote processing device connected through a communication network. In a decentralized computing environment, program modules can be located in local and remote computer storage media, including storage devices.
Each embodiment in this specification is described in a progressive manner, and the same or similar parts of the various embodiments can be referred to each other. Each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is basically similar to the method embodiment, its description is relatively brief, and the relevant parts may refer to the description of the method embodiment. In the description of this specification, references to the terms "one embodiment", "some embodiments", "examples", "specific examples", or "some examples" mean that specific features, structures, materials, or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of this specification. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
The above descriptions are merely examples of the embodiments of the present specification and are not intended to limit them. For those skilled in the art, the embodiments of the present specification may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the embodiments of the present specification shall fall within the scope of the claims of the embodiments of the present specification.

S30~S32‧‧‧Steps

10‧‧‧Server (computer terminal)

102‧‧‧Processor

104‧‧‧Non-volatile memory

106‧‧‧Transmission module

201‧‧‧Prediction data acquisition module

202‧‧‧Risk prediction module

In order to explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some of the embodiments recorded in this specification, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of an embodiment of an insurance business risk prediction processing method provided in this specification;

FIG. 2 is a schematic diagram of the processing of a LambdaMART model in the method provided in this specification;

FIG. 3 is a schematic flowchart of another embodiment of an insurance business risk prediction processing method provided in this specification;

FIG. 4 is a block diagram of the hardware structure of a server to which an insurance business risk prediction processing method provided in this specification is applied;

FIG. 5 is a schematic diagram of the module structure of an insurance business risk prediction processing device provided in this specification.

Claims

A method for processing insurance business risk prediction, the method including: obtaining target risk-related data of a user to be predicted; and processing the target risk-related data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the risk ranking model includes: a ranking model determined by training a gradient boosting decision tree using identified risk-related data.

The method according to claim 1, wherein the risk-related data used in the training process of the risk ranking model includes: user characteristic data having the same user attribution label, constructed as a sample training set, the user characteristic data including data information exhibiting non-linear relationships associated with the insurance business; correspondingly, the training includes: inputting the characteristic data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, the first user set being the users included in the first sample training set.

The method according to claim 2, further including: determining, based on the relative risk values, the relative risk magnitude relationship between users in a specified user set; and outputting data information of the relative risk magnitude relationship.

The method according to claim 3, wherein the risk ranking model is an auto insurance risk ranking model obtained by training based on risk-related data associated with an auto insurance business; and the relative risk value includes the relative risk magnitude of the compensation rate corresponding to the user to be predicted.

A method for processing insurance business risk prediction, including: obtaining target risk-related data of a user to be predicted; and processing the target risk-related data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the relative risk value represents the relative risk magnitude relationship between users in a specified user set.

An insurance business risk prediction processing device, including: a prediction data acquisition module, used to obtain target risk-related data of a user to be predicted; and a risk prediction module, used to process the target risk-related data by using a constructed risk ranking model and output a relative risk value of the user to be predicted, wherein the risk ranking model includes: a ranking model determined by training a gradient boosting decision tree using identified risk-related data.

Insurance business risk prediction processing equipment, including a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements: obtaining target risk-related data of a user to be predicted; and processing the target risk-related data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the risk ranking model includes: a ranking model determined by training a gradient boosting decision tree using identified risk-related data.

The processing equipment according to claim 7, wherein the risk-related data used in the training process of the risk ranking model includes: user characteristic data having the same user attribution label, constructed as a sample training set, the user characteristic data including data information exhibiting non-linear relationships associated with the insurance business; correspondingly, the training includes: inputting the characteristic data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, the first user set being the users included in the first sample training set.

The processing equipment according to claim 8, wherein the processor, when executing the instructions, further implements: determining, based on the relative risk values, the relative risk magnitude relationship between users in a specified user set; and outputting data information of the relative risk magnitude relationship.

The processing equipment according to claim 9, wherein the risk ranking model is an auto insurance risk ranking model obtained by training based on risk-related data associated with an auto insurance business; and the relative risk value includes the relative risk magnitude of the compensation rate corresponding to the user to be predicted.

Insurance business risk prediction processing equipment, including a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements: obtaining target risk-related data of a user to be predicted; and processing the target risk-related data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the relative risk value represents the relative risk magnitude relationship between users in a specified user set.