CN115329958A - Model migration method and device and electronic equipment

Model migration method and device and electronic equipment

Info

Publication number: CN115329958A
Application number: CN202110510937.3A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: decision tree, decision, model, sample data, training sample
Legal status: Pending
Inventors: 阮怀玉, 章鹏, 苏煜
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning


Abstract

A model migration method and device are provided, where the model is a decision tree model including a plurality of decision trees. The method comprises the following steps: performing model training on the model based on first training sample data in a source scene and the sample labels corresponding to the first training sample data, to obtain the trained decision tree model; and, based on unlabeled second training sample data in a target scene, respectively performing pruning processing on each decision tree in the trained decision tree model, so that the decision tree model is migrated from the source scene to the target scene. On the one hand, this solves the problem of model initialization when the target scene has no labels, and improves the generalization efficiency of the migrated model in the target scene; on the other hand, only the model parameters of the decision tree model trained in the source scene need to be exported to the target scene, the data of the source scene itself is not used, and the user requirements of data security and privacy protection are met.

Description

Model migration method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer applications, and in particular, to a model migration method and apparatus, and an electronic device.
Background
Generally, a developer can obtain a machine learning model with a specific function by training the machine learning model, and then use the model to complete a specific task; compared with completing the task manually, this can significantly save human resources. However, since the training process of a machine learning model is completed in a specific business scenario, the previously trained machine learning model may not work properly once it is switched to a new business scenario; and retraining the model may be impossible because the new business scenario lacks accumulated historical data.
In related technologies, a previously trained model can be adapted to a new business scenario through transfer learning. However, conventional transfer learning requires mixing the sample data of the original business scenario with the sample data of the new scenario for model training; in financial risk control modeling, the sample data of a specific business scenario cannot be used directly due to requirements such as data security and privacy, so model migration cannot be completed in this way.
Disclosure of Invention
In view of the above, the present specification proposes a model migration method, where the model is a decision tree model including a plurality of decision trees, and the method includes:
performing model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data to obtain the trained decision tree model;
based on unlabeled second training sample data in the target scene, respectively performing pruning processing on each decision tree in the trained decision tree model, to complete the migration of the decision tree model from the source scene to the target scene.
Optionally, the feature space of the first training sample data of the source scene is the same as the feature space of the second training sample data of the target scene; the feature distribution of the first training sample data of the source scene is different from the feature distribution of the second training sample data of the target scene.
Optionally, after pruning each of the trained decision trees in the decision tree model, the method further includes:
and respectively carrying out decision parameter adjustment on each decision tree in the decision tree model after pruning so as to finish the migration of the decision tree model from a source scene to a target scene.
Optionally, each decision tree includes a root node, a non-leaf node, and a leaf node;
the pruning treatment is respectively carried out on each trained decision tree in the decision tree model, and the pruning treatment comprises the following steps:
inputting the second training sample data into each decision tree of the trained decision tree model for prediction, and recording the sample distribution of the second training sample data in each node of each decision tree;
traversing all the non-leaf nodes in each decision tree;
for each decision tree, judging whether the sample distribution of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample distribution threshold or judging whether the sample number of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample number threshold;
if yes, pruning the leaf nodes corresponding to the non-leaf nodes of the decision tree, and outputting the pruned decision tree model.
Optionally, the decision parameter is a decision characteristic threshold used for making a decision for each node in the decision tree;
the step of respectively adjusting the decision parameters of each decision tree in the pruned decision tree model comprises the following steps:
inputting the second training sample data into each decision tree of the pruned decision tree model for prediction, and recording the sample distribution of the second training sample data at each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data;
traversing the layers from the root node to the non-leaf nodes of each decision tree;
iteratively calculating the sample distribution of each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data based on a preset loss function, and adjusting the decision characteristic threshold of each node according to a preset step value to solve the minimum value of the loss function;
and taking the decision characteristic threshold of each node corresponding to the minimum loss function as the adjusted decision parameter of each decision tree.
Optionally, the loss function is characterized based on the following formula:

$$ \mathrm{Loss} = \mathrm{JS}\big(p_S(f(x)) \,\|\, p_T(f(x))\big) + \sum_i \mathrm{JS}\big(p_S(x_i) \,\|\, p_T(x_i)\big) $$

wherein f(x) characterizes the classification function corresponding to the decision tree model trained in the source scene; p_S(f(x)) characterizes the prediction distribution of the machine learning model in the source scene, and p_T(f(x)) characterizes the prediction distribution of the machine learning model in the target scene; p_S(x_i) characterizes the probability distribution of the variable x_i in the source scene, and p_T(x_i) characterizes the probability distribution of the variable x_i in the target scene; T_i characterizes the series of decision feature thresholds of the nodes in the decision tree model that decide on the variable x_i; by searching for new decision feature thresholds for the variable x_i in the target scene, the value of the loss function is minimized;
for JS(p_S(f(x)) || p_T(f(x))) in the loss function, p_S(f(x)) and p_T(f(x)) are substituted as p and q respectively, and for JS(p_S(x_i) || p_T(x_i)) in the loss function, p_S(x_i) and p_T(x_i) are substituted as p and q respectively, into the following formula for calculation:

$$ \mathrm{JS}(p \,\|\, q) = \frac{1}{2}\,\mathrm{KL}\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + \frac{1}{2}\,\mathrm{KL}\Big(q \,\Big\|\, \frac{p+q}{2}\Big) $$

wherein JS(p || q) characterizes the JS divergence of the probability distribution p and the probability distribution q, KL(p || (p+q)/2) characterizes the KL divergence of p and (p+q)/2, and KL(q || (p+q)/2) characterizes the KL divergence of q and (p+q)/2.
Optionally, the decision trees in the decision tree model are constructed based on a random forest algorithm or a GBDT (gradient boosting decision tree) algorithm.
The present specification also provides a model migration apparatus, where the model is a decision tree model including a plurality of decision trees, the apparatus including:
the training module is used for carrying out model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data to obtain the trained decision tree model;
and the migration module is used for performing pruning processing on each trained decision tree in the decision tree model based on second training sample data corresponding to the non-sample label in the target scene so as to finish the migration of the decision tree model from the source scene to the target scene.
Optionally, the feature space of the first training sample data of the source scene is the same as the feature space of the second training sample data of the target scene; the feature distribution of the first training sample data of the source scene is different from the feature distribution of the second training sample data of the target scene.
Optionally, after pruning each of the trained decision trees in the decision tree model, the migration module further:
and respectively carrying out decision parameter adjustment on each decision tree in the decision tree model after pruning so as to finish the migration of the decision tree model from a source scene to a target scene.
Optionally, each decision tree includes a root node, a non-leaf node, and a leaf node;
the migration module:
inputting the second training sample data into each decision tree of the trained decision tree model for prediction, and recording the sample distribution of the second training sample data in each node of each decision tree;
traversing all the non-leaf nodes in each decision tree;
for each decision tree, judging whether the sample distribution of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample distribution threshold or judging whether the sample number of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample number threshold;
if yes, pruning the leaf node corresponding to the non-leaf node of the decision tree, and outputting the decision tree model after pruning.
Optionally, the decision parameter is a decision characteristic threshold used for making a decision for each node in the decision tree;
the migration module further:
inputting the second training sample data into each decision tree of the pruned decision tree model for prediction, and recording the sample distribution of the second training sample data at each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data;
performing hierarchical traversal from the root node to the non-leaf node of each decision tree;
iteratively calculating the sample distribution of each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data based on a preset loss function, and adjusting the decision characteristic threshold of each node according to a preset step value to solve the minimum value of the loss function;
and taking the decision characteristic threshold of each node corresponding to the minimum loss function as the adjusted decision parameter of each decision tree.
Optionally, the loss function is characterized based on the following formula:

$$ \mathrm{Loss} = \mathrm{JS}\big(p_S(f(x)) \,\|\, p_T(f(x))\big) + \sum_i \mathrm{JS}\big(p_S(x_i) \,\|\, p_T(x_i)\big) $$

wherein f(x) characterizes the classification function corresponding to the decision tree model trained in the source scene; p_S(f(x)) characterizes the prediction distribution of the machine learning model in the source scene, and p_T(f(x)) characterizes the prediction distribution of the machine learning model in the target scene; p_S(x_i) characterizes the probability distribution of the variable x_i in the source scene, and p_T(x_i) characterizes the probability distribution of the variable x_i in the target scene; T_i characterizes the series of decision feature thresholds of the nodes in the decision tree model that decide on the variable x_i; by searching for new decision feature thresholds for the variable x_i in the target scene, the value of the loss function is minimized;
for JS(p_S(f(x)) || p_T(f(x))) in the loss function, p_S(f(x)) and p_T(f(x)) are substituted as p and q respectively, and for JS(p_S(x_i) || p_T(x_i)) in the loss function, p_S(x_i) and p_T(x_i) are substituted as p and q respectively, into the following formula for calculation:

$$ \mathrm{JS}(p \,\|\, q) = \frac{1}{2}\,\mathrm{KL}\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + \frac{1}{2}\,\mathrm{KL}\Big(q \,\Big\|\, \frac{p+q}{2}\Big) $$

wherein JS(p || q) characterizes the JS divergence of the probability distribution p and the probability distribution q, KL(p || (p+q)/2) characterizes the KL divergence of p and (p+q)/2, and KL(q || (p+q)/2) characterizes the KL divergence of q and (p+q)/2.
Optionally, the decision trees in the decision tree model are constructed based on a random forest algorithm or based on a GBDT algorithm.
The present specification also provides an electronic device, including a communication interface, a processor, a memory, and a bus, where the communication interface, the processor, and the memory are connected to each other through the bus;
the memory stores machine-readable instructions that the processor executes by invoking to perform the above-described method.
The present specification also provides a machine-readable storage medium having stored thereon machine-readable instructions which, when invoked and executed by a processor, perform the method described above.
In the technical solution above, the unlabeled samples of the target scene are used, in combination with the structural characteristics of the decision tree model, to prune and fine-tune the model structure and to determine the decision feature thresholds of the decision tree nodes, thereby realizing the migration and adaptation of the model from the source scene to the target scene. On the one hand, this solves the problem of model initialization when the target scene has no labels, and improves the generalization efficiency of the migrated model in the target scene; on the other hand, only the model parameters of the decision tree model trained in the source scene need to be exported to the target scene, the data of the source scene is not used, and the user requirements of data security and privacy protection are met.
Drawings
FIG. 1 is a flow chart of a method of model migration provided by an exemplary embodiment;
FIG. 2 is a schematic diagram of a tree structure and migration process for a decision tree provided by an exemplary embodiment;
FIG. 3 is a hardware block diagram of an electronic device provided by an exemplary embodiment;
FIG. 4 is a block diagram of a model migration apparatus provided in an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if," as used herein, may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
The specification aims to provide a technical scheme of model migration.
When the method is realized, the model is a decision tree model comprising a plurality of decision trees; performing model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data to obtain the trained decision tree model;
furthermore, based on second training sample data corresponding to no sample label in the target scene, each trained decision tree in the decision tree model is pruned, so that the decision tree model is migrated from the source scene to the target scene.
In this technical solution, the unlabeled samples of the target scene are used, in combination with the structural characteristics of the decision tree model, to prune and fine-tune the model structure and to determine the decision feature thresholds of the decision tree nodes, so that the model is migrated and adapted from the source scene to the target scene. On the one hand, this solves the problem of model initialization when the target scene has no labels and improves the generalization efficiency of the migrated model in the target scene; on the other hand, only the model parameters of the decision tree model trained in the source scene need to be exported to the target scene, the data of the source scene is not used, and the user requirements of data security and privacy protection are met.
The present specification is described below with reference to specific embodiments and specific application scenarios.
Referring to fig. 1, fig. 1 is a flowchart of a model migration method according to an embodiment of the present disclosure, in which the model is a decision tree model including a plurality of decision trees; the method comprises the following steps:
102, performing model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data to obtain the trained decision tree model.
And step 104, based on second training sample data corresponding to no sample label in the target scene, performing pruning processing on each trained decision tree in the decision tree model respectively to complete the migration of the decision tree model from the source scene to the target scene.
For ease of understanding, the decision tree is briefly introduced below.
A Decision Tree is a decision analysis method that, on the basis of the known probabilities of occurrence of various situations, constructs a decision tree to obtain the probability that the expected value of the net present value is greater than or equal to zero, evaluates project risk, and judges project feasibility; it is a graphical method that applies probability analysis intuitively. It is called a decision tree because the resulting diagram resembles the branches of a tree.
In the field of machine learning, a decision tree is typically a predictive model used for classification; it is a tree structure in which each internal node represents a decision on an attribute, each branch represents the output of a decision result, and each leaf node represents a classification result.
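For intuition, such a tree structure can be sketched in a few lines of Python (the class and field names below are illustrative assumptions, not part of this specification):

class Node:
    # A non-leaf node splits one decision feature against a decision
    # feature threshold; a leaf node stores a classification result.
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, prediction=None):
        self.feature = feature        # index of the decision feature
        self.threshold = threshold    # decision feature threshold t
        self.left = left              # subtree for x[feature] < t
        self.right = right            # subtree for x[feature] >= t
        self.prediction = prediction  # classification result if leaf

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, x):
    # Walk from the root along the decisions until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.prediction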
In this specification, the model is a decision tree model including a plurality of decision trees;
for the specific process of constructing the decision tree, please refer to the description about the random forest algorithm construction decision tree and the GDBT construction decision tree, which is not described herein again.
It should be noted that the precision of classification and prediction based on a single decision tree is relatively poor. The model performs ensemble learning by including a plurality of decision trees: the classification and prediction results output by the plurality of decision trees serving as base classifiers are weighted and summed, obtaining a classification and prediction result with higher precision than that of a single decision tree. For example, different weak classifiers may be trained on the same training set based on the Adaboost algorithm, and these weak classifiers are then aggregated to form a stronger final classifier (a strong classifier); a sketch of such a weighted summation is given below. Of course, in practical applications, the decision tree model may also include only one decision tree.
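As a minimal sketch of this weighted summation (the base-classifier interface and the aggregation weights are assumptions of this example):

def ensemble_predict(trees, weights, x):
    # Weighted sum of the scores output by the base classifiers; with
    # Adaboost, the weights would come from the boosting procedure.
    score = sum(w * tree.predict(x) for tree, w in zip(trees, weights))
    return 1 if score >= 0.5 else 0  # binary classification result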
In this specification, the source scenario may include any business scenario whose sample data satisfies the requirements of the model training. For example, in practical applications, the source scenario may be a financial risk control scenario in the Alipay service: when a user accesses the Alipay service, the Alipay system obtains a large amount of business data (e.g., the user's personal information, purchase information, and credit information) as sample data for the model training.
In this specification, the target scenario refers to a scenario with a business similar to that of the source scenario. For example, when the source scenario is a financial risk control scenario in the Alipay service, the target scenario may be a credit card anti-cash-out scenario of a certain bank.
In this specification, model training is performed on the model based on first training sample data in the source scene and a sample label corresponding to the first training sample data, so as to obtain the trained decision tree model.
For example, taking the source scene as a financial risk control scenario in the Alipay service, the first training sample data is sample data of users of different ages acquired in that scenario, and the sample label corresponding to the sample data may be a binary classification label (for example, 0 represents that the user risk is low and 1 represents that the user risk is high). The first training sample data is input to the model, and the model is trained using the corresponding sample labels as constraints, to obtain the trained decision tree model.
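Purely as an illustration of this source-scene training step (scikit-learn and the synthetic features and labels are assumptions of this sketch; the specification does not prescribe a particular library):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# First training sample data in the source scene, with binary sample
# labels (0: low user risk, 1: high user risk).
rng = np.random.default_rng(0)
X_source = rng.normal(size=(10000, 8))         # hypothetical user features
y_source = (X_source[:, 0] > 0.5).astype(int)  # hypothetical risk labels

# A decision tree model comprising a plurality of decision trees.
model = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)
model.fit(X_source, y_source)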
In this specification, after the training of the decision tree model is completed in the source scene, model migration may be performed on model parameters of the trained decision tree model based on unlabeled second training sample data in a target scene, so that the decision tree model after model migration may be adapted to the target scene.
Continuing the above example, take the target scene as the credit card anti-cash-out scenario of a certain bank corresponding to the source scene (the financial risk control scenario in the Alipay service): after the decision tree model trained in the source scene is obtained, the unlabeled second training sample data in the target scene is input into the decision tree model for model migration, obtaining the decision tree model after model migration.
It should be noted that, in order to migrate the decision tree model from the source scene to the target scene, the target scene and the source scene need to satisfy a preset condition; the preset condition may include that the feature space of the first training sample data of the source scene is the same as the feature space of the second training sample data of the target scene, and that the feature distribution of the first training sample data of the source scene is different from the feature distribution of the second training sample data of the target scene.
For ease of understanding, the following concepts of "feature space" and "feature distribution" are briefly introduced herein.
In the field of machine learning, a feature is an abstraction of raw data: it is an abstract representation of the raw data, representing the raw data numerically. For example, text features and picture features may be obtained by feature extraction from texts and pictures. The feature space is the space formed by the set of such features; the features in the feature space are higher-dimensional abstractions of the raw data, that is, the feature space is the space into which the raw data are mapped at a higher dimension.
The feature distribution refers to probability distribution of values of features as random variables. In the present specification, the type of the probability distribution is not particularly limited. For example, the feature distribution may be such that the features follow a normal distribution, a binomial distribution, a poisson distribution, or the like.
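A small sketch of the two preconditions above (the synthetic data is assumed for the example): the two scenes share one feature space, i.e. the same feature columns, while the per-feature distributions differ:

import numpy as np

rng = np.random.default_rng(1)
# Same feature space: both scenes use the same two features.
X_source = rng.normal(loc=0.0, scale=1.0, size=(5000, 2))
X_target = rng.normal(loc=0.8, scale=1.5, size=(5000, 2))  # shifted scene

# Different feature distribution: compare per-feature histograms.
for i in range(X_source.shape[1]):
    hist_s, edges = np.histogram(X_source[:, i], bins=20, density=True)
    hist_t, _ = np.histogram(X_target[:, i], bins=edges, density=True)
    print("feature", i, "histogram gap:",
          round(float(np.abs(hist_s - hist_t).sum()), 3))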
In this specification, after the decision tree model is migrated from the source scene to the target scene, prediction sample data to be classified in the target scene is acquired and input to the decision tree model after model migration, and the decision tree model after model migration may output a classification prediction result corresponding to the prediction sample data.
In this specification, through research on decision trees, the applicant finds that the decision feature of each node of a decision tree and the decision feature threshold corresponding to that decision feature reflect the contribution of the decision feature to the segmentation, and that the sample distribution of the decision features at the final leaf nodes gives the predicted value of the samples falling into those leaf nodes. Therefore, the tree structure of the decision tree, the distribution of decision feature thresholds, and the node sample distribution parameters determine the prediction performance of the model. It is assumed that the decision tree structure in the decision tree model of the source scene remains basically unchanged in the target scene, but the decision feature distribution at the nodes may shift in the target scene because the sample distribution changes across scenes. Therefore, the basic structure of the decision trees in the decision tree model and the feature distribution of the node decision features are fine-tuned through the unlabeled samples of the target scene.
For convenience of understanding and describing the model migration process of the decision tree model from the source scene to the target scene, the following description mainly combines "pruning processing of decision tree migration" and "decision parameter adjustment of decision tree migration" to perform a specific introduction of model migration.
1) Pruning of decision tree migration
In this specification, when the decision tree model includes one or more decision trees, pruning each of the trained decision trees in the decision tree model based on second training sample data corresponding to a sample-free label in a target scene to complete migration of the decision tree model from the source scene to the target scene;
each decision tree in the decision tree model comprises a root node, a non-leaf node and a leaf node.
For convenience of understanding and description, please refer to fig. 2, and fig. 2 is a schematic diagram of a tree structure and a migration process of a decision tree provided in an embodiment of the present specification.
As shown in fig. 2, the decision tree in the rectangular box on the left side of the "migration" is the nth decision tree of the source scene training, the decision tree in the rectangular box on the right side of the "migration" is the nth decision tree of the target scene corresponding to the nth decision tree of the source scene training after the migration, and n is a natural number.
As shown in FIG. 2, taking the decision tree in the rectangular box on the left side of the "migration" as an example, the decision tree includes a root node (e.g. the circle where "x1" is located in FIG. 2), non-leaf nodes (e.g. the circles where "x2", "x3", "x4", "x5" are located in FIG. 2, respectively), leaf nodes (e.g. the white circles and the slashed circles in FIG. 2; the white circles represent the right leaf nodes of the non-leaf nodes, and the slashed circles represent the left leaf nodes of the non-leaf nodes). Similarly, the decision tree in the rectangular box on the right side of the "migration" is similar to the decision tree in the rectangular box on the left side of the "migration", and is not described herein again.
In this specification, the pruning process is performed on each trained decision tree in the above decision tree model, and specifically includes the following steps:
step a, inputting the second training sample data into each decision tree of the trained decision tree model for prediction, and recording the sample distribution p of the second training sample data in each node of each decision tree T (x)。
For example, the second training sample data is input into the nth decision tree of the source scene training as shown in fig. 2 for prediction, and the sample distribution p of the second training sample data at each node in each decision tree is recorded T (x)
B, traversing each non-leaf node in all non-leaf nodes in each decision tree, and judging whether the sample distribution of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample distribution threshold (such as p) or not according to each decision tree T (x)<p0; p0 represents a preset sample distribution threshold), or whether the number of samples of the second training sample data at the leaf node corresponding to each non-leaf node is less than a preset sample number threshold (for example: count (x)<cnt0, cnt0 characterizing a preset sample number threshold); if yes, pruning the leaf nodes corresponding to the non-leaf nodes of the decision tree.
For example, referring to the non-leaf node x4 in the n-th decision tree trained in the source scene shown in fig. 2, it is judged whether the sample distribution of the second training sample data at the leaf nodes corresponding to the non-leaf node x4 (including the left and right leaf nodes of x4) is smaller than the preset sample distribution threshold p0 (e.g., p0 = 0.000001), or whether the number of samples of the second training sample data at those leaf nodes is smaller than the preset sample number threshold (e.g., with a sample number threshold of 5, the left leaf node of x4 holding 3 samples and the right leaf node holding 2 samples, the sample numbers of both the left and right leaf nodes of x4 are smaller than the sample number threshold).
It should be noted that, when the leaf nodes below the non-leaf nodes of the decision tree are pruned, both left and right leaf nodes below the non-leaf nodes may be pruned (for example, as shown in fig. 2, both left and right leaf nodes of "x4" in the dashed-line box in the decision tree in the rectangular box on the right side of the "migration" are pruned), or only left leaf nodes below the non-leaf nodes may be pruned alone, or only right leaf nodes below the non-leaf nodes may be pruned alone. Of course, in practical applications, different pruning strategies may be adopted, such as: and (4) pruning is carried out aiming at the father node (non-leaf node) of the leaf node meeting the decision tree pruning judgment condition.
It should be noted that, when the sample distribution at the leaf nodes corresponding to a non-leaf node is smaller than the preset sample distribution threshold, this characterizes that the influence of a classification prediction error at those leaf nodes is small (for example, the classification error is smaller than 0.000001); when the number of samples at the leaf nodes corresponding to a non-leaf node is smaller than the preset sample number threshold, this characterizes that too few samples fall into the classification predicted by those leaf nodes (for example, among 10,000 items of second training sample data, fewer than 5 correspond to the classification predicted by the leaf nodes below a certain non-leaf node).
By pruning the decision tree in the decision tree model through the steps described above, over-fitting of the decision tree can be prevented and the generalization capability of the decision tree can be improved.
Similarly, pruning is performed on each decision tree of the decision tree model, and the pruned decision tree model is output.
In the process from step a to step b described above, only the process of migration pruning by one decision tree in the above decision tree model is illustrated with reference to fig. 2. In practical applications, when the decision tree model includes a plurality of decision trees, the steps a to b are repeatedly executed to prune each decision tree of the decision tree model, and finally the decision tree model with all the pruned decision trees is output.
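For illustration, steps a to b above can be sketched for a single decision tree as follows (reusing the hypothetical Node class from the earlier sketch; p0 and cnt0 stand for the preset sample distribution threshold and the preset sample number threshold):

def iter_nodes(node):
    # Yield every node of the tree, parents before children.
    yield node
    if not node.is_leaf():
        yield from iter_nodes(node.left)
        yield from iter_nodes(node.right)

def record_counts(root, X_target):
    # Step a: route each unlabeled target-scene sample through the tree
    # and count how many samples reach every node.
    counts = {id(n): 0 for n in iter_nodes(root)}
    for x in X_target:
        node = root
        counts[id(node)] += 1
        while not node.is_leaf():
            node = node.left if x[node.feature] < node.threshold else node.right
            counts[id(node)] += 1
    return counts

def prune(root, X_target, p0=1e-6, cnt0=5):
    # Step b: cut leaf pairs whose target-scene sample distribution
    # (count / total) or sample number falls below the preset thresholds.
    counts, total = record_counts(root, X_target), len(X_target)
    for node in iter_nodes(root):
        if node.is_leaf() or not (node.left.is_leaf() and node.right.is_leaf()):
            continue
        if all(counts[id(c)] / total < p0 or counts[id(c)] < cnt0
               for c in (node.left, node.right)):
            node.left = node.right = None  # the non-leaf node becomes a leaf
            node.prediction = 0            # placeholder label for the new leaf
    return root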
2) Decision parameter adjustment for decision tree migration
In this specification, the decision feature refers to an attribute of a node in the decision tree for making a decision.
For example, please refer to the nth decision tree of the source scene training or the nth decision tree after the target scene migration shown in fig. 2, where "x1" represents the decision feature of the root node of the decision tree, "x2" represents the decision feature of the non-leaf node where "x2" is located, "x3" represents the decision feature of the non-leaf node where "x3" is located, "x4" represents the decision feature of the non-leaf node where "x4" is located, and "x5" represents the decision feature of the non-leaf node where "x5" is located.
In this specification, the decision parameter refers to a decision characteristic threshold value used for each node in the decision tree to make a decision.
For example, please refer to the n-th decision tree trained in the source scene shown in fig. 2, where "x1" represents the decision feature of the root node and "x2" to "x5" represent the decision features of the non-leaf nodes where they are located. The node corresponding to "x1" performs decision classification according to the decision conditions "< t1" and ">= t1", so t1 is the decision feature threshold used by the node corresponding to "x1" to make its decision. Similarly, t2, t3, t4 and t5 are the decision feature thresholds used by the nodes corresponding to "x2", "x3", "x4" and "x5", respectively.
For another example, please refer to the n-th decision tree after migration to the target scene shown in fig. 2. There, the node corresponding to "x1" performs decision classification according to the decision conditions "< t1'" and ">= t1'", so t1' is the adjusted decision feature threshold of the node corresponding to "x1". Similarly, t2', t3' and t5' are the adjusted decision feature thresholds of the nodes corresponding to "x2", "x3" and "x5", respectively.
In the present specification, the above-mentioned loss function is characterized based on the following formula:

$$ \mathrm{Loss} = \mathrm{JS}\big(p_S(f(x)) \,\|\, p_T(f(x))\big) + \sum_i \mathrm{JS}\big(p_S(x_i) \,\|\, p_T(x_i)\big) $$

wherein f(x) characterizes the classification function corresponding to the decision tree model trained in the source scene; p_S(f(x)) characterizes the prediction distribution of the machine learning model in the source scene, and p_T(f(x)) characterizes the prediction distribution of the machine learning model in the target scene; p_S(x_i) characterizes the probability distribution of the variable x_i in the source scene, and p_T(x_i) characterizes the probability distribution of the variable x_i in the target scene; T_i characterizes the series of decision feature thresholds of the nodes in the decision tree model that decide on the variable x_i; by searching for new decision feature thresholds for the variable x_i in the target scene, the value of the loss function is minimized;
for JS(p_S(f(x)) || p_T(f(x))) in the loss function, p_S(f(x)) and p_T(f(x)) are substituted as p and q respectively, and for JS(p_S(x_i) || p_T(x_i)) in the loss function, p_S(x_i) and p_T(x_i) are substituted as p and q respectively, into the following formula for calculation:

$$ \mathrm{JS}(p \,\|\, q) = \frac{1}{2}\,\mathrm{KL}\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + \frac{1}{2}\,\mathrm{KL}\Big(q \,\Big\|\, \frac{p+q}{2}\Big) $$

wherein JS(p || q) characterizes the JS divergence of the probability distribution p and the probability distribution q, KL(p || (p+q)/2) characterizes the KL divergence of p and (p+q)/2, and KL(q || (p+q)/2) characterizes the KL divergence of q and (p+q)/2.
It should be noted that the KL divergence is also called relative entropy, information divergence, or information gain. The KL divergence is an asymmetric measure of the difference between two probability distributions. The JS divergence measures the similarity between two probability distributions; it is a variant of the KL divergence that solves the problem of the KL divergence being asymmetric. Generally, the JS divergence is symmetric, with a value between 0 and 1. For a detailed introduction to the JS divergence and the KL divergence, please refer to the related technical literature, which is not repeated here.
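For example, the JS divergence above can be computed for two discrete distributions with plain NumPy (the smoothing constant eps is an assumption added to keep the logarithms finite):

import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence KL(p || q), with smoothing and renormalization.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    # JS(p || q) = 1/2 KL(p || m) + 1/2 KL(q || m), with m = (p + q) / 2.
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Symmetric: js(p, q) == js(q, p); the value lies in [0, ln 2] with the
# natural logarithm (in [0, 1] if log base 2 is used instead).
print(js([0.6, 0.4], [0.3, 0.7]))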
In an embodiment shown in the present disclosure, after pruning each of the trained decision trees in the decision tree model, a decision parameter may be further adjusted for each of the pruned decision trees in the decision tree model, so as to complete the migration of the decision tree model from the source scene to the target scene.
In this specification, the decision parameter adjustment process is performed on each of the pruned decision trees in the decision tree model, and specifically includes the following steps:
step a, inputting the second training sample data into each decision tree of the pruned decision tree model for prediction, and recording the sample distribution of the second training sample data at each node in each decision tree
Figure BDA0003060360000000121
And the abovePosterior probability distribution of all sample data in the second training sample data
Figure BDA0003060360000000122
That is, the posterior probability distribution is a conditional probability of y =1 (y characterizes the prediction result, and y =1 characterizes the class label of the prediction result of y as 1, which can actually characterize two or more classes) with x known.
Step b, performing a hierarchical traversal from the root node over the non-leaf nodes of each decision tree, iteratively calculating the sample distribution of each node in each decision tree and the posterior probability distribution of all the sample data in the second training sample data based on the preset loss function, and adjusting the decision feature threshold of each node according to a preset step value so as to solve for the minimum value of the loss function; the decision feature threshold of each node corresponding to the minimum of the loss function is taken as the adjusted decision parameter of each decision tree.
For example, the process described in step b above can be expressed in pseudo-code as follows:
traverse the non-leaf nodes i of a decision tree level by level from the root node:
{
    compute, according to the formula of the loss function, the sum L0 of the loss of the sample distribution of each node (the term Loss(x_S, x_T) in the formula of the loss function) and the loss of the posterior distribution (the term Loss(f(x_S), f(x_T)) in the formula of the loss function);
    iterate k times:
    {
        S1. adjust the decision feature threshold v_ij corresponding to the decision feature x_j of node i by a preset step value δ_j (for example, δ_j may be one hundredth of v_ij); that is, the adjusted decision feature threshold is v_ij' = v_ij + m * δ_j, where m is a natural number;
        S2. with v_ij' as the new decision threshold of node i, predict the second training sample data in the target scene again, record the sample distribution p_T(x) of the second training sample data at each node of each decision tree and the posterior probability distribution p_T(f(x)) of all the sample data in the second training sample data, and compute, under the new decision threshold v_ij' of node i, the sum L_k of the sample-distribution loss (Loss(x_S, x_T)) and the posterior-distribution loss (Loss(f(x_S), f(x_T))) according to the formula of the loss function;
        S3. if L_k < L0 and |L0 - L_k| >= e, where e is a positive constant close to 0, perform the following update: let L0 = L_k and v_ij = v_ij'; otherwise, terminate the sub-cycle.
    }
}
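A minimal Python sketch of this search might look as follows (it reuses the hypothetical Node class from the earlier sketches and assumes a callable loss_fn that evaluates the loss function above, i.e. the sum of the sample-distribution and posterior-distribution losses, on the target samples):

def level_order_non_leaves(root):
    # Hierarchical (breadth-first) traversal of the non-leaf nodes.
    queue = [root]
    while queue:
        node = queue.pop(0)
        if not node.is_leaf():
            yield node
            queue.extend([node.left, node.right])

def adjust_thresholds(tree, X_target, loss_fn, k=20, e=1e-6):
    # Greedy per-node threshold search following the pseudo-code above.
    L0 = loss_fn(tree, X_target)  # L0 = Loss(x_S, x_T) + Loss(f(x_S), f(x_T))
    for node in level_order_non_leaves(tree):
        v0 = node.threshold
        delta = v0 / 100.0        # preset step value, e.g. v0 / 100
        best = v0
        for m in range(1, k + 1):
            node.threshold = v0 + m * delta  # candidate v' = v0 + m * delta
            Lk = loss_fn(tree, X_target)     # re-predict and re-score
            if Lk < L0 and abs(L0 - Lk) >= e:
                L0, best = Lk, node.threshold  # accept the new threshold
            else:
                break                          # terminate the sub-cycle
        node.threshold = best
    return tree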
Referring to fig. 2, after the decision parameter adjustment algorithm in steps a to b described above, model migration is performed on the nth decision tree trained in the source scene to obtain the nth decision tree after the corresponding target scene is migrated.
As shown in fig. 2, comparing the decision tree of the source scene with the decision tree of the target scene, the decision feature thresholds corresponding to the same decision features are adjusted. For example, the decision feature threshold used by the node corresponding to "x1" has been updated from t1 to t1'. Similarly, the threshold of the node corresponding to "x2" has been updated from t2 to t2', that of "x3" from t3 to t3', and that of "x5" from t5 to t5'. It should be noted that, since the leaf nodes of the node corresponding to "x4" have already been pruned, the decision feature threshold used by the node corresponding to "x4" no longer needs to be updated.
It should be noted that, when the decision feature threshold of the decision tree node is adjusted, not only one level traversal from the root node to the leaf node may be performed on each decision tree, but also multiple traversals may be performed on the leaf nodes first, and then the root node is recursively adjusted.
As shown in fig. 2, the decision parameter adjustment process of steps a to b above only illustrates the processing of migrating the decision parameters of one decision tree in the decision tree model. In practical applications, when the decision tree model includes a plurality of decision trees, the decision parameter adjustment algorithm described in steps a to b is likewise executed repeatedly for each pruned decision tree, so as to adjust the decision parameters of each decision tree of the decision tree model, and finally the decision tree model in which all decision trees have been adjusted is output.
In this specification, after the pruning processing and the decision parameter adjustment described above, the decision tree model is migrated to the target scene, and further, the migrated decision tree model may be deployed in a service system in the target scene, and the test sample data to be predicted in the target scene is input to the migrated decision tree model for classification prediction, so as to obtain a classification prediction result corresponding to the test sample data to be predicted.
For example, after the decision tree model of the financial risk control scenario of the Alipay service is migrated to the credit card anti-cash-out scenario of a certain bank, in that scenario, the test sample data to be predicted corresponding to the bank's credit card anti-cash-out business (for example, test data of credit card users) is input into the migrated decision tree model, to obtain a binary classification prediction of whether a credit card user is a cash-out user.
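Continuing the illustrative scikit-learn sketch from above, classification prediction in the target scene might then look as follows (all names are hypothetical; migrated_model stands in for the decision tree model after the pruning and decision parameter adjustment described above):

import numpy as np

rng = np.random.default_rng(2)
# Test sample data to be predicted in the target scene, e.g. test data
# of the bank's credit card users (same feature space as the source).
X_test = rng.normal(loc=0.8, scale=1.5, size=(100, 8))

migrated_model = model  # placeholder: in practice, the migrated model
y_pred = migrated_model.predict(X_test)                 # 1: cash-out user
p_cash_out = migrated_model.predict_proba(X_test)[:, 1]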
In the technical solutions and embodiments described above, the unlabeled samples of the target scene are used, in combination with the structural characteristics of the decision tree model, to prune and fine-tune the model structure and to determine the decision feature thresholds of the decision tree nodes, thereby realizing the migration and adaptation of the model from the source scene to the target scene. On the one hand, this solves the problem of model initialization when the target scene has no labels, and improves the generalization efficiency of the migrated model in the target scene; on the other hand, only the model parameters of the decision tree model trained in the source scene need to be exported to the target scene, the data of the source scene is not used, and the user requirements of data security and privacy protection are met.
Corresponding to the above method embodiments, the present specification also provides an embodiment of a model migration apparatus. The embodiment of the model migration apparatus in the present specification can be applied to an electronic device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device, the apparatus is formed by a processor of the electronic device where it is located reading corresponding computer program instructions from the nonvolatile memory into the memory for execution. In terms of hardware, fig. 3 is a hardware structure diagram of the electronic device where the model migration apparatus is located; in addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 3, the electronic device where the apparatus of this embodiment is located may also include other hardware according to the actual functions of the electronic device, which is not described again.
FIG. 4 is a block diagram of a model migration apparatus, as shown in an exemplary embodiment of the present description.
Referring to fig. 4, the model migration apparatus 40 may be applied in the electronic device shown in fig. 3, where the model is a decision tree model including a plurality of decision trees, and the apparatus includes:
a training module 401, configured to perform model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data, to obtain the trained decision tree model;
the migration module 402 performs pruning on each trained decision tree in the decision tree model based on second training sample data corresponding to no sample label in the target scene, so as to complete the migration of the decision tree model from the source scene to the target scene.
In this embodiment, the feature space of the first training sample data of the source scene is the same as the feature space of the second training sample data of the target scene; the feature distribution of the first training sample data of the source scene is different from the feature distribution of the second training sample data of the target scene.
In this embodiment, after performing pruning processing on each decision tree in the trained decision tree model, the migration module 402 further:
and respectively carrying out decision parameter adjustment on each decision tree in the decision tree model after pruning so as to finish the migration of the decision tree model from a source scene to a target scene.
In this embodiment, each decision tree includes a root node, a non-leaf node, and a leaf node;
the migration module 402:
inputting the second training sample data into each decision tree of the trained decision tree model for prediction, and recording the sample distribution of the second training sample data in each node of each decision tree;
traversing all the non-leaf nodes in each decision tree;
for each decision tree, judging whether the sample distribution of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample distribution threshold or judging whether the sample number of the second training sample data in the leaf node corresponding to each non-leaf node is smaller than a preset sample number threshold;
if yes, pruning the leaf nodes corresponding to the non-leaf nodes of the decision tree, and outputting the pruned decision tree model.
In this embodiment, the decision parameter is a decision characteristic threshold used for each node in the decision tree to make a decision;
the migration module 402 further:
inputting the second training sample data into each decision tree of the pruned decision tree model for prediction, and recording the sample distribution of the second training sample data at each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data;
traversing the layers from the root node to the non-leaf nodes of each decision tree;
iteratively calculating the sample distribution of each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data based on a preset loss function, and adjusting the decision characteristic threshold of each node according to a preset step value to solve the minimum value of the loss function;
and taking the decision characteristic threshold of each node corresponding to the minimum loss function as the adjusted decision parameter of each decision tree.
In this embodiment, the loss function is characterized based on the following formula:

$$ \mathrm{Loss} = \mathrm{JS}\big(p_S(f(x)) \,\|\, p_T(f(x))\big) + \sum_i \mathrm{JS}\big(p_S(x_i) \,\|\, p_T(x_i)\big) $$

wherein f(x) characterizes the classification function corresponding to the decision tree model trained in the source scene; p_S(f(x)) characterizes the prediction distribution of the machine learning model in the source scene, and p_T(f(x)) characterizes the prediction distribution of the machine learning model in the target scene; p_S(x_i) characterizes the probability distribution of the variable x_i in the source scene, and p_T(x_i) characterizes the probability distribution of the variable x_i in the target scene; T_i characterizes the series of decision feature thresholds of the nodes in the decision tree model that decide on the variable x_i; by searching for new decision feature thresholds for the variable x_i in the target scene, the value of the loss function is minimized;
for JS(p_S(f(x)) || p_T(f(x))) in the loss function, p_S(f(x)) and p_T(f(x)) are substituted as p and q respectively, and for JS(p_S(x_i) || p_T(x_i)) in the loss function, p_S(x_i) and p_T(x_i) are substituted as p and q respectively, into the following formula for calculation:

$$ \mathrm{JS}(p \,\|\, q) = \frac{1}{2}\,\mathrm{KL}\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + \frac{1}{2}\,\mathrm{KL}\Big(q \,\Big\|\, \frac{p+q}{2}\Big) $$

wherein JS(p || q) characterizes the JS divergence of the probability distribution p and the probability distribution q, KL(p || (p+q)/2) characterizes the KL divergence of p and (p+q)/2, and KL(q || (p+q)/2) characterizes the KL divergence of q and (p+q)/2.
In this embodiment, the decision trees in the decision tree model are constructed based on a random forest algorithm or a GBDT algorithm.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
The apparatuses, devices, or modules illustrated in the above embodiments may be specifically implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of what is disclosed herein. This description is intended to cover any variations, uses, or adaptations that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which this description pertains. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the description being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (15)

1. A method of model migration, the model being a decision tree model comprising a number of decision trees, the method comprising:
performing model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data to obtain the trained decision tree model;
performing pruning processing on each decision tree in the trained decision tree model based on second training sample data, corresponding to no sample label, in a target scene, so as to migrate the decision tree model from the source scene to the target scene.
2. The method of claim 1, wherein the feature space of the first training sample data of the source scene is the same as the feature space of the second training sample data of the target scene, and the feature distribution of the first training sample data of the source scene is different from the feature distribution of the second training sample data of the target scene.
3. The method of claim 1, further comprising, after pruning each decision tree in the trained decision tree model:
and respectively carrying out decision parameter adjustment on each decision tree in the decision tree model after pruning so as to finish the migration of the decision tree model from a source scene to a target scene.
4. The method of claim 1, wherein each of the decision trees comprises a root node, non-leaf nodes, and leaf nodes;
and wherein the pruning processing performed on each decision tree in the trained decision tree model comprises:
inputting the second training sample data into each decision tree of the trained decision tree model for prediction, and recording the sample distribution of the second training sample data in each node of each decision tree;
traversing all non-leaf nodes in each decision tree;
for each decision tree, judging whether the sample distribution of the second training sample data at the leaf nodes corresponding to each non-leaf node is smaller than a preset sample distribution threshold, or whether the number of samples of the second training sample data at the leaf nodes corresponding to each non-leaf node is smaller than a preset sample number threshold;
if yes, pruning the leaf node corresponding to the non-leaf node of the decision tree, and outputting the decision tree model after pruning.
5. The method of claim 3, wherein the decision parameter is the decision feature threshold based on which each node in a decision tree makes its decision;
the step of performing decision parameter adjustment on each decision tree in the pruned decision tree model comprises:
inputting the second training sample data into each decision tree of the pruned decision tree model for prediction, and recording the sample distribution of the second training sample data at each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data;
performing a layer-by-layer traversal from the root node through the non-leaf nodes of each decision tree;
iteratively computing, based on a preset loss function, the sample distribution at each node of each decision tree and the posterior probability distribution of all sample data in the second training sample data, and adjusting the decision feature threshold of each node by a preset step value so as to solve for the minimum value of the loss function;
and taking the decision feature thresholds of the nodes corresponding to the minimum value of the loss function as the adjusted decision parameters of each decision tree.
6. The method of claim 5, wherein the loss function is characterized based on the following formula:

$$L(\{T_i\}) = JS\big(p_S(f(x)) \,\|\, p_T(f(x))\big) + \sum_i JS\big(p_S(x_i) \,\|\, p_T(x_i)\big)$$

where f(x) represents the classification function corresponding to the decision tree model obtained by training in the source scene; p_S(f(x)) characterizes the predicted distribution of the machine learning model in the source scene, and p_T(f(x)) characterizes the predicted distribution of the machine learning model in the target scene; p_S(x_i) characterizes the probability distribution of the variable x_i in the source scene, and p_T(x_i) characterizes the probability distribution of the variable x_i in the target scene; T_i characterizes the series of decision feature thresholds of the nodes in the decision tree model that decide on the variable x_i; the value of the loss function is minimized by searching for new decision feature thresholds for the variable x_i in the target scene;

wherein, for the term JS(p_S(f(x)) || p_T(f(x))) of the loss function, p_S(f(x)) and p_T(f(x)) are taken as p and q, respectively, and for the term JS(p_S(x_i) || p_T(x_i)) of the loss function, p_S(x_i) and p_T(x_i) are taken as p and q, respectively, and substituted into the following formula for calculation:

$$JS(p \,\|\, q) = \frac{1}{2}\, KL\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + \frac{1}{2}\, KL\Big(q \,\Big\|\, \frac{p+q}{2}\Big)$$

wherein JS(p||q) characterizes the JS divergence of the probability distribution p and the probability distribution q, KL(p || (p+q)/2) characterizes the KL divergence of p from the mixture distribution (p+q)/2, and KL(q || (p+q)/2) characterizes the KL divergence of q from the mixture distribution (p+q)/2.
7. The method of claim 1, wherein a decision tree in the decision tree model is constructed based on a random forest algorithm or based on a GBDT algorithm.
8. A model migration apparatus, the model being a decision tree model comprising a number of decision trees, the apparatus comprising:
the training module is used for carrying out model training on the model based on first training sample data in a source scene and a sample label corresponding to the first training sample data to obtain the trained decision tree model;
and the migration module is used for performing pruning processing on each decision tree in the trained decision tree model based on second training sample data, corresponding to no sample label, in a target scene, so as to complete the migration of the decision tree model from the source scene to the target scene.
9. The apparatus of claim 8, wherein the feature space of the first training sample data of the source scene is the same as the feature space of the second training sample data of the target scene, and the feature distribution of the first training sample data of the source scene is different from the feature distribution of the second training sample data of the target scene.
10. The apparatus of claim 8, wherein, after pruning each decision tree in the trained decision tree model, the migration module further:
and respectively carrying out decision parameter adjustment on each decision tree in the decision tree model after pruning so as to finish the migration of the decision tree model from a source scene to a target scene.
11. The apparatus of claim 8, wherein each of the decision trees comprises a root node, non-leaf nodes, and leaf nodes;
and wherein the migration module is configured to:
inputting the second training sample data into each decision tree of the trained decision tree model for prediction, and recording the sample distribution of the second training sample data in each node of each decision tree;
traversing all non-leaf nodes in each decision tree;
for each decision tree, judging whether the sample distribution of the second training sample data at the leaf nodes corresponding to each non-leaf node is smaller than a preset sample distribution threshold, or whether the number of samples of the second training sample data at the leaf nodes corresponding to each non-leaf node is smaller than a preset sample number threshold;
if yes, pruning the leaf nodes corresponding to the non-leaf nodes of the decision tree, and outputting the pruned decision tree model.
12. The apparatus of claim 10, wherein the decision parameter is the decision feature threshold based on which each node in a decision tree makes its decision;
and wherein the migration module is further configured to:
inputting the second training sample data into each decision tree of the pruned decision tree model for prediction, and recording the sample distribution of the second training sample data at each node in each decision tree and the posterior probability distribution of all sample data in the second training sample data;
performing a layer-by-layer traversal from the root node through the non-leaf nodes of each decision tree;
iteratively computing, based on a preset loss function, the sample distribution at each node of each decision tree and the posterior probability distribution of all sample data in the second training sample data, and adjusting the decision feature threshold of each node by a preset step value so as to solve for the minimum value of the loss function;
and taking the decision feature thresholds of the nodes corresponding to the minimum value of the loss function as the adjusted decision parameters of each decision tree.
13. The apparatus of claim 12, wherein the loss function is characterized based on the following formula:

$$L(\{T_i\}) = JS\big(p_S(f(x)) \,\|\, p_T(f(x))\big) + \sum_i JS\big(p_S(x_i) \,\|\, p_T(x_i)\big)$$

where f(x) represents the classification function corresponding to the decision tree model obtained by training in the source scene; p_S(f(x)) characterizes the predicted distribution of the machine learning model in the source scene, and p_T(f(x)) characterizes the predicted distribution of the machine learning model in the target scene; p_S(x_i) characterizes the probability distribution of the variable x_i in the source scene, and p_T(x_i) characterizes the probability distribution of the variable x_i in the target scene; T_i characterizes the series of decision feature thresholds of the nodes in the decision tree model that decide on the variable x_i; the value of the loss function is minimized by searching for new decision feature thresholds for the variable x_i in the target scene;

wherein, for the term JS(p_S(f(x)) || p_T(f(x))) of the loss function, p_S(f(x)) and p_T(f(x)) are taken as p and q, respectively, and for the term JS(p_S(x_i) || p_T(x_i)) of the loss function, p_S(x_i) and p_T(x_i) are taken as p and q, respectively, and substituted into the following formula for calculation:

$$JS(p \,\|\, q) = \frac{1}{2}\, KL\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + \frac{1}{2}\, KL\Big(q \,\Big\|\, \frac{p+q}{2}\Big)$$

wherein JS(p||q) characterizes the JS divergence of the probability distribution p and the probability distribution q, KL(p || (p+q)/2) characterizes the KL divergence of p from the mixture distribution (p+q)/2, and KL(q || (p+q)/2) characterizes the KL divergence of q from the mixture distribution (p+q)/2.
14. The apparatus of claim 8, wherein a decision tree in the decision tree model is constructed based on a random forest algorithm or based on a GBDT algorithm.
15. An electronic device, comprising a communication interface, a processor, a memory, and a bus, wherein the communication interface, the processor, and the memory are connected to one another through the bus;
and wherein the memory stores machine-readable instructions, and the processor performs the method of any one of claims 1 to 7 by calling the machine-readable instructions.
CN202110510937.3A 2021-05-11 2021-05-11 Model migration method and device and electronic equipment Pending CN115329958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110510937.3A CN115329958A (en) 2021-05-11 2021-05-11 Model migration method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110510937.3A CN115329958A (en) 2021-05-11 2021-05-11 Model migration method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115329958A true CN115329958A (en) 2022-11-11

Family

ID=83911876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110510937.3A Pending CN115329958A (en) 2021-05-11 2021-05-11 Model migration method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115329958A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681121A (en) * 2023-05-24 2023-09-01 苏州大学 Iterative pruning rapid migration learning method and device for neural network equalizer


Similar Documents

Publication Publication Date Title
US20200210899A1 (en) Machine learning model training method and device, and electronic device
US11488055B2 (en) Training corpus refinement and incremental updating
US20180268296A1 (en) Machine learning-based network model building method and apparatus
CN108596410B (en) Automatic wind control event processing method and device
CN110263821B (en) Training of transaction feature generation model, and method and device for generating transaction features
CN110020866B (en) Training method and device for recognition model and electronic equipment
CN114187112A (en) Training method of account risk model and determination method of risk user group
CN109726918A (en) The personal credit for fighting network and semi-supervised learning based on production determines method
CN111783873A (en) Incremental naive Bayes model-based user portrait method and device
WO2023160290A1 (en) Neural network inference acceleration method, target detection method, device, and storage medium
CN113052577A (en) Method and system for estimating category of virtual address of block chain digital currency
CN112884569A (en) Credit assessment model training method, device and equipment
CN115329958A (en) Model migration method and device and electronic equipment
CN116030312B (en) Model evaluation method, device, computer equipment and storage medium
CN111259975A (en) Method and device for generating classifier and method and device for classifying text
CN112446777A (en) Credit evaluation method, device, equipment and storage medium
Wang et al. A novel trace clustering technique based on constrained trace alignment
CN113222177B (en) Model migration method and device and electronic equipment
CN114998001A (en) Service class identification method, device, equipment, storage medium and program product
TW201903631A (en) Data stream grouping method and device
CN113159213A (en) Service distribution method, device and equipment
CN111325350A (en) Suspicious tissue discovery system and method
CN111091198A (en) Data processing method and device
CN116109841B (en) Zero sample target detection method and device based on dynamic semantic vector
CN115564450B (en) Wind control method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination