CN110852447B

CN110852447B - Meta learning method and apparatus, initializing method, computing device, and storage medium

Info

Publication number: CN110852447B
Application number: CN201911119332.0A
Authority: CN
Inventors: 柳玉豹; 蓝利君; 李超
Original assignee: Tencent Cloud Computing Beijing Co Ltd
Current assignee: Tencent Cloud Computing Beijing Co Ltd
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2023-11-07
Anticipated expiration: 2039-11-15
Also published as: CN110852447A

Abstract

The invention relates to a meta learning method and device, an initializing method, a computing device and a computer-readable storage medium of a risk prediction model. The meta learning method comprises the following steps: generating a training task set comprising a plurality of training tasks, wherein the plurality of training tasks are provided with respective different class predictors; initializing network weights of a meta learner, a feature extractor and a task discriminator, wherein the category predictor, the meta learner, the feature extractor and the task discriminator are artificial neural networks, and the category predictor has the same network structure as the meta learner; dividing the training tasks in the training task set into a plurality of batches, and updating the network weights of the meta learner, the feature extractor and the task discriminator on a per-batch basis, wherein the updating is performed according to the category prediction loss and the task discrimination loss. The method can improve the generalization capability of the meta learner, thereby quickly obtaining a better risk prediction model in the training of small samples in a financial wind control scene.

Description

Meta learning method and apparatus, initializing method, computing device, and storage medium

Technical Field

The present invention relates to the field of machine learning technology, and in particular, to a meta learning method and apparatus, an initialization method, a computing device, and a computer readable storage medium.

Background

Machine learning, and in particular deep learning, has been successfully applied to many fields such as computer vision, natural language processing, data mining, and the like since the rise. An important factor in achieving good performance in these areas is the ease with which large amounts of marking data can be obtained in these areas. However, in a financial wind control scenario, such as risk control in financial business links of payment, loan, financial accounting, etc., the data distribution of different customer groups is greatly different, and there are many customer groups with small sample characteristics. Thus, it is often difficult to collect enough labeled samples for traditional machine learning to extract the wind-control related pattern features from the data, thereby facilitating model overfitting.

Meta Learning (Meta Learning) is a sub-domain of machine Learning. The traditional machine learning problem is to learn a mathematical model for prediction from scratch based on a massive dataset, which is far from the process of human learning, accumulating historical experience (also called meta knowledge), and guiding new machine learning tasks. Meta-learning is a learning training process that learns different machine learning tasks, and how to learn how to train a mathematical model faster and better.

Disclosure of Invention

It would be advantageous to provide a mechanism that can generalize the meta-knowledge learned from a large number of related tasks quickly into new task training so that risk prediction models can be trained effectively with fewer sample conditions.

According to a first aspect of the present invention, there is provided a computer-implemented method of meta learning of a risk prediction model, comprising: generating a training program including a plurality of training tasks T _i Wherein the plurality of training tasks T _i Are provided with respectively different class predictors P _i Different class predictors P _i Corresponding to different financial business scenes, each training task T _i Comprising a plurality of data samples d _j Each data sample d _j Including personal characteristics x of the corresponding user _j Credit risk category y of the corresponding user _j And indicating the data sample d _j Task label t of the associated training task _j The method comprises the steps of carrying out a first treatment on the surface of the Initializing network weights for meta learnerNetwork weight θ of feature extractor _F And a network of task discriminatorsWeight θ _D Wherein the class predictor P _i The element learner, the feature extractor and the task discriminator are artificial neural networks, and the different category predictors P _i Having the same network structure as the meta learner, the feature extractor is configured to map data samples to a feature space, each of the class predictors P _i Configured for predicting a credit risk class of a user represented by the data sample based on the mapped data sample, the task discriminator being configured for discriminating from which training task the mapped data sample comes, the meta-learner being configured for a class predictor P of different training tasks _i Learning network weights->For initializing network weights of class predictors of the target prediction task; and training tasks T in the training task set T _i Divided into a plurality of batches T _train And updates the network weight of the meta learner on a per lot basis>Network weight θ of the feature extractor _F And the network weight theta of the task discriminator _D . Updating the network weights of the meta learner on a per batch basis>Comprising the following steps: based on the batch T _train Training task T of (1) _i Respective category prediction loss->To update the network weights of the meta learnerWherein each of said categories predicts loss->Indicating the batchSecondary T _train Training task T of (1) _i Respective category predictors P _i Is used for the prediction error of (a). Updating the network weights θ of the feature extractors on a per-batch basis _F Comprising the following steps: updating network weights θ of the feature extractor based on countermeasures against loss _F Wherein the countermeasures are the batch T _train Training task T of (1) _i Respective category prediction loss->And the respective task discrimination loss->Wherein each of said task discrimination loss +.>Indicating a discrimination error of the task discriminator. Updating the network weights θ of the task discriminators on a per-batch basis _D Comprising the following steps: based on the batch T _train Training task T of (1) _i Respective task discrimination loss->To update the network weights θ of the task discriminators _D 。

In some embodiments, each training task T _i E T contains training sample setAnd test sample set->

In some embodiments, based on the lot T _train Training task T of (1) _i Respective category prediction lossesTo update the network weight of the meta learner +.>Involving for each training task T _i ∈T _train The following operations are performed: initializing training task T _i Class predictor P of (a) _i Is->Is->Calculating class prediction loss->

And updating class predictor P _i Is->The method comprises the following steps:

and updating network weights of the meta learnerThe method comprises the following steps:

where F () represents the transfer function of the feature extractor, P _i () Representing class predictors P _i Cross _ entropy () represents the cross entropy loss function,representing a gradient operation.

In some embodiments, the network weights θ of the feature extractor are updated based on countermeasures against loss _F Comprising the following steps: for each training task T _i ∈T _train Calculating task discrimination loss

And updating the network weights θ of the feature extractor _F The method comprises the following steps:

where D () represents the transfer function of the task arbiter, lambda is an adjustable parameter,to combat the loss.

In some embodiments, based on the lot T _train Training task T of (1) _i Loss of discrimination of respective tasksTo update the network weights θ of the task discriminators _D Comprising the following steps: updating the network weight theta of the task arbiter _D The method comprises the following steps:

in some embodiments, the method further comprises: the adjustable parameter lambda is set in the range of 0.5 to 1.

In some embodiments, the network weights of the meta learner are initializedFeature extractorNetwork weight theta _F And network weight θ of task arbiter _D Comprising the following steps: weighting the network->θ _F And theta _D Initialized to a random value.

According to a second aspect of the present invention, there is provided an initialization method for initializing a risk prediction model in a training phase. The risk prediction model is used for predicting credit risk of a user in a financial service, and comprises a feature extraction network and a category prediction network, wherein the feature extraction network is the feature extractor updated by the method described in the first aspect, and the category prediction network has the same network structure as the meta learner used in the method described in the first aspect. The initialization method comprises the following steps: initializing network weights of the class prediction network with the network weights of the meta learner updated by the method described in the first aspect.

According to a third aspect of the present invention, there is provided a meta learning device of a risk prediction model. The risk prediction model is used for predicting credit risk of a user in financial business, and the device comprises:

-a generation module configured for generating a training system comprising a plurality of training tasks T _i Wherein the plurality of training tasks T _i Are provided with respectively different class predictors P _i Different class predictors P _i Corresponding to different financial business scenes, each training task T _i Comprising a plurality of data samples d _j Each data sample d _j Including personal characteristics x of the corresponding user _j Credit risk category y of the corresponding user _j And indicating the data sample d _j Task label t of the associated training task _j ；

-a feature extractor configured for mapping the data samples to a feature space;

-a task discriminator configured to discriminate from which training task the mapped data samples come;

-meta-elementLearner configured to category predictors P from different training tasks _i Learning network weightsNetwork weights for initializing class predictors of a target prediction task, wherein the class predictors P _i The element learner, the feature extractor and the task discriminator are artificial neural networks, and the different category predictors P _i Having the same network structure as the meta learner, each of the class predictors P _i Configured to predict a credit risk category of a user represented by the data sample based on the mapped data sample;

-an initialization module configured for randomly initializing network weights of the meta learnerNetwork weight θ of feature extractor _F And network weight θ of task arbiter _D ；

-a partitioning module configured for partitioning training tasks T of said training task set T _i Divided into a plurality of batches T _train ；

-and a training module configured for updating the network weights of the meta learner on a per batch basisNetwork weight θ of the feature extractor _F And the network weight theta of the task discriminator _D ；

Wherein the network weights of the meta learner are updated on a per-batch basisComprising the following steps: based on the batch T _train Training task T of (1) _i Respective category prediction loss->To update the network weight of the meta learner +.>Wherein each of said categories predicts loss->Indicating the batch T _train Training task T of (1) _i Respective category predictors P _i Is a prediction error of (2); wherein the network weights θ of the feature extractors are updated on a per-batch basis _F Comprising the following steps: updating network weights θ of the feature extractor based on countermeasures against loss _F Wherein the countermeasures are the batch T _train Training task T of (1) _i Respective category prediction loss->And the respective task discrimination loss->Wherein each of said task discrimination loss +.>Indicating a discrimination error of the task discriminator; and wherein the network weights θ of the task discriminators are updated on a per-batch basis _D Comprising the following steps: based on the batch T _train Training task T of (1) _i Respective task discrimination loss->To update the network weights θ of the task discriminators _D 。

In some embodiments, the feature extractor includes two convolutional layers and two pooling layers.

In some embodiments, the meta learner includes three fully connected layers and one softmax layer.

In some embodiments, the task arbiter comprises: two fully attached layers and one softmax layer.

In some embodiments, the initialization module is further configured to initialize the network weights of the class predictors of the target prediction task with the updated network weights of the meta-learner.

According to a fourth aspect of the present invention there is provided a computing device comprising a memory and a processor, the memory being configured to store thereon a program which when executed on the processor causes the processor to perform the method described in the first or second aspect.

According to a fifth aspect of the present invention there is provided a computer readable storage medium having stored thereon a program which when executed on a processor causes the processor to perform the method described in the first or second aspect.

Embodiments of the present invention provide several advantages. By means of the antagonism between the task discriminator and the feature extractor, the data distribution from different tasks mapped by the feature extractor is more and more similar as the training process advances, and the task identifier has an increasing ability to identify from which task the current data comes by the extracted features. Finally, the feature extractor will achieve the goal of mapping different tasks onto a certain space with similar data distribution. At the same time, the network weights of the meta learnerThe updating of (a) allows the meta-learner to output the correct task labels for the training tasks on the mapped space (i.e., on the space where different tasks have similar data distributions). In this way, the trained feature extractor and meta learner have good generalization capability, so that an effective risk prediction model can be quickly trained for different target tasks with only a small number of labeled samples.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

Drawings

Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:

FIG. 1 shows a schematic diagram of an exemplary application scenario in which techniques according to embodiments of the present invention may be applied;

FIG. 2 shows a schematic flow chart of a meta learning method of a risk prediction model according to an embodiment of the present invention;

FIG. 3 further shows a schematic flow chart of a meta learning method of a risk prediction model according to an embodiment of the present invention;

FIG. 4 shows a schematic block diagram of a meta learning device of a risk prediction model according to an embodiment of the present invention;

FIG. 5 shows a schematic diagram of an algorithmic structure of meta learning of a risk prediction model, according to an embodiment of the present invention;

FIG. 6 shows a schematic and exemplary illustration of an initialization method for initializing a risk prediction model during a training phase in accordance with an embodiment of the present invention;

FIGS. 7A-7D illustrate examples of training data sets used in validation experiments according to embodiments of the present invention; and is also provided with

FIG. 8 illustrates an example system including an example computing device that represents one or more systems and/or devices that can implement the various techniques described herein.

Detailed Description

To make the features and advantages of embodiments of the present application more apparent, concepts related to the meta-learning method proposed by the present application, including model-independent meta-learning (MAML) and migration learning, are briefly described below.

MAML is one of the currently popular meta-learning algorithms, whose core idea is to learn initial parameters of the neural network, i.e., meta-parameters, from a large number of training tasks, which can enable new machine learning tasks to quickly converge to a better solution even under small sample conditions. The training process of MAML mainly comprises two parts: a Meta learner (Meta-learner) that trains by seeking initial parameters of the base learner by minimizing Meta-loss of the Meta learner over a number of target tasks (i.e., the base learner); and a Base learner (Base-learner), which is a predictive model used by the target task, whose initialization parameters are given by the metalearner and trained through a small number of gradient iterations. Because the meta-learner of MAML is based on a large number of learning tasks, it is able to produce generalized performance on new tasks with a small number of gradient iterations, i.e., training a model that is easy to fine tune. In addition, since MAML does not impose any restrictions on the form of the base learner, it can be adapted to any machine learning problem using gradient descent, such as classification problems, regression problems, and reinforcement learning problems. However, the main goal of MAML is to learn a set of meta-initial parameters that can quickly converge to a better solution for each target task under small sample conditions, which requires that each target task have some correlation with the target tasks in the training set for the meta-learner. When the data distribution difference between the new target task and the target task in the training set is large, the generalization capability of MAML is reduced, namely the meta-initial parameters cannot be quickly adapted to the new target task.

Transfer learning is a branch of machine learning, and the goal is to transfer existing marked data (also called source domain data) to unmarked data (also called target domain data), so that the learning effect of a target domain prediction model is improved through knowledge contained in the source domain data. Typically, the amount of source domain data is sufficient, while the amount of target domain data is less. A common method of migration learning is to pretrain a deep neural network on a source domain dataset with a large amount of marker data, and then use the weight of the network as an initial value or use the network as a feature extractor of the relevant task, and wholly fine-tune or partially fine-tune the deep neural network in the dataset of the relevant target task through gradient descent. Using this approach, the learning process over a small number of datasets for the target task becomes more efficient. However, migration learning typically requires a sufficient amount of source domain data. In addition, because the pre-trained model based on the source domain data still needs to be retrained on the target domain data to adapt to the target task, the overfitting phenomenon can still be easily caused when the data volume of the target domain is too small.

Therefore, in view of the large data distribution difference among different customer groups in the financial wind control scene, and the difficulty in collecting enough marking data, the related technical schemes are not suitable for the prediction model training in the scene. In order to solve the problem, the application provides a meta-learning method based on a task countermeasure mechanism. Tasks with different data distributions from different wind control scenes are mapped to a space with similar data distribution through a task countermeasure mechanism, so that the influence of data distribution differences among different tasks on the generalization capability of the model is reduced. In the mapped space, the meta-knowledge of different tasks during model training, namely, the initial parameters of a prediction model can be learned, and then the meta-knowledge is used for guiding training of a new wind control task under a small sample condition. Therefore, a better prediction model can be trained through a small number of gradient iteration steps, and the meta-knowledge learned from a large number of related tasks is quickly generalized into new task training, so that the risk prediction model can be effectively trained under a small sample condition, and the overfitting probability under a small sample condition is reduced.

Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

First, several terms used herein are defined.

1. Task (Task): the training Set for meta learning is composed of a plurality of tasks, each task being a machine learning task including a training Set (Support Set) and a test Set (Query Set).

2. Artificial Neural Networks (ANNs): an arithmetic mathematical model which simulates the behavior characteristics of an animal neural network and performs distributed parallel information processing. The network relies on the complexity of the system and achieves the purpose of processing information by adjusting the relationship of the interconnection among a large number of nodes.

Fig. 1 shows a schematic diagram of an exemplary application scenario 100 in which a scheme according to an embodiment of the invention may be applied.

The server 110 may be used to train a meta-learner using the meta-learning device 112 in accordance with the task countermeasure mechanism-based meta-learning method proposed by the present invention, and to train the risk prediction model 114 based on the meta-learner based on, for example, customer training samples provided by the customer server 120. The client server 120 may be a server of an enterprise providing financial services such as payment, loan, financing, etc. of a bank, securities, mutual funds, etc. When a user desiring to transact a related financial service accesses a service provided by the client server 120 of the enterprise using the client device 130, the client server 120 may submit personal characteristic information of the user to the trained risk prediction model 114 for risk prediction to obtain a credit risk category for the user. The risk prediction may assist the financial institution in determining whether to provide financial services to the user, or may assist the financial institution in managing users of different credit risk types differently, etc.

Servers 110 and 120 may be a single server or a cluster of servers, or may be other computing devices. The meta-learning apparatus 112, the risk prediction model 114, while shown simplified as being on the same server, in practice, may typically be deployed on different servers or other different computing devices. The client device 130 may be a variety of different types of devices, such as a desktop computer, a notebook computer, a netbook computer, a tablet, a smart phone, smart glasses, a smart watch, etc., or an application running on a variety of different types of devices. The servers 110, client servers 120, client devices 130 are connected to each other by a network 140, and the network 140 may be a variety of different networks including the Internet, a Local Area Network (LAN), a telephone network, an intranet, other public and/or proprietary networks, combinations thereof, and the like.

Fig. 2 shows a schematic flow chart 200 of a meta learning method of a risk prediction model according to an embodiment of the invention.

At step 210, a training task T comprising a plurality of training tasks is generated _i Wherein a plurality of training tasks T _i Are provided with respectively different class predictors P _i Different class predictors P _i Corresponding to different financial business scenes, each training task T _i Comprising a plurality of data samples d _j Each data sample d _j Including personal characteristics x of the corresponding user _j Credit risk category y of the corresponding user _j And indicating the data sampleThe d _j Task label t of the associated training task _j . i. j are integers greater than zero. Illustratively, the personal characteristics x of the user _j Personal basic information (such as the sex, age, usual address, family condition, etc. of the user), social behavior information (such as the login frequency, last login time, etc. of social software such as QQ, weChat, etc.), financial behavior information (such as consumption payment information, history lending information, history default behavior, etc.), and used device information, etc. may be included. Credit risk category y of user _j May simply include, for example, black users (high risk users) and white users (low risk users), or be expressed in more detail in terms of credit scores, credit ratings, etc. Task label t _j May be task tags in different financial business scenarios, such as payment (such as task tags defined in terms of payment time, type of goods or services to be purchased, amount of payment, etc.), loan (such as task tags defined in terms of loan type, amount to be loaned, predetermined repayment time period, etc.), financial (such as task tags defined in terms of financial type, financial return level, corresponding risk level, etc.).

Each training task T _i E T may contain a training sample setAnd test sample set->So that each data sample->In some embodiments, training sample set +.>The N-way K-shot setting can be satisfied. N-way K-shot is a common experimental setting in small sample Learning (Few-shot Learning), N-way refers to N categories in training data, and K-shot refers to K marked data under each category respectively. The small sample learning mainly solves the problem of how to quickly and efficiently learn the prediction model under the condition of a small number of marked samples, and isApplication of meta learning in the field of supervision and learning.

At step 220, the network weights of the meta learner are initializedNetwork weight θ of feature extractor _F And network weight θ of task arbiter _D . Class predictor P _i The meta learner, feature extractor and task arbiter are artificial neural networks, and different class predictors P _i Has the same network structure as the meta learner. The feature extractor is configured for mapping the data samples to a feature space. Each category predictor P _i Is configured to predict a credit risk category for a user represented by the data sample based on the mapped data sample. The task discriminator is configured to discriminate from which training task the mapped data sample comes. The meta learner is configured for category predictors P for different training tasks _i Learning network weights->For initializing the network weights of the class predictors of the target prediction task.

In some embodiments, the network weights of the meta learner are initializedNetwork weight θ of feature extractor _F And network weight θ of task arbiter _D Comprising the following steps: network weight +.>θ _F And theta _D Initialized to a random value. Initialized network weightsθ _F And theta _D Will be updated continuously as training proceeds.

At step 230, training tasks T in the training task set T are combined _i Divided into a plurality of batches T _train . This may be accomplished by randomly extracting multiple batches of tasks T in sequence in a training set of tasks _train E T to complete until all training tasks T in the training task set T are extracted _i 。

At step 240, at each batch T _train Updating network weights of a meta learner based on (a)Network weight θ of feature extractor _F And network weight θ of task arbiter _D 。

Updating network weights of meta learner on a per-batch basisMay include: based on the batch T _train Training task T of (1) _i Respective category prediction loss->To update the network weight of the meta learner +.>Wherein each category predicts loss->Indicating the batch T _train Training task T of (1) _i Respective category predictors P _i Is used for the prediction error of (a).

Updating the network weights θ of the feature extractors on a per-batch basis _F May include: updating network weights θ of feature extractors based on countermeasures against loss _F Wherein the countermeasures against losses are the batch T _train Training task T of (1) _i Respective category prediction lossesAnd the respective task discrimination loss->Wherein each task discriminates loss +.>Indicating a discrimination error of the task discriminator.

Updating the network weight θ of a task arbiter on a per-batch basis _D May include: based on the batch T _train Training task T of (1) _i Loss of discrimination of respective tasksUpdating the network weight θ of the task arbiter _D 。

The flowchart in fig. 3 shows the specific operation at step 240 in fig. 2 in further detail.

First, for each training task T _i ∈T _train The operation in step 241 is performed.

At step 2411, a training task T is initialized _i Class predictor P of (a) _i Network weights of (2)Is->I.e. network weight of the current meta learner +.>

At step 2412, category predicted losses are calculated as follows

At step 2413, updating the category predictor P according to _i Network weights of (2)

At step 2414, task discrimination loss is calculated as follows

F () represents the transfer function of the feature extractor, D () represents the transfer function of the task arbiter, P _i () Representing class predictors P _i Is a transfer function of (a). Namely, F (-) represents the feature information mapped by the feature extractor, D (-) represents the task information output after being judged by the task judging device, and P _i (. Cndot.) represents class label information output after being predicted by a class predictor. cross _ entropy () represents a cross entropy loss function,representing a gradient operation.

Subsequently, at step 242, the network weights of the meta learner are updated as follows

At step 243, the network weights θ of the feature extractor are updated as follows _F ：

At step 244, the network weights θ of the task discriminators are updated as follows _D ：

To combat losses, λ is an adjustable parameter. The network convergence speed and convergence effect of the feature extractor can be adjusted by adjusting the parameters. In some embodiments, the parameter may be adjusted in the range of 0.5 to 1.

In the training process, training task T is performed in each batch _i Continuously updating the network weights of the element learner, the feature extractor and the task discriminatorθ _F 、θ _D 。θ _F The updating of the class prediction network enables the class prediction network to output a correct class label as far as possible, and meanwhile, the task identifier cannot judge which task the current data comes from through the extracted characteristics. θ _D The updating of (c) allows the task discriminator to distinguish between the tasks and thereby form a countermeasure relationship with the feature extractor. In this ongoing challenge, the data distribution from different tasks mapped by the feature extractor is more and more similar, while the recognition capabilities of the task identifier are continually increasing. Finally, the feature extractor will achieve the goal of mapping different tasks onto a certain space with similar data distribution. At the same time (I) >The updating of (a) allows the meta-learner to output the correct labels for the training tasks on the mapped space (i.e., on the space where different tasks have similar data distributions). In this way, the trained feature extractor and meta-learner have good generalization capability, such that only a small number of labeled samples are used for different target tasksAn effective risk prediction model can be quickly trained.

It should be noted that the above steps are not necessarily performed in the order shown in the flowcharts, and some steps may be performed in reverse order or performed in parallel. For example, steps 2413 and 2414 may be performed in reverse or in parallel, steps 242, 243, 244 may be performed in a different order or in parallel, or may be performed in parallel with iterations of steps 2411 through 2414, such as performing an accumulation operation therein as the iterations proceed.

Fig. 4 shows a schematic block diagram of a meta learning device 112 of a risk prediction model according to an embodiment of the present invention.

As shown in fig. 4, the meta learning device 112 includes a generation module 1121, a feature extractor 1122, a task discriminator 1123, a meta learner 1124, an initialization module 1125, a division module 1126, and a training module 1127.

The generation module 1121 is configured for generating a training set comprising a plurality of training tasks T _i This operation has been described in detail above with respect to the embodiment of the method illustrated in connection with fig. 2 and is not repeated here for the sake of brevity.

Feature extractor 1122 is configured to map data samples to feature space.

Task discriminator 1123 is configured to discriminate from which training task the mapped data sample comes.

The meta learner 1124 is configured to use a class predictor P from different training tasks _i Learning network weightsThe operation of initializing the network weights of the class predictors of the target prediction task has been described in detail above with respect to the method embodiment illustrated in connection with fig. 2 and is not repeated here for brevity.

The initialization module 1125 is configured for randomly initializing network weights of the meta learner 1124Network weight θ of feature extractor 1122 _F And network weight θ of task discriminator 1123 _D . And, it is further configured to initialize the network weights of the class predictors of the target prediction task with the updated network weights of the meta-learner 1124.

The partitioning module 1126 is configured to divide the training tasks T in the training task set T _i Divided into a plurality of batches T _train 。

Training module 1127 is configured to update the network weights of meta learner 1124 on a per-batch basisNetwork weights θ of the feature extractor 1122 _F And the network weight θ of the task discriminator 1123 _D This operation has been described in detail above with respect to the method embodiment illustrated in connection with fig. 2, 3 and is not repeated here for the sake of brevity.

It will be appreciated that these modules in the meta learning device 112 may be implemented by software, firmware, hardware, or a combination thereof, as will be further described below.

Fig. 5 shows a schematic diagram of an algorithm structure 500 of meta learning of a risk prediction model according to an embodiment of the present invention, which more clearly shows an algorithm relationship and respective structures among a feature extractor, a task discriminator, a meta learner, and class predictors of different training tasks, and more clearly shows meta learning and countermeasure ideas in the meta learning method and apparatus provided by the present disclosure.

The meta learner assigns its own network weight to the class predictor as an initial value. Tasks in the training task set are mapped by the feature extractor and then flow into the task discriminator and the class predictor. The task discriminator judges the target from which the received characteristic information comes, calculates the gradient based on the task discrimination loss, and optimizes the network weight theta of the task discriminator _D The discrimination capability of the task discriminator is improved continuously. The class predictor predicts class information reflected by the received feature information and calculates a gradient based on the predicted lossOptimizing network weights of a meta learnerI.e. the network weights of the class predictors in the next iteration are optimized accordingly, so that the class prediction capability of the class predictors, i.e. the meta learner, is continuously improved. Feature extractor calculates gradients based on a combination of task discrimination loss and category prediction loss, optimizes the network weights θ of the feature extractor _F The task identifier is made more and more difficult to determine which target the feature information comes from, and the category predictor can predict the category label corresponding to the feature information. In this way, when the task discriminator cannot determine from which task the received feature information comes, the feature extractor reaches the final objective, that is, maps different tasks to a certain space with missing data distribution, and thereby completes training of the meta learner.

The meta-learning method based on the task countermeasure mechanism can have good generalization capability, and can be applied to application scenes in the small sample learning field requiring rapid generalization capability, in particular to financial risk control scenes. And the mechanism of the method can be used independently or embedded in the inner layer circulation of the big multiple learning.

In some embodiments, the meta learner includes three fully connected layers and one softmax layer (the class predictor is identical to the meta learner structure), the task arbiter includes two fully connected layers and one softmax layer, and the feature extractor includes two convolution layers and two pooling layers. The softmax function, also commonly referred to as a normalized exponential function, can "compress" a k-dimensional vector z containing arbitrary real numbers into another k-dimensional real vector σ (z) such that each element ranges between (0, 1) and the sum of all elements is 1.

Fig. 6 shows a schematic and exemplary illustration of an initialization method 600 for initializing risk prediction model 114 during a training phase in accordance with an embodiment of the present invention. Risk prediction model 114 is designed to predict the credit risk of users in financial transactions. The risk prediction model 114 includes a feature extraction network 1141 and a class prediction network 1142. The feature extraction network 1141 is the feature extractor 1122 updated by the meta learning method described above, and the class prediction network 1142 has the same network structure as the meta learner 1124 used in the meta learning method. The initialization method 600 includes: the network weights of the class prediction network 1142 are initialized with the network weights of the meta learner 1124 updated by the meta learning method described above.

Since the network weights of the meta-learner 1124 already contain the a priori knowledge obtained by training and this a priori knowledge is accumulated over the mapping space where different tasks have similar data distributions, the risk prediction model 114 initialized with such a priori knowledge is expected to converge rapidly under small sample training for new task scenarios (specific financial business scenarios).

In order to verify the effectiveness of the method proposed by the present invention, the inventors conducted several experiments. Considering (1) that training samples of an actual financial scenario are related to personal privacy and are not readily available, and (2) that experimental purposes are only to verify the generalization ability of the method, which is not limited by the specific training sample type, the use of a digital dataset for the experiment was chosen. Also, a transfer learning algorithm was selected as a comparative example.

Fig. 7A-7D show examples of training data sets used in validation experiments. Specifically, fig. 7A shows a sample of a Mnist dataset, which is a large hand-written digital database collected and consolidated by the national institute of standards and technology, containing a training set of 60,000 examples and a test set of 10,000 examples. Fig. 7B shows a sample of the Mnist-M dataset, which was generated from conversion of the Mnist dataset, as a color handwritten picture. Fig. 7C shows a SYN Number dataset, one of which is a composite digital dataset. Fig. 7D shows the SVHN dataset, which is an actual machine learning and object recognition dataset, obtained from house numbers in google street pictures.

For the meta-learning method based on the task countermeasure mechanism, task 1 (Mnist), task 2 (Mnist-M) and task 3 (SYN Num) are selected as training sets to train a feature extractor, a meta-learner (class predictor) and a task discriminator, and task 4 (SVHN) with larger data distribution difference with the previous 3 task sets is selected as a test set to test the generalization capability of the meta-learner (class predictor) on new tasks. For transfer learning, all data sets of task 1 (Mnist), task 2 (Mnist-M), task 3 (SYNNum) are combined according to their training patterns, and feature extractors and class predictors are pre-trained on the combined data sets, and finally tested on task 4 (SVHN). The number of iterations for both experiments was 10000, taking 10 minutes and 11 minutes, respectively.

The experimental results are shown in table 1, and the accuracy of the method on task 4 is higher than that of the transfer learning method, and is improved to 0.6068 from 0.3962. This shows that the meta-learning method based on the task countermeasure mechanism provided by the application has good generalization capability on new tasks with large data distribution differences. In addition, the accuracy of task 3, which is a training task, is also greatly improved by 0.39. In contrast, in the training process of the transfer learning, as the data distribution of the task 1 and the task 2 is similar and the sum of the data amounts of the task 1 and the task 2 is far greater than the data amount of the task 3, model weights tend to fit the data distribution of the task 1 and the task 2 in the training process, so that the phenomenon that the accuracy is not high although the data set of the task 3 also participates in the pre-training is caused. And its accuracy on the dataset of task 4 is also not high, indicating that its generalization capability is not high. Experimental results show that the meta-learning method based on the task countermeasure mechanism makes up the defects of transfer learning.

TABLE 1 accuracy vs. results based on task 4 (SVHN)

As can be seen from the above description and experimental results, the meta-learning based on the task countermeasure mechanism provided by the invention can improve the generalization capability of the target prediction model, and can be effectively applied to new tasks with small samples, especially new tasks with large data distribution differences.

FIG. 8 illustrates an example system 800 that includes an example computing device 810 that represents one or more systems and/or devices that can implement the various techniques described herein.

Computing device 810 may be, for example, a server of a service provider or any other suitable computing device or computing system, ranging from full resource devices with substantial memory and processor resources to low resource devices with limited memory and/or processing resources. In some embodiments, the meta learning device 112 described above with respect to fig. 4 may take the form of a computing device 810.

The example computing device 810 as illustrated includes a processing system 811, one or more computer-readable media 812, and one or more I/O interfaces 813 communicatively coupled to each other. Although not shown, computing device 810 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus may include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.

The processing system 811 is representative of functionality that performs one or more operations using hardware. Thus, the processing system 811 is illustrated as including hardware elements 814 that may be configured as processors, functional blocks, and the like. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. The hardware element 814 is not limited by the materials from which it is formed or the processing mechanisms employed therein. For example, the processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic Integrated Circuits (ICs)). In such a context, the processor-executable instructions may be electronically-executable instructions.

Computer-readable medium 812 is illustrated as including memory/storage 815. Memory/storage 815 represents memory/storage capacity associated with one or more computer-readable media. Memory/storage 815 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). Memory/storage 815 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) and removable media (e.g., flash memory, a removable hard drive, an optical disk, etc.). The computer-readable medium 812 may be configured in a variety of other ways as described further below.

One or more input/output interfaces 813 represent functionality that allows a user to enter commands and information to computing device 810, and also allows information to be presented to the user and/or sent to other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touches), a camera (e.g., motion that does not involve touches may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), a network card, a receiver, and so forth. Examples of output devices include a display device (e.g., a display or projector), speakers, a printer, a haptic response device, a network card, a transmitter, and so forth.

Computing device 810 also includes meta-learning strategy 816. The meta learning strategy 816 may be stored as computer program instructions in the memory/storage 815. The meta learning strategy 816 may implement all of the functions of the various modules of the meta learning device 112 described with respect to fig. 4 (specifically, the generation module 1121, the feature extractor 1122, the task discriminator 1123, the meta learner 1124, the initialization module 1125, the division module 1126, and the training module 1127) along with the processing system 811.

Various techniques may be described herein in the general context of software hardware elements or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.

An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer-readable media can include a variety of media that are accessible by computing device 810. By way of example, and not limitation, computer readable media may comprise "computer readable storage media" and "computer readable signal media".

"computer-readable storage medium" refers to a medium and/or device that can permanently store information and/or a tangible storage device, as opposed to a mere signal transmission, carrier wave, or signal itself. Thus, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in methods or techniques suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of a computer-readable storage medium may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, hard disk, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture adapted to store the desired information and which may be accessed by a computer.

"computer-readable signal medium" refers to a signal bearing medium configured to hardware, such as to send instructions to computing device 810 via a network. Signal media may typically be embodied in computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, data signal, or other transport mechanism. Signal media also include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.

As previously described, the hardware elements 814 and computer-readable media 812 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware that, in some embodiments, may be used to implement at least some aspects of the techniques described herein. The hardware elements may include integrated circuits or components of a system on a chip, application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), complex Programmable Logic Devices (CPLDs), and other implementations in silicon or other hardware devices. In this context, the hardware elements may be implemented as processing devices that perform program tasks defined by instructions, modules, and/or logic embodied by the hardware elements, as well as hardware devices that store instructions for execution, such as the previously described computer-readable storage media.

Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Accordingly, software, hardware, or program modules, and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer readable storage medium and/or by one or more hardware elements 814. Computing device 810 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, for example, by using the computer-readable storage medium of the processing system and/or the hardware element 814, a module may be implemented at least in part in hardware as a module executable by the computing device 810 as software. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 810 and/or processing systems 811) to implement the techniques, modules, and examples described herein.

The techniques described herein may be supported by these various configurations of computing device 810 and are not limited to the specific examples of techniques described herein. The functionality of computing device 810 may also be implemented in whole or in part on "cloud" 820 using a distributed system, such as through platform 830 as described below.

Cloud 820 includes and/or is representative of a platform 830 for resources 832. Platform 830 abstracts underlying functionality of hardware (e.g., servers) and software resources of cloud 820. Resources 832 may include applications and/or data that may be used when executing computer processes on servers remote from computing device 810. Resources 832 may also include services provided over the internet and/or over a customer network such as a cellular or Wi-Fi network.

Platform 830 may abstract resources and functionality to connect computing device 810 with other computing devices. Platform 830 may also be used to abstract a hierarchy of resources to provide a corresponding level of hierarchy of requirements encountered for resources 832 implemented via platform 830. Thus, in an interconnected device embodiment, the implementation of the functionality described herein may be distributed throughout the system 800. For example, the functionality may be implemented in part on computing device 810 and by platform 830 abstracting the functionality of cloud 820.

Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A method of training a risk prediction model for predicting credit risk of a user in a financial business and comprising a feature extraction network and a category prediction network, the method comprising:

generating by a first server a training task T comprising a plurality of training tasks _i Wherein the plurality of training tasks T _i Are provided with respectively different class predictors P _i Different class predictors P _i Corresponding to different financial business scenes, each training task T _i Comprising a plurality of data samples d _j Each data sample d _j Including personal characteristics x of the corresponding user _j Credit risk category y of the corresponding user _j And indicating the data sample d _j Task tag t in corresponding financial business of training task _j ；

Initializing a meta-by a first serverNetwork weights of learnerNetwork weight θ of feature extractor _F And network weight θ of task arbiter _D Wherein the class predictor P _i The element learner, the feature extractor and the task discriminator are artificial neural networks, and the different category predictors P _i Having the same network structure as the meta learner, the feature extractor is configured to map data samples to a feature space, each of the class predictors P _i Configured for predicting a credit risk class of a user represented by the data sample based on the mapped data sample, the task discriminator being configured for discriminating from which training task the mapped data sample comes, the meta-learner being configured for a class predictor P of different training tasks _i Learning network weights->For initializing network weights of class predictors of the target prediction task; and is also provided with

Training tasks T in the training task set T are transmitted to a first server _i Divided into a plurality of batches T _train And updating the network weights of the meta learner on a per-batch basisNetwork weight θ of the feature extractor _F And the network weight theta of the task discriminator _D ；

The method comprises the steps that a first server takes a feature extractor after updating network weights as a feature extraction network of a risk prediction model, and the network weights of a category prediction network of the risk prediction model are initialized by utilizing the updated network weights of the meta learner, so that an initialized risk prediction model is obtained, wherein the category prediction network has the same network structure as the meta learner;

receiving a training sample provided by a second server through a first server, and training the initialized risk prediction model based on the training sample to obtain a trained risk prediction model, so that when a user accesses a service provided by the second server by using client equipment, the second server performs risk prediction based on personal characteristics of the user by using the trained risk prediction model, and thus a credit risk category of the user is obtained;

Wherein the network weights of the meta learner are updated on a per-batch basisComprising the following steps: based on the batch T _train Training task T of (1) _i Respective category prediction loss->To update the network weight of the meta learner +.>Wherein each of said categories predicts loss->Indicating the batch T _train Training task T of (1) _i Respective category predictors P _i Prediction error for credit risk category of user, wherein the network weight θ of the feature extractor is updated on a per-batch basis _F Comprising the following steps: updating network weights θ of the feature extractor based on countermeasures against loss _F Wherein the countermeasures are the batch T _train Training task T of (1) _i Respective category prediction loss->And the respective task discrimination loss->Wherein each of said task discriminates a loss/>Indicating a discrimination error of the task discriminator for a training task corresponding to a financial business, and

wherein the network weight θ of the task arbiter is updated on a per-batch basis _D Comprising the following steps: based on the batch T _train Training task T of (1) _i Loss of discrimination of respective tasksTo update the network weights θ of the task discriminators _D 。

2. The method of claim 1, wherein each training task T _i E T contains training sample set And test sample set->

3. The method of claim 2, wherein the batch T is based on _train Training task T of (1) _i Respective category prediction lossesTo update the network weight of the meta learner +.>Comprising the following steps:

for each training task T _i ∈T _train The following operations are performed:

initializing training task T _i Class predictor P of (a) _i Network weights of (2)Is->

Calculating class prediction losses

And

updating class predictor P _i Network weights of (2)The method comprises the following steps:

and is also provided with

Updating network weights of a meta learnerThe method comprises the following steps:

4. A method as claimed in claim 3, wherein the network weights θ of the feature extractor are updated based on countermeasures against losses _F Comprising the following steps:

for each training task T _i ∈T _train Calculating task discrimination loss

And is also provided with

Updating network weights θ of the feature extractor _F The method comprises the following steps:

5. The method of claim 4, wherein the batch T is based on _train Training task T of (1) _i Loss of discrimination of respective tasksTo update the network weights θ of the task discriminators _D Comprising the following steps:

updating the network weight theta of the task arbiter _D The method comprises the following steps:

6. the method of claim 4, further comprising: the adjustable parameter lambda is set in the range of 0.5 to 1.

7. Any one of claims 1 to 6The method wherein initializing network weights of a meta learnerNetwork weight θ of feature extractor _F And network weight θ of task arbiter _D Comprising the following steps:

weighting the networkθ _F And theta _D Initialized to a random value.

8. A training apparatus of a risk prediction model for predicting credit risk of a user in a financial transaction and comprising a feature extraction network and a category prediction network, the apparatus comprising:

a generation module configured to generate a first server including a plurality of training tasks T _i Wherein the plurality of training tasks T _i Are provided with respectively different class predictors P _i Different class predictors P _i Corresponding to different financial business scenes, each training task T _i Comprising a plurality of data samples d _j Each data sample d _j Including personal characteristics x of the corresponding user _j Credit risk category y of the corresponding user _j And indicating the data sample d _j Task label t in financial business corresponding to training task _j ；

A feature extractor configured to map the data samples to a feature space by the first server;

a task discriminator configured to discriminate, by the first server, from which training task the mapped data sample comes;

a meta learner configured for category predictors P for different training tasks via the first server _i Learning network weightsNetwork weights for initializing class predictors of a target prediction task, wherein the class predictors P _i The element learner, the feature extractor and the task discriminator are artificial neural networks, and the different category predictors P _i Having the same network structure as the meta learner, each of the class predictors P _i Configured to predict a credit risk category of a user represented by the data sample based on the mapped data sample;

an initialization module configured to randomly initialize network weights of the meta learner through the first serverNetwork weight θ of feature extractor _F And network weight θ of task arbiter _D ；

A dividing module configured to divide training tasks T in the training task set T by a first server _i Divided into a plurality of batches T _train The method comprises the steps of carrying out a first treatment on the surface of the And

a training module configured to update the network weights of the meta learner on a per-batch basis via a first server Network weight θ of the feature extractor _F And the network weight theta of the task discriminator _D Using the updated network weights as the feature extraction network of the risk prediction model, and initializing the network weights of the class prediction network of the risk prediction model by using the updated network weights of the meta learner to obtain an initialized risk prediction model, wherein the class prediction network has the same network structure as the meta learner, and is configured to receive training samples provided by a second server through a first server, and train the initialized risk prediction model based on the training samples to obtain a trained risk prediction model, so that the second server uses the trained risk prediction model when a user accesses services provided by the second server using a client deviceThe risk prediction model carries out risk prediction based on the personal characteristics of the user, so that the credit risk category of the user is obtained;

wherein the network weights of the meta learner are updated on a per-batch basisComprising the following steps: based on the batch T _train Training task T of (1) _i Respective category prediction loss->To update the network weight of the meta learner +. >Wherein each of said categories predicts loss->Indicating the batch T _train Training task T of (1) _i Respective category predictors P _i For a prediction error of the credit risk category of the user,

wherein the network weights θ of the feature extractors are updated on a per-batch basis _F Comprising the following steps: updating network weights θ of the feature extractor based on countermeasures against loss _F Wherein the countermeasures are the batch T _train Training task T of (1) _i Respective category prediction lossesAnd the respective task discrimination loss->Wherein each of said task discrimination loss +.>Instruct the task discriminator to discriminate the training task corresponding to the financial businessError, and

9. The apparatus of claim 8, wherein the feature extractor comprises two convolutional layers and two pooling layers.

10. The apparatus of claim 8, wherein the meta learner comprises three fully connected layers and one softmax layer.

11. The apparatus of claim 8, wherein the task arbiter comprises: two fully attached layers and one softmax layer.

12. A computing device comprising a memory and a processor, the memory configured to store a program thereon, which when executed on the processor causes the processor to perform the method of any of claims 1-7.

13. A computer readable storage medium having stored thereon a program which when executed on a processor causes the processor to perform the method of any of claims 1-7.