CN113191434A

CN113191434A - Method and device for training risk recognition model

Info

Publication number: CN113191434A
Application number: CN202110493567.7A
Authority: CN
Inventors: 赵宇琦; 陈彪; 刘腾飞; 陆逊; 张梦娇; 陈佩弦
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2021-07-30

Abstract

The embodiments of this specification provide a method and a device for training a risk recognition model. According to the method, a newly added task is first determined; the similarity between the newly added task and an existing task corresponding to a trained first risk identification model is then determined; if the similarity meets a preset condition, a structure in the first risk identification model is modified according to the newly added task, wherein the modified structure comprises a newly added structure and/or a deleted structure; finally, an incremental learning algorithm is used to update the model parameters of the first risk identification model containing the modified structure with the newly added task, to obtain a second risk identification model.

Description

Method and device for training risk recognition model

Technical Field

One or more embodiments of the present specification relate to the technical field of computer application, and in particular, to a method and an apparatus for training a risk recognition model in the technical field of machine learning.

Background

With the rapid development of internet technology, people increasingly use the internet to communicate, study and work, and even to carry out economic activities such as transactions, payments, account transfers and investments. On the one hand, these activities may themselves carry certain risks; on the other hand, some lawbreakers can easily commit illegal acts by exploiting technical and legal loopholes of the internet. Both pose a threat to the security of network behavior. To improve security, risk identification is required in the daily prevention and control process, and how to train a risk identification model with high efficiency and low resource consumption has become a problem to be solved in the industry.

Disclosure of Invention

One or more embodiments of the present specification describe a method and apparatus for training a risk recognition model, so as to improve training efficiency and save computing resources.

According to a first aspect, there is provided a method of training a risk recognition model, comprising:

determining a newly added task;

determining the similarity between the newly added task and an existing task corresponding to the trained first risk identification model;

if the similarity meets a preset condition, modifying a structure in the first risk identification model according to the newly added task, wherein the modified structure comprises a newly added structure and/or a deleted structure;

and updating model parameters of the first risk identification model containing the modified structure with the newly added task by using an incremental learning algorithm, to obtain a second risk identification model.

In one embodiment, the newly added task includes at least one of:

newly added sample data, a newly added risk type, a reduced risk type, or a newly added feature.

In another embodiment, determining the similarity between the newly added task and the existing task corresponding to the trained first risk identification model includes:

determining the distance between the information value (IV) vector of the newly added task on the risk domain and the IV vector of the existing task on the risk domain, and determining the similarity between the newly added task and the existing task based on the distance; or,

determining chi-square values of the newly added task and the existing task in a risk domain, and determining similarity between the newly added task and the existing task according to the chi-square values, wherein the risk domain comprises risk types related to the newly added task and the existing task; or,

training a first fully connected model with the newly added task and a second fully connected model with the existing task, wherein the parameter dimensions of the first fully connected model and the second fully connected model are the same; and determining the distance between the parameter vectors of the parts of the first and second fully connected models that have the same structure, and determining the similarity between the newly added task and the existing task based on the distance.

In one embodiment, modifying structure in the first risk identification model in accordance with the newly added task includes:

if the newly added task comprises newly added sample data, adding a structure for extracting feature vector representation in the first risk identification model;

if the newly added task comprises a newly added risk type, adding a full-connection structure corresponding to the newly added risk type in the first risk identification model, or adding a full-connection structure corresponding to the newly added risk type and a feature attention structure corresponding to the full-connection structure;

if the newly added task comprises a reduced risk type, deleting the fully connected structure corresponding to the reduced risk type in the first risk identification model, or deleting the fully connected structure corresponding to the reduced risk type and the feature attention structure corresponding to that fully connected structure;

and if the newly added task comprises newly added features, adding neurons to the structure for extracting feature vector representations in the first risk identification model.

In another embodiment, the updating the model parameters of the first risk identification model including the modified structure using the incremental learning algorithm with the new task includes:

updating parameters of only a modified structure in the first risk identification model by using the newly added task, and keeping parameters of other structures unchanged; or,

updating all model parameters in the first risk identification model by using the newly added task; or,

updating all model parameters of the first risk identification model containing the modified structure by using the new task in an L2 regularization mode; or,

updating all model parameters of the first risk identification model containing the modified structure by using the new task in an elastic weight consolidation (EWC) mode.

In one embodiment, the updating of all model parameters of the first risk identification model containing the modified structure by the new task in the way of L2 regularization comprises:

when the newly added task is used for updating all model parameters of the first risk identification model containing the modified structure, an L2-regularized loss function is adopted;

wherein a regularization term is superimposed in the L2-regularized loss function, the regularization term is determined by the square of a first difference value, and the first difference value is the difference between a parameter after the current iteration and the same parameter before the current iteration.

In another embodiment, updating all model parameters of the first risk identification model containing the modified structure with the new task by using the EWC comprises:

when the newly added task is used for updating all model parameters of the first risk identification model containing the modified structure, a Fisher regularized loss function is adopted;

wherein a regularization term is superimposed in the Fisher-regularized loss function, the regularization term is determined by a Fisher information matrix and the square of a second difference value, and the second difference value is the difference between a parameter after the current iteration and the corresponding parameter of the first risk identification model before the newly added task.

According to a second aspect, there is provided a risk identification method comprising:

acquiring characteristic data of an object to be identified;

inputting the characteristic data of the object to be recognized into a second risk recognition model to obtain a risk recognition result of the object to be recognized;

wherein the second risk identification model is pre-trained using the method described above.

According to a third aspect, there is provided an apparatus for training a risk recognition model, comprising:

a task determination unit configured to determine a newly added task;

a similarity determining unit configured to determine similarity between the newly added task and an existing task corresponding to the trained first risk identification model;

a structure modification unit configured to modify a structure in the first risk identification model according to the newly added task if the similarity satisfies a preset condition, wherein the modified structure comprises a newly added structure and/or a deleted structure;

and an incremental learning unit configured to update model parameters of the first risk identification model containing the modified structure with the newly added task by using an incremental learning algorithm, to obtain a second risk identification model.

In one embodiment, the newly added task includes at least one of:

In another embodiment, the similarity determination unit is specifically configured to:

In an embodiment, the structure modification unit is specifically configured to:

In another embodiment, the incremental learning unit is specifically configured to:

In one embodiment, the incremental learning unit is specifically configured to: when the newly added task is used for updating all model parameters of the first risk identification model containing the modified structure, adopt an L2-regularized loss function;

According to a fourth aspect, there is provided a risk identification apparatus comprising:

a feature acquisition unit configured to acquire feature data of an object to be recognized;

the risk identification unit is configured to input the characteristic data of the object to be identified into a second risk identification model to obtain a risk identification result of the object to be identified;

wherein the second risk identification model is pre-trained using the apparatus as described above.

According to a fifth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.

According to the method and the device provided by the embodiment of the specification, under the condition that the similarity between the newly added task and the existing task meets the preset condition, incremental learning is performed by using the risk recognition model (namely the first risk recognition model) obtained by the existing task through training, so that the training efficiency is improved, and the computing resources are saved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 illustrates a flow diagram of a method of training a risk recognition model, according to one embodiment;

FIG. 2 illustrates a structural schematic of a risk identification model according to one embodiment;

FIG. 3 shows a schematic block diagram of an apparatus for training a risk recognition model according to one embodiment;

FIG. 4 shows a schematic block diagram of a risk identification device according to an embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings.

Risk identification typically involves a division of risk domains, i.e. risk types, such as risk of fraud, risk of theft, risk of cheating, risk of gambling, risk of money laundering, etc., depending on the business. The risk subjects are, for example, risk users, risk merchants, risk accounts, and the like. Although the risk level and nature are different, there are some common characteristics. If the experience and knowledge of each risk type can be utilized to realize the sharing of the algorithm and the model level, the efficiency of model training is effectively improved, and the computing resources are effectively saved.

Specific implementations of the above concepts are described below.

FIG. 1 illustrates a flow diagram of a method of training a risk recognition model, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 1, the method includes:

Step 101, determining a newly added task.

The method is implemented on the basis of a trained first risk identification model, which is obtained by training with an existing task. Training with the training data of the existing task yields a model that identifies N risk categories, where N is a positive integer. If a new task now exists (referred to as the newly added task in the embodiments of this specification), a risk identification model corresponding to it needs to be trained. The idea of this embodiment is to adjust the trained first risk identification model instead of training a risk identification model for the newly added task from scratch, so that the existing knowledge of the first risk identification model is used to efficiently obtain the risk identification model corresponding to the newly added task (referred to as the second risk identification model in this embodiment).

It should be noted that the expressions "first", "second", and the like, referred to in the embodiments of the present application, are merely used for distinguishing by names, and do not have limitations in size, order, number, and the like.

Over the whole life cycle of the model, a newly added task related to risk identification can be at least one of newly added sample data, a newly added risk type, a reduced risk type, or a newly added feature.

Newly added sample data means that the newly added task supplements the sample data compared with the existing task.

A newly added risk type means that the newly added task supplements the risk types compared with the existing task.

A reduced risk type means that the newly added task deletes a risk type compared with the existing task.

A newly added feature means that the newly added task adds feature types for the model to learn compared with the existing task.

For example, the new task may be adding the sample data and adding the risk type at the same time. For another example, the new task may be to replace the risk type, that is, to add the risk type while reducing the risk type. For another example, the new task may be to add the sample data and the feature. And so on.

The related information of the newly added task can be stored in the storage device after being input by the user through the input interface. The execution device in the embodiment of the present specification acquires the information related to the new task from the storage device. In addition, the execution device in the embodiment of the present specification may also obtain the information related to the new task from the server side. If the execution device in the embodiment of the present specification is located at the server, the information related to the newly added task may be acquired from the server.

Step 103, determining the similarity between the newly added task and the existing task corresponding to the trained first risk identification model.

Knowledge transfer between the newly added task and the existing task must be built on a correlation between them. If the two tasks are perfectly orthogonal in the parameter space, then logically no form of knowledge transfer can improve the result.

In this step, the similarity between the newly added task and the existing task may be determined by, but not limited to, the following methods:

The first mode: determining the distance between the IV (Information Value) vector of the newly added task on the risk domain and the IV vector of the existing task on the risk domain, and determining the similarity between the newly added task and the existing task based on the distance.

Suppose that the risk domain involved by the newly added task contains m risk types, and m is a positive integer. The IV value is calculated for each risk type using the following calculation:

$$IV = \sum_{i=1}^{n} IV_i \tag{1}$$

where $n$ is the number of groups into which the training samples of the newly added task are divided, and $IV_i$ is the IV value of this feature in the $i$-th group.
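The per-group term $IV_i$ can be sketched as follows. This assumes the usual weight-of-evidence style definition of information value, applied to the four counts defined next; the exact expression is an assumption rather than a formula confirmed by this text:

$$IV_i = \left(\frac{y_i}{y_s} - \frac{n_i}{n_s}\right)\ln\frac{y_i / y_s}{n_i / n_s} \tag{2}$$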

where $y_i$ is the number of samples labeled as risky in the $i$-th group, $y_s$ is the total number of samples labeled as risky, $n_i$ is the number of samples in the $i$-th group, and $n_s$ is the total number of samples.

IV values are calculated separately for the m risk types to obtain a vector, which is the IV vector of the newly added task on the risk domain. The IV vector of the existing task on the risk domain can be obtained in the same way. The similarity between the newly added task and the existing task is then obtained by calculating the distance between the two IV vectors.
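A minimal sketch of the first mode, under stated assumptions: the per-group IV term uses a weight-of-evidence style form (each group's share of risky samples against its share of all samples), the distance is Euclidean, and the mapping `1 / (1 + distance)` from distance to similarity is one arbitrary choice. The function names, task dictionaries and sample counts are hypothetical.

```python
import numpy as np

def information_value(risky_per_group, total_per_group, eps=1e-9):
    """IV of one risk type from per-group counts (assumed WOE-style form)."""
    y = np.asarray(risky_per_group, dtype=float)   # y_i: risky samples in group i
    n = np.asarray(total_per_group, dtype=float)   # n_i: all samples in group i
    risky_share = y / y.sum()                      # y_i / y_s
    total_share = n / n.sum()                      # n_i / n_s
    # Sum of the per-group terms IV_i; eps guards against log(0).
    terms = (risky_share - total_share) * np.log((risky_share + eps) / (total_share + eps))
    return float(np.sum(terms))

def iv_vector(task):
    """One IV value per risk type; `task` maps risk type -> (risky, total) counts."""
    return np.array([information_value(*task[name]) for name in sorted(task)])

def similarity_from_distance(u, v):
    """Map the Euclidean distance into (0, 1]; identical vectors give 1.0."""
    return 1.0 / (1.0 + float(np.linalg.norm(u - v)))

# Hypothetical counts: three groups per risk type, two risk types (m = 2).
existing_task = {"fraud": ([5, 20, 75], [400, 300, 300]),
                 "theft": ([10, 30, 60], [500, 300, 200])}
new_task = {"fraud": ([6, 22, 72], [410, 290, 300]),
            "theft": ([40, 30, 30], [400, 300, 300])}

sim = similarity_from_distance(iv_vector(new_task), iv_vector(existing_task))
print(f"similarity = {sim:.3f}")
```

The resulting value would then be compared against the preset similarity threshold.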

The second mode: determining the chi-square value of the newly added task and the existing task on the risk domain, and determining the similarity between the newly added task and the existing task according to the chi-square value. Here, the risk domain includes the risk types involved in the newly added task and the existing task.

The chi-square value $\chi^2$ may be calculated as follows:

$$\chi^2 = \sum_{j=1}^{k} \frac{(f_j - n p_j)^2}{n p_j} \tag{3}$$

The samples in the newly added task are divided into k1 groups according to risk type, the samples in the existing task are divided into k2 groups according to risk type, and the sum of k1 and k2 is taken as $k$ in formula (3). $f_j$ is the number of samples labeled as risky in the $j$-th group, $n$ is the total number of samples, and $p_j$ is the expected proportion of samples labeled as risky in the $j$-th group, so that $n p_j$ is the expected count.

The larger the chi-square value is, the larger the deviation between the newly added task and the existing task is, and the smaller the similarity is; otherwise, the smaller the chi-square value is, the smaller the deviation between the newly added task and the existing task is, and the greater the similarity is.
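A minimal sketch of the second mode, assuming the Pearson form of the chi-square statistic. How the expected proportions are obtained is an assumption here: each group's share of all samples is used, which is what would be expected if the risk rate were the same in every group, and `n` is taken as the total observed risky count. The function names and counts are hypothetical.

```python
import numpy as np

def chi_square(observed_risky, expected_share):
    """Pearson chi-square: sum over groups of (f_j - n*p_j)^2 / (n*p_j)."""
    f = np.asarray(observed_risky, dtype=float)   # f_j: risky samples observed in group j
    p = np.asarray(expected_share, dtype=float)   # p_j: expected proportion, sums to 1
    n = f.sum()                                   # n: total observed risky samples (assumption)
    return float(np.sum((f - n * p) ** 2 / (n * p)))

def chi_square_similarity(risky_counts, group_sizes):
    """Smaller chi-square means smaller deviation, hence larger similarity."""
    sizes = np.asarray(group_sizes, dtype=float)
    chi2 = chi_square(risky_counts, sizes / sizes.sum())
    return 1.0 / (1.0 + chi2)

# k1 = 2 groups from the newly added task followed by k2 = 2 from the existing task.
group_sizes = [1000, 1000, 1000, 1000]
close = chi_square_similarity([40, 42, 38, 40], group_sizes)   # similar risk rates
far = chi_square_similarity([90, 10, 40, 20], group_sizes)     # very different risk rates
print(close, far)
```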

Both of the above modes calculate the similarity from statistics, so they have the advantages of a wide application range and an objective similarity result. Both are applicable to any type of task.

The third mode: a first fully connected model is trained with the newly added task and a second fully connected model is trained with the existing task, wherein the parameter dimensions of the first and second fully connected models are the same; the distance between the parameter vectors of the parts of the two models that have the same structure is determined, and the similarity between the newly added task and the existing task is determined based on the distance.

For the two fully connected models, the part with the same structure is the neuron structure, while the mapping layers may differ because the optimization objectives differ. For example, the mapping layer uses a Softmax function in multi-class classification and a Sigmoid function in binary classification, and a regression task needs no mapping layer. The distance between the parameter vectors of the neuron structures in the two trained fully connected models represents the similarity between the two tasks.

This mode is suitable for convex optimization tasks such as classification, regression and ranking, where convex optimization refers to the class of optimization problems in which the objective function and the loss function are convex. Although it is not as objective as the first two modes, it can compare tasks with different optimization objectives: the similarity between the newly added task and the existing task can be determined even if the two tasks adopt different optimization objectives.
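A minimal sketch of the third mode, assuming each "fully connected model" is a single linear layer with a Sigmoid mapping layer trained by plain gradient descent on synthetic data. Only the layer weights (the shared neuron structure) are compared, here with cosine similarity. The data, labels and function names are hypothetical.

```python
import numpy as np

def train_linear_layer(X, y, steps=500, lr=0.5):
    """Fit one fully connected layer with a sigmoid mapping layer by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # sigmoid mapping layer
        w -= lr * X.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
X_old = rng.normal(size=(400, 3))
X_new = rng.normal(size=(400, 3))
true_w = np.array([2.0, -1.0, 0.5])

y_old = (X_old @ true_w > 0).astype(float)                                   # existing task
y_related = (X_new @ (true_w + np.array([0.2, 0.0, -0.1])) > 0).astype(float)  # related new task
y_opposite = (X_new @ (-true_w) > 0).astype(float)                           # conflicting new task

w_old = train_linear_layer(X_old, y_old)
cos_related = cosine_similarity(train_linear_layer(X_new, y_related), w_old)
cos_opposite = cosine_similarity(train_linear_layer(X_new, y_opposite), w_old)
print(cos_related, cos_opposite)
```

In this toy setting the related task yields a parameter vector pointing in nearly the same direction as the existing one, while the conflicting task points the opposite way.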

In the above three modes, the Euclidean distance, the cosine distance, or the like may be used when calculating the distance between vectors.

The first risk identification model and the information related to the existing task may be stored on a storage device or a server side. The execution device in this embodiment of the present specification may obtain the first risk identification model and the information related to the existing task from a storage device or a server. If the execution device in the embodiment of the present specification is located at a server, the first risk identification model and the information related to the existing task may be acquired from the server.

Step 105, if the similarity meets a preset condition, modifying the structure in the first risk identification model according to the newly added task, wherein the modified structure comprises a newly added structure and/or a deleted structure.

The preset condition may be that a preset similarity threshold is exceeded, and the threshold may be set according to an empirical value or an experimental value.

For ease of understanding, the basic concepts of the risk identification model are first briefly described. The risk identification models referred to in this application may be classification models, regression models, ranking models, and the like.

From the perspective of multi-task learning algorithms, the main methods include Shared-Bottom, Cross-Stitch, MMOE, SNR and the like.

The Shared-Bottom method constructs a shared layer from fully connected layers. It is simple and easy to implement, and is the sharing mode with the highest degree of sharing.

In Cross-Stitch, each task introduces weights from the other tasks for knowledge sharing, but the hyper-parameters between tasks need to be set manually; it is a sharing mode with a low degree of sharing.

MMOE (Multi-gate Mixture-of-Experts) is a carefully designed multi-task learning scheme that is concise, has few parameters, and is efficient and fast, which makes it well suited to the risk identification scenarios of the embodiments of this specification. In this scenario, as shown in fig. 2, the risk identification model typically includes several important parts:

The structure for extracting feature vector representations, namely Expert0, Expert1, and Expert2 in fig. 2. The number of each unit structure in fig. 2 is illustrative and not intended to limit the present application. Each Expert contains a plurality of neurons for extracting features from the input data and obtaining a feature vector representation.

The feature attention structure, namely gateA and gateB in fig. 2. The input data is fed into the feature attention structure as well as into the Experts, and the feature attention structure outputs the probability that each Expert is selected. As shown in fig. 2, gateA outputs the probabilities that Expert0, Expert1, and Expert2 are selected, which act like weights, and the weighted sum of the outputs of Expert0, Expert1, and Expert2 is then passed to TowerA. Likewise, gateB outputs the probabilities that Expert0, Expert1, and Expert2 are selected, and the weighted sum of the Expert outputs is passed to TowerB. Usually each gate corresponds to one Tower, and the processing of a gate is similar to an attention mechanism.

The fully connected structure, i.e., the TowerA and the TowerB in fig. 2, is used to map the weighted and summed feature vectors to the recognition result, i.e., to each risk type in this embodiment. For example, the TowerA output is the probability of risk type A and the TowerB output is the probability of risk type B.

It can be seen that MMOE first extracts features through the Expert structures and then weights the Experts through the gates, where the gates are also learned from the input data. Each Tower then performs its mapping on the weighted sum of the feature vectors. The whole model is optimized end to end. Most of the parameters are in the Expert structures, while the remaining parts keep the speed and simplicity of fully connected structures, so a balance between complexity and efficiency is obtained.
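The forward pass described above can be sketched in NumPy as follows. This is a minimal illustration with random, untrained weights: the layer sizes, the tanh activation in the Experts, the single-layer Towers with a Sigmoid output, and all variable names are assumptions, not details given in this specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, hidden = 8, 4
experts = [rng.normal(size=(n_features, hidden)) for _ in range(3)]   # Expert0..Expert2
gates = {t: rng.normal(size=(n_features, 3)) for t in ("A", "B")}     # gateA, gateB
towers = {t: rng.normal(size=(hidden,)) for t in ("A", "B")}          # TowerA, TowerB

def mmoe_forward(x):
    """Return one risk probability per sample for each risk type."""
    # Every Expert sees the same input: shape (batch, n_experts, hidden).
    expert_out = np.stack([np.tanh(x @ W) for W in experts], axis=1)
    result = {}
    for task in towers:
        weights = softmax(x @ gates[task])                    # probability of selecting each Expert
        mixed = np.einsum("be,beh->bh", weights, expert_out)  # weighted sum of Expert outputs
        result[task] = sigmoid(mixed @ towers[task])          # Tower maps to a risk probability
    return result

probs = mmoe_forward(rng.normal(size=(5, n_features)))
print({task: p.round(3) for task, p in probs.items()})
```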

SNR (Sub-Network Routing) is a modification of MMOE. The difference is that the parallel Expert structure is replaced by a structure in which parallel and serial sub-networks coexist, and the weight configuration of each subtask is then determined by searching for activated paths. Its structure is more distributed, but it is also more bloated, and its convergence in experiments was very slow.

Therefore, combining the above several multitask learning algorithms, the MMOE is preferred in the embodiments of this specification, but other algorithms may be used.

Returning to the step 105, when modifying the structure in the first risk identification model according to the newly added task, the following situations may be included:

If the newly added task includes newly added sample data, a structure for extracting feature vector representations may be added to the first risk identification model. Referring to fig. 2, when sample data is newly added, an Expert may be added to improve the feature extraction capability.

If the newly added task includes a newly added risk type, a full connection structure corresponding to the newly added risk type and a feature attention structure corresponding to the full connection structure may be added to the first risk identification model. Referring to fig. 2, if a risk type is newly added, a Tower and a corresponding gate structure may be added, so as to identify the newly added risk type. However, in other algorithm architectures, such as the architecture of the Shared-Bottom algorithm, the gate structure is not included in the first risk model, and therefore, only the Tower structure may be added.

If the newly added task includes a reduced risk type, the fully connected structure corresponding to the reduced risk type and the feature attention structure corresponding to that fully connected structure are deleted from the first risk identification model. Referring to fig. 2, if a risk type is reduced, the Tower and gate structures corresponding to the reduced risk type may be deleted. However, in other algorithm architectures, such as the Shared-Bottom architecture, the first risk identification model contains no gate structure, and thus only the Tower structure is deleted.

If the newly added task includes newly added features, neurons are added to the structure for extracting feature vector representations in the first risk identification model. Referring to fig. 2, if the newly added task includes a new feature, neurons may be added to the Experts to extract the new feature.
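The four cases above can be sketched on a toy MMOE-style parameter dictionary. This is only an illustration under assumptions: the dictionary layout and function names are hypothetical, adding an Expert is paired here with one extra column in every gate so the new Expert can be selected, and a new feature is modeled as one extra zero-initialized input row in every Expert and gate.

```python
import numpy as np

rng = np.random.default_rng(0)

def new_model(n_features=8, hidden=4, n_experts=3, risk_types=("A", "B")):
    return {
        "experts": [rng.normal(size=(n_features, hidden)) for _ in range(n_experts)],
        "gates": {t: rng.normal(size=(n_features, n_experts)) for t in risk_types},
        "towers": {t: rng.normal(size=(hidden,)) for t in risk_types},
    }

def add_expert(model):
    """Newly added sample data: add an Expert and one gate column per gate."""
    n_features, hidden = model["experts"][0].shape
    model["experts"].append(rng.normal(size=(n_features, hidden)))
    for t, g in model["gates"].items():
        model["gates"][t] = np.hstack([g, np.zeros((n_features, 1))])

def add_risk_type(model, name):
    """Newly added risk type: add a Tower and its corresponding gate."""
    n_features, hidden = model["experts"][0].shape
    model["gates"][name] = rng.normal(size=(n_features, len(model["experts"])))
    model["towers"][name] = rng.normal(size=(hidden,))

def remove_risk_type(model, name):
    """Reduced risk type: delete the Tower and its corresponding gate."""
    del model["gates"][name]
    del model["towers"][name]

def add_feature(model):
    """Newly added feature: one extra input row in every Expert and gate."""
    model["experts"] = [np.vstack([W, np.zeros((1, W.shape[1]))]) for W in model["experts"]]
    model["gates"] = {t: np.vstack([g, np.zeros((1, g.shape[1]))])
                      for t, g in model["gates"].items()}

model = new_model()
add_risk_type(model, "C")      # e.g. a new risk type C appears
remove_risk_type(model, "A")   # risk type A is no longer needed
add_expert(model)
add_feature(model)
print(sorted(model["towers"]), len(model["experts"]), model["experts"][0].shape)
```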

Step 107, updating model parameters of the first risk identification model containing the modified structure with the newly added task by using an incremental learning algorithm, to obtain a second risk identification model.

The incremental learning algorithm involved in this step may take the following forms, but is not limited to:

The first mode: unconstrained updating.

That is, all model parameters in the first risk identification model are updated by using the newly added task. This is equivalent to adjusting the entire first risk identification model with the training data and training target of the newly added task. The incremental learning process thus becomes a process of re-fitting the data of the newly added task, and the only difference from training from scratch is that the parameters are initialized with the model parameters of the first risk identification model.

This approach fits well to newly added task data, but may forget the knowledge of the existing tasks.

The second mode: freezing the network, so that the frozen part is not updated.

That is, only the parameters of the modified structure in the first risk identification model are updated by using the newly added task, and the parameters of the other structures are kept unchanged. In other words, the model parameters of the shared part are maintained, and task-specific learning is performed on top of them. In this way the knowledge of the existing task is completely preserved, so the existing task is not affected.

With this mode, the newly added task can benefit from the knowledge of the existing task, but the existing task cannot be improved by the newly added task.
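The difference between the first two modes can be sketched as a gradient step with a set of trainable parameter names. The parameter names, gradient values and learning rate below are hypothetical placeholders; a real model would compute the gradients from the loss of the newly added task.

```python
import numpy as np

def sgd_step(params, grads, trainable, lr=0.1):
    """Update only the parameters named in `trainable`; all others stay frozen."""
    return {name: (value - lr * grads[name] if name in trainable else value)
            for name, value in params.items()}

# Toy parameters: expert0 and towerA come from the first model, towerC is newly added.
params = {"expert0": np.ones(3), "towerA": np.ones(3), "towerC": np.zeros(3)}
grads = {name: np.full(3, 0.5) for name in params}

# Second mode: only the modified structure (towerC) is trained.
frozen_update = sgd_step(params, grads, trainable={"towerC"})
# First mode: unconstrained updating, every parameter is trained.
full_update = sgd_step(params, grads, trainable=set(params))
print(frozen_update["expert0"], full_update["expert0"])
```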

The third mode: L2 regularization.

In incremental learning, one class of methods sets a penalty term to restrict how much the network changes, and the L2 regularization term is a very concise choice. In this specification, when all model parameters of the first risk identification model containing the modified structure are updated with the newly added task, an L2-regularized loss function is adopted. A regularization term is superimposed in the L2-regularized loss function, and the regularization term is determined by the square of the difference between a parameter after the current iteration and the same parameter before the current iteration.

For example, the L2-regularized loss function can be represented by the following calculation:
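One form consistent with the symbol definitions that follow is sketched here; the weighting coefficient $\lambda$, which balances the two terms, is an assumption of this sketch:

$$\tilde{L} = L_n + \lambda \sum_{k} \left(\tilde{\theta}_k - \theta_k\right)^2 \tag{4}$$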

where $\tilde{L}$ is the regularized loss function, i.e. the loss function as corrected in the current iteration of training, $L_n$ is the loss function before the current iteration, $\tilde{\theta}_k$ is the $k$-th parameter in the parameter vector after the current iteration, and $\theta_k$ is the $k$-th parameter in the parameter vector before the current iteration.

This approach can limit the degree of model parameter update, is easy to implement and fast to calculate, but cannot control the parameter variation direction.
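The limiting effect of the penalty can be seen in a toy sketch. It assumes a quadratic loss for the newly added task with its optimum at `target`, a hypothetical penalty weight `lam`, and plain gradient descent on the regularized loss starting from the previous parameters; none of these choices come from this specification.

```python
import numpy as np

def l2_regularized_loss(task_loss, theta_new, theta_prev, lam):
    """Task loss plus lam times the squared distance to the previous parameters."""
    return task_loss(theta_new) + lam * np.sum((theta_new - theta_prev) ** 2)

def regularized_update(task_grad, theta_prev, lam, lr=0.05, steps=200):
    """Minimise the regularised loss by gradient descent, starting at theta_prev."""
    theta = theta_prev.copy()
    for _ in range(steps):
        theta -= lr * (task_grad(theta) + 2.0 * lam * (theta - theta_prev))
    return theta

target = np.array([4.0, -2.0])                         # optimum of the toy new-task loss
task_loss = lambda th: 0.5 * np.sum((th - target) ** 2)
task_grad = lambda th: th - target
theta_prev = np.zeros(2)                               # parameters before the update

weak = regularized_update(task_grad, theta_prev, lam=0.1)
strong = regularized_update(task_grad, theta_prev, lam=5.0)
print(weak, strong)
```

A larger `lam` keeps the updated parameters closer to the previous ones, but the penalty is isotropic: it shrinks the update equally in every direction.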

The fourth mode: EWC (Elastic Weight Consolidation).

EWC was proposed to address catastrophic forgetting in neural networks and is a typical incremental learning method. The idea is to use Bayesian conditional probability to decompose the total loss into the sum of the loss of the old model and the loss of the new model; by means of the Laplace approximation, the difference between the loss from training on new data alone and the loss from training on mixed new and old data is approximated by a regularization term containing the Fisher information matrix, so that an effect equivalent to retraining with the old data mixed in can be approximated and the information in the old data is not forgotten. The EWC algorithm can be viewed as moving the parameters of the old model toward the region of experience shared by the old and new models.

In the embodiments of this specification, when all model parameters of the first risk identification model containing the modified structure are updated with the newly added task, a Fisher-regularized loss function is adopted. A regularization term is superimposed in the Fisher-regularized loss function, and the regularization term is determined by the Fisher information matrix and the square of the difference between a parameter after the current iteration and the corresponding parameter of the first risk identification model before the newly added task.

This approach can limit both the extent and the direction of model parameter updates, and it is supported by rigorous mathematical derivation.
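The Fisher regularized loss can be sketched as follows. This is an illustrative example only: it assumes the common diagonal form of the Fisher information matrix and a factor of one half on the penalty, and the names `ewc_loss`, `fisher_diag`, and `lam` are hypothetical.

```python
def ewc_loss(new_task_loss, theta, theta_old, fisher_diag, lam=1.0):
    """New-task loss plus (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    `fisher_diag` holds one (assumed diagonal) Fisher information value per parameter.
    """
    penalty = sum(
        f * (t - t_old) ** 2
        for f, t, t_old in zip(fisher_diag, theta, theta_old)
    )
    return new_task_loss + 0.5 * lam * penalty


theta_old = [1.0, 1.0]
fisher_diag = [10.0, 0.1]  # the first parameter mattered far more for the old task

# The same step size is penalized differently depending on which parameter moves.
move_important = ewc_loss(0.0, [1.5, 1.0], theta_old, fisher_diag)
move_unimportant = ewc_loss(0.0, [1.0, 1.5], theta_old, fisher_diag)
print(move_important, move_unimportant)
```

Because each parameter carries its own weight, the optimizer is steered toward changing parameters that mattered little for the existing task, which is how this mode constrains the update direction as well as its extent.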

To facilitate understanding of EWC, the derivation of the Fisher regularized loss function described above is introduced below.

First, the loss function is decomposed using Bayes' rule.

The training data set A, with which the first risk identification model (hereinafter referred to as the old model) was trained on the existing task, cannot be obtained, and the parameter vector of the old model is denoted as $\theta_A^*$.

The training data set for the newly added task is denoted as B, and the total data set is denoted as Σ.

For Σ, the training process of the model can be regarded as maximizing the probability of the model parameter θ given the data, which is equivalent to solving:

$$\theta^* = \arg\max_{\theta} \log p(\theta \mid \Sigma) \quad (5)$$

By Bayes' rule, the objective in equation (5) can be rewritten as:

$$\log p(\theta \mid \Sigma) = \log p(\Sigma \mid \theta) + \log p(\theta) - \log p(\Sigma) \quad (6)$$

Assuming that the training data sets A and B are independent of each other, we have:

$$\log p(\theta \mid \Sigma) = \log p(A \mid \theta) + \log p(B \mid \theta) + \log p(\theta) - \log p(A) - \log p(B) \quad (7)$$

Because $\log p(\theta \mid A) = \log p(A \mid \theta) + \log p(\theta) - \log p(A)$, this becomes:

$$\log p(\theta \mid \Sigma) = \log p(B \mid \theta) + \log p(\theta \mid A) - \log p(B) \quad (8)$$

Only $p(\theta \mid A)$ in the above equation is related to the old model.

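Next, an approximation is applied.

First, a second-order Taylor expansion of $\log p(\theta \mid A)$ is performed near $\theta_A^*$, the parameter vector of the old model:

$$\log p(\theta \mid A) = \log p(\theta_A^* \mid A) + \left(\left.\frac{\partial \log p(\theta \mid A)}{\partial \theta}\right|_{\theta = \theta_A^*}\right)^{T} (\theta - \theta_A^*) + \frac{1}{2} (\theta - \theta_A^*)^{T} H \, (\theta - \theta_A^*) + o\left(\lVert \theta - \theta_A^* \rVert^2\right) \quad (9)$$

where $H = \left.\frac{\partial^2 \log p(\theta \mid A)}{\partial \theta^2}\right|_{\theta = \theta_A^*}$ is the second derivative of $\log p(\theta \mid A)$ at $\theta_A^*$.

The higher-order infinitesimal is neglected. Because the derivative at an extreme point of a convex optimization problem is zero, the first-derivative term is 0, and only the second-derivative term needs to be considered as the approximation target. Then $\log p(\theta \mid A)$ can be rewritten as:

$$\log p(\theta \mid A) \approx \log p(\theta_A^* \mid A) + \frac{1}{2} (\theta - \theta_A^*)^{T} H \, (\theta - \theta_A^*) \quad (10)$$

where the first term on the right of the equal sign is a constant, so only the second term needs to be considered.

Another concept is introduced here: the Laplace approximation. Note that the standard form of the density function of a normal distribution is:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Equation (10) can then be rewritten as:

$$p(\theta \mid A) \approx \varepsilon \exp\left(\frac{1}{2} (\theta - \theta_A^*)^{T} H \, (\theta - \theta_A^*)\right)$$

where ε = exp(Δ), and Δ represents $\log p(\theta_A^* \mid A)$.

At this point, the exponential term of $p(\theta \mid A)$ has the same form as the exponential term of the normal density function, that is, the Laplace approximation of $p(\theta \mid A)$ is:

$$p(\theta \mid A) \sim \mathcal{N}\left(\theta_A^*, \; (-H)^{-1}\right)$$

With the above reasoning, we return to equation (8) and rewrite $\log p(\theta \mid \Sigma)$ as:

$$\log p(\theta \mid \Sigma) \approx \log p(B \mid \theta) + \frac{1}{2} (\theta - \theta_A^*)^{T} H \, (\theta - \theta_A^*) + \text{const}$$

Maximizing this expression is equivalent to minimizing its negative, from which it can be deduced that:

$$L(\theta) = L_B(\theta) + \frac{\lambda}{2} \sum_i I_{A,i} \left(\theta_i - \theta_{A,i}^*\right)^2$$

where $L(\theta)$ is the Fisher regularized loss function, that is, the loss function corrected in the current iteration of the training process; $L_B(\theta)$ is the loss function before the current iteration; λ is a hyperparameter used to balance learning new knowledge against preserving old knowledge; θ is the parameter after the current iteration; $\theta_A^*$ is the parameter of the first risk identification model before the newly added task; and $I_A$ is the Fisher information matrix, with $I_{A,i}$ its ith diagonal element, i.e.

$$I_A = -E\left[\left.\frac{\partial^2 \log p(\theta \mid A)}{\partial \theta^2}\right|_{\theta = \theta_A^*}\right]$$

This completes the derivation.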
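To make the Fisher information term concrete, the sketch below estimates its diagonal for a logistic model as the average squared gradient of the log-likelihood. This is an assumption-laden illustration: it presumes the estimate was computed and stored while samples of the existing task were still accessible, and the names `fisher_diagonal` and `old_task_samples` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fisher_diagonal(theta, samples):
    """Average squared gradient of log p(y | x, theta) over (x, y) samples."""
    fisher = [0.0] * len(theta)
    for x, y in samples:
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        # For logistic regression, d/d theta_i log p(y | x, theta) = (y - p) * x_i
        for i, xi in enumerate(x):
            fisher[i] += ((y - p) * xi) ** 2
    return [f / len(samples) for f in fisher]


# Toy samples in which only the first feature ever varies.
old_task_samples = [([1.0, 0.0], 1), ([1.0, 0.0], 0), ([2.0, 0.0], 1), ([1.0, 0.0], 1)]
theta_old = [0.0, 0.0]
print(fisher_diagonal(theta_old, old_task_samples))
```

The second parameter receives zero Fisher information because its feature never contributed to the existing task, so the regularization term would leave it free to change.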

After the second risk identification model is obtained through training in the above manner provided in the embodiment of the present specification, risk identification can be performed by using the second risk identification model, so as to determine the risk type of the identified object.

That is, feature data of the object to be identified is acquired and input into the second risk identification model, and the risk identification result of the object to be identified is obtained.

The object to be identified may be a user, a merchant, an account, or the like. The risk types may be, for example, fraud risks, theft risks, cheating risks, gambling risks, money laundering risks, etc. This is not intended to be exhaustive.
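The identification step can be sketched as follows. This is a simplified illustration, not the model architecture of the embodiment: it assumes one linear head with a sigmoid output per risk type, and the names `identify_risk`, `heads`, and `threshold` are hypothetical.

```python
import math


def identify_risk(features, heads, threshold=0.5):
    """Score each risk type and return those whose score exceeds `threshold`."""
    scores = {}
    for risk_type, (weights, bias) in heads.items():
        z = sum(w * x for w, x in zip(weights, features)) + bias
        scores[risk_type] = 1.0 / (1.0 + math.exp(-z))
    flagged = [risk_type for risk_type, score in scores.items() if score > threshold]
    return flagged, scores


# Hypothetical per-risk-type heads: (weights, bias).
heads = {"fraud": ([2.0, 0.0], -1.0), "gambling": ([0.0, 2.0], -1.0)}
flagged, scores = identify_risk([1.0, 0.0], heads)
print(flagged, scores)
```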

According to an embodiment of another aspect, an apparatus for training a risk recognition model is provided. FIG. 3 shows a schematic block diagram of the apparatus for training a risk recognition model according to one embodiment. It is to be appreciated that the apparatus can be implemented by any apparatus, device, platform, and cluster of devices having computing and processing capabilities. As shown in fig. 3, the apparatus 300 includes: a task determination unit 301, a similarity determination unit 302, a structure modification unit 303, and an incremental learning unit 304. The main functions of each component unit are as follows:

A task determination unit 301, configured to determine a newly added task.

A similarity determination unit 302, configured to determine the similarity between the newly added task and an existing task corresponding to the trained first risk identification model.

A structure modification unit 303, configured to modify a structure in the first risk identification model according to the newly added task if the similarity satisfies a preset condition, the modified structure including a newly added structure and/or a deleted structure.

An incremental learning unit 304, configured to update model parameters of the first risk identification model including the modified structure by using the newly added task and an incremental learning algorithm, to obtain a second risk identification model.
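How the four units cooperate can be sketched as a single function that chains them. This is a hypothetical orchestration: the callables `similarity_fn`, `modify_fn`, and `update_fn` stand in for units 302 to 304, the preset condition is assumed to be "similarity at or above a threshold", and returning `None` when the condition fails is a choice made only for this example.

```python
def train_second_model(first_model, existing_task, new_task,
                       similarity_fn, modify_fn, update_fn, threshold=0.5):
    """Chain the similarity, structure-modification, and incremental-learning steps."""
    similarity = similarity_fn(new_task, existing_task)  # role of unit 302
    if similarity < threshold:  # assumed form of the preset condition
        return None  # incremental route not taken in this sketch
    modified_model = modify_fn(first_model, new_task)  # role of unit 303
    return update_fn(modified_model, new_task)  # role of unit 304


# Stub callables that only show the data flow.
second_model = train_second_model(
    first_model={"w": [1.0]},
    existing_task={"name": "old"},
    new_task={"name": "new"},
    similarity_fn=lambda new, old: 0.9,
    modify_fn=lambda model, task: {**model, "new_head": [0.0]},
    update_fn=lambda model, task: {**model, "trained_on": task["name"]},
)
print(second_model)
```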

Wherein, the newly added task may include at least one of the following:

As one implementable manner, the similarity determination unit 302 may be specifically configured to: determine the distance between the information value (IV) vector of the newly added task on the risk domain and the IV vector of the existing task on the risk domain, and determine the similarity between the newly added task and the existing task based on the distance.
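The IV-based comparison can be sketched as follows. The example assumes that each IV vector holds one information value per binned feature, that every bin contains at least one risky and one non-risky sample, and that distance is mapped to similarity as 1 / (1 + distance); these choices and the function names are illustrative assumptions.

```python
import math


def information_value(good_counts, bad_counts):
    """IV of one binned feature: sum over bins of (g - b) * ln(g / b).

    g and b are the shares of non-risky and risky samples falling in a bin.
    Every bin must have non-zero counts in both lists.
    """
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        g, b = good / total_good, bad / total_bad
        iv += (g - b) * math.log(g / b)
    return iv


def iv_similarity(iv_new, iv_old):
    """Map the Euclidean distance between two IV vectors into (0, 1]."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(iv_new, iv_old)))
    return 1.0 / (1.0 + distance)


# Two features per task, two bins per feature (toy counts).
new_task_iv = [information_value([80, 20], [20, 80]), information_value([50, 50], [40, 60])]
old_task_iv = [information_value([75, 25], [25, 75]), information_value([50, 50], [45, 55])]
print(iv_similarity(new_task_iv, old_task_iv))
```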

As another implementable manner, the similarity determination unit 302 may be specifically configured to: determine the chi-square value of the newly added task and the existing task on the risk domain, and determine the similarity between the newly added task and the existing task according to the chi-square value, where the risk domain comprises the risk types related to the newly added task and the existing task.
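A chi-square comparison of the two tasks can be sketched as below. The example assumes that each task is summarized by its sample counts per risk type in the risk domain and computes the Pearson chi-square statistic of the resulting two-row table; how the statistic is turned into a similarity is left open, and the name `chi_square` is hypothetical.

```python
def chi_square(new_counts, old_counts):
    """Pearson chi-square statistic for a 2 x K table of risk-type counts.

    Each risk type must have a non-zero total count across the two tasks.
    """
    total_new, total_old = sum(new_counts), sum(old_counts)
    grand_total = total_new + total_old
    statistic = 0.0
    for n, o in zip(new_counts, old_counts):
        column_total = n + o
        for observed, row_total in ((n, total_new), (o, total_old)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Identical risk-type distributions give 0; diverging ones give a large value.
print(chi_square([30, 70], [30, 70]))
print(chi_square([10, 90], [90, 10]))
```

A smaller statistic indicates that the two tasks distribute their samples over the risk types in a more similar way.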

As yet another implementable manner, the similarity determination unit 302 may be specifically configured to: train a first fully connected model using the newly added task and a second fully connected model using the existing task, where the parameter dimensions of the first and second fully connected models are the same; determine the distance between the parameter vectors of the same structural parts in the first and second fully connected models; and determine the similarity between the newly added task and the existing task based on the distance.
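The parameter-based comparison can be sketched with two very small models. The example assumes a single fully connected layer with a sigmoid output trained by batch gradient descent, and uses cosine similarity between the two weight vectors as the distance-based measure; the function names, toy data, and hyperparameters are all assumptions for illustration.

```python
import math


def train_single_layer(samples, dim, lr=0.5, epochs=200):
    """Fit a one-layer logistic model by batch gradient descent from zero weights."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i, xi in enumerate(x):
                grad[i] += (p - y) * xi
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy tasks: in the first two, feature 0 signals risk; in the third, feature 1 does.
new_task = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([0.0, 1.0], 0)]
similar_task = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1)]
opposite_task = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 0), ([0.0, 1.0], 1)]

w_new = train_single_layer(new_task, dim=2)
w_similar = train_single_layer(similar_task, dim=2)
w_opposite = train_single_layer(opposite_task, dim=2)
print(cosine_similarity(w_new, w_similar), cosine_similarity(w_new, w_opposite))
```

Because both models share the same parameter dimensions, their weight vectors can be compared position by position, which is the precondition stated above.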

The structure modification unit 303 may be specifically configured to:

if the newly added task includes a reduced risk type, delete the fully connected structure corresponding to the reduced risk type in the first risk identification model, or delete the fully connected structure corresponding to the reduced risk type together with the feature attention structure corresponding to that fully connected structure; and

if the newly added task includes newly added features, add neurons to the structure for extracting the feature vector representation in the first risk identification model.
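The two modifications above can be sketched on a toy model held in a dictionary. The layout (`extractor` as a weight matrix with one column per input feature, `heads` and `attention` keyed by risk type), the zero initialization of new weights, and the function names are assumptions for this example only.

```python
def remove_risk_type(model, risk_type):
    """Drop the fully connected head and the attention weights of a removed risk type."""
    heads = {k: v for k, v in model["heads"].items() if k != risk_type}
    attention = {k: v for k, v in model["attention"].items() if k != risk_type}
    return {**model, "heads": heads, "attention": attention}


def add_features(model, num_new_features):
    """Widen the feature-extraction layer with zero-initialized weights for new inputs."""
    widened = [row + [0.0] * num_new_features for row in model["extractor"]]
    return {**model, "extractor": widened}


model = {
    "extractor": [[0.1, 0.2], [0.3, 0.4]],  # rows: hidden units, columns: input features
    "heads": {"fraud": [1.0, 2.0], "gambling": [0.5, 0.5]},
    "attention": {"fraud": [1.0, 1.0], "gambling": [1.0, 1.0]},
}
modified = add_features(remove_risk_type(model, "gambling"), num_new_features=1)
print(modified)
```

Zero-initializing the added weights keeps the outputs for existing features unchanged until incremental training adjusts them, which is one reasonable starting point rather than a requirement of the method.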

As one implementable manner, the incremental learning unit 304 may be specifically configured to: update only the parameters of the modified structure in the first risk identification model by using the newly added task, keeping the parameters of the other structures unchanged.
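Updating only the modified structure can be sketched as a gradient step that skips frozen parameter groups. The grouping of parameters by name, the plain gradient-descent update, and the name `masked_sgd_step` are assumptions made for this illustration.

```python
def masked_sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient-descent step only to groups named in `trainable`."""
    return {
        name: (
            [v - lr * g for v, g in zip(values, grads[name])]
            if name in trainable
            else list(values)  # frozen group: copied unchanged
        )
        for name, values in params.items()
    }


params = {"extractor": [1.0, 2.0], "new_head": [0.0, 0.0]}
grads = {"extractor": [0.5, 0.5], "new_head": [1.0, -1.0]}

# Only the newly added head is trained; the existing extractor stays fixed.
updated = masked_sgd_step(params, grads, trainable={"new_head"})
print(updated)
```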

As another implementable manner, the incremental learning unit 304 may be specifically configured to: update all model parameters in the first risk identification model by using the newly added task.

As yet another implementable manner, the incremental learning unit 304 may be specifically configured to: update all model parameters of the first risk identification model containing the modified structure by using the newly added task in an L2-regularized manner.

Specifically, the incremental learning unit 304 may adopt an L2-regularized loss function over the parameters when updating all model parameters of the first risk identification model containing the modified structure with the newly added task. The L2-regularized loss function has a regularization term superimposed on it, and the regularization term is determined by the square of a first difference, where the first difference is the difference between the parameter after the current iteration and the parameter before the current iteration.

As a preferred manner, the incremental learning unit 304 may be specifically configured to: update all model parameters of the first risk identification model containing the modified structure by using the newly added task in an EWC (Elastic Weight Consolidation) manner.

Specifically, the incremental learning unit 304 adopts a Fisher regularized loss function when updating all model parameters of the first risk identification model containing the modified structure with the newly added task. The Fisher regularized loss function has a regularization term superimposed on it, and the regularization term is determined by a Fisher information matrix and the square of a second difference, where the second difference is the difference between the parameter after the current iteration and the parameter of the first risk identification model before the newly added task.

According to an embodiment of another aspect, a risk identification device is provided. FIG. 4 shows a schematic block diagram of the risk identification device according to one embodiment. It is to be appreciated that the device can be implemented by any apparatus, device, platform, or cluster of devices having computing and processing capabilities. As shown in FIG. 4, the device 400 includes: a feature acquisition unit 401 and a risk identification unit 402.

A feature acquisition unit 401 configured to acquire feature data of an object to be recognized.

And a risk identification unit 402 configured to input the feature data of the object to be identified into the second risk identification model, and obtain a risk identification result of the object to be identified.

Wherein the second risk identification model is pre-trained using the apparatus shown in fig. 3.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 1.

According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 1.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The above embodiments further describe in detail the objects, technical solutions, and advantages of the present invention. It should be understood that the above are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims

1. A method of training a risk recognition model, comprising:

determining a newly added task;

2. The method of claim 1, wherein the newly added task comprises at least one of:

3. The method of claim 1, wherein determining the similarity between the newly added task and an existing task corresponding to the trained first risk identification model comprises:

4. The method of claim 2, wherein modifying structure in the first risk identification model in accordance with the newly added task comprises:

5. The method of claim 1, wherein updating model parameters of the first risk identification model including the modified structure with the newly added task using an incremental learning algorithm comprises:

6. The method of claim 1, wherein updating all model parameters of the first risk identification model containing the modified structure with the newly added task in an L2-regularized manner comprises:

7. The method of claim 1, wherein updating all model parameters of the first risk identification model containing the modified structure with the newly added task in an EWC manner comprises:

8. A risk identification method, comprising:

acquiring characteristic data of an object to be identified;

wherein the second risk identification model is pre-trained using the method of any one of claims 1 to 7.

9. An apparatus for training a risk recognition model, comprising:

a task determination unit configured to determine a newly added task;

10. The apparatus of claim 9, wherein the newly added task comprises at least one of:

11. The apparatus according to claim 9, wherein the similarity determining unit is specifically configured to:

12. The apparatus according to claim 10, wherein the structure modification unit is specifically configured to:

13. The apparatus of claim 9, wherein the incremental learning unit is specifically configured to:

14. The apparatus of claim 13, wherein the incremental learning unit is specifically configured to: when the newly added task is used for updating all model parameters of the first risk identification model containing the modified structure, L2 is adopted for regularizing a loss function of the parameters;

15. The apparatus of claim 13, wherein the incremental learning unit is specifically configured to:

16. A risk identification device comprising:

wherein the second risk identification model is pre-trained using the apparatus of any one of claims 9 to 15.

17. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-8.