CN110659744B - Training event prediction model, and method and device for evaluating operation event - Google Patents


Info

Publication number
CN110659744B
Authority
CN
China
Prior art keywords
event
feature
sample
domain
target domain
Prior art date
Legal status
Active
Application number
CN201910916976.6A
Other languages
Chinese (zh)
Other versions
CN110659744A (en)
Inventor
宋博文
朱勇椿
陈帅
顾曦
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201910916976.6A
Publication of CN110659744A
Application granted
Publication of CN110659744B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification provide methods and apparatus for training an event prediction model and performing event evaluation. In the training method, a training sample set is first obtained, comprising a large number of source domain samples and a small number of target domain samples. Each sample is input into an event prediction model that comprises a source domain extractor, a target domain extractor, and a shared extractor: source domain samples are processed by the source domain extractor and the shared extractor, while target domain samples are processed by the target domain extractor and the shared extractor. The classification category of the current sample is then predicted from the resulting sample feature vector, and a classification loss is obtained from the prediction. In addition, a domain adaptation loss is determined based on a first characterization of each source domain sample at a specific network layer of the model and a second characterization of each target domain sample at the same layer. The event prediction model is then updated and trained in the direction that reduces the total loss, which consists of the classification loss and the domain adaptation loss.

Description

Training event prediction model, and method and device for evaluating operation event
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to training an event prediction model using machine learning, and a method and apparatus for evaluating an operational event using the model.
Background
In many scenarios, user operation behaviors or operation events need to be analyzed and processed. For example, in order to identify high-risk operation behaviors that may threaten network security or user information security, such as account theft, traffic attacks, and fraudulent transactions, the risk level of user operation behaviors can be evaluated for risk prevention and control.
To assess the risk of an operational behavior, the analysis may be based on the characteristics of the behavior itself. Beyond that, the user's behavior sequence can be taken into account more fully. A behavior sequence is the series of events, such as clicks, visits, and purchases, generated by a user in daily operation and use. It can be represented as a time-ordered sequence of events, captures fine-grained characteristics such as the user's habits and preferences, and allows a more comprehensive analysis of the user's operation history and operation patterns. However, both operational events and behavior sequence data face the problem of feature extraction and characterization, that is, extracting representative combined features from a huge feature space to characterize the risk of operational events. This feature extraction work is often performed empirically by business personnel. Manual feature engineering, however, is extremely labor- and time-consuming, its effectiveness depends heavily on manual business experience and efficiency, and it also carries a risk of security leakage.
In some schemes, feature extraction is also accomplished through machine learning: a large number of relevant features are input into a model, the model is trained on labeled data, and the extraction and combination of features are learned automatically. This places high demands on model design. In addition, such model training is difficult in domains where labeled data is scarce.
Accordingly, improved approaches are desired for more accurately and efficiently analyzing operational events to facilitate risk prevention and control.
Disclosure of Invention
One or more embodiments of the present disclosure describe methods and apparatus for training an event prediction model and evaluating operational events, in which source domain samples with rich data and target domain samples with relatively sparse data are used to train an event prediction model applicable to both the source domain and the target domain, thereby improving the accuracy and efficiency of event classification and prediction.
According to a first aspect, there is provided a method of training an event prediction model, the method comprising:
obtaining a training sample set, wherein the training sample set comprises a first number of source domain samples and a second number of target domain samples, the first number is larger than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events;
inputting each sample as a current sample into an event prediction model, wherein the event prediction model at least comprises a source domain feature extractor, a target domain feature extractor, a shared feature extractor and a classifier, and when the current sample is a source domain sample, the source domain feature extractor is adopted to perform feature extraction on the source domain sample to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
predicting the event category of the current sample by using the classifier based on the sample feature vector of the current sample to obtain a prediction result;
determining classification loss according to the prediction result of each sample and the corresponding classification label;
determining a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
determining a total loss according to the classification loss and the domain adaptive loss;
updating the event prediction model in the direction of the total loss reduction.
In one embodiment, each source domain event has a first plurality of attributes; each target domain event having a second plurality of attributes, wherein there is an intersection of the first plurality of attributes and the second plurality of attributes;
in such a case, the source domain feature extractor is configured to perform feature extraction in a first feature space corresponding to the first plurality of attributes; the target domain feature extractor is used for extracting features in a second feature space corresponding to the second multiple attributes;
and the shared feature extractor is used for extracting features in a shared feature space, wherein the shared feature space corresponds to a union of the first multiple attributes and the second multiple attributes.
In a further embodiment, the feature extraction process of the shared feature extractor on the source domain samples may include:
filling attribute values of a first plurality of attributes of each source domain event in the source domain sample into fields corresponding to the first plurality of attributes in the shared feature space, filling other fields with default values to obtain a first attribute representation of the source domain event in the shared feature space, and extracting features according to the first attribute representation;
the process of the shared feature extractor for extracting the features of the target domain samples may include:
and filling the attribute values of the second multiple attributes of each target domain event in the target domain sample into fields corresponding to the second multiple attributes in the shared feature space, filling the other fields with default values to obtain a second attribute representation of the target domain event in the shared feature space, and extracting features according to the second attribute representation.
According to one implementation, the source domain feature extractor, the target domain feature extractor, and the shared feature extractor are double-layer feature extractors with the same structure but different parameters, each comprising a coding layer, a first embedding layer, and a second embedding layer, wherein:
the encoding layer is used for encoding a plurality of items of attribute information of each event in a current event sequence corresponding to an input current sample into a plurality of corresponding encoding vectors;
the first embedding layer is used for carrying out first combination on the plurality of coding vectors of each event to obtain each event vector corresponding to each event;
and the second embedded layer is used for carrying out second combination on the event vectors to obtain the feature representation corresponding to the current event sequence.
Further, in one embodiment, the first combination performed by the first embedding layer comprises an N-th order inter-vector combination operation involving the multiplication of N encoding vectors, where N ≥ 2.
In an embodiment, the second embedding layer includes a time-series-based neural network, and is configured to sequentially perform iterative processing on the event vectors to obtain the feature representation corresponding to the current event sequence.
In another embodiment, the second combination employed by the second embedding layer comprises an M-th order inter-vector combination operation involving the multiplication of M event vectors, where M ≥ 2.
According to one embodiment, a sample feature vector of source domain samples is obtained by: carrying out weighted combination on the source domain feature representation and the first feature representation by utilizing a first weight distribution factor to obtain a sample feature vector of the source domain sample;
obtaining a sample feature vector of the target domain sample by the following method: and carrying out weighted combination on the target domain feature representation and the second feature representation by using a second weight distribution factor to obtain a sample feature vector of the target domain sample.
In one embodiment, the domain adaptation loss is determined as follows:
and determining the domain adaptation loss according to the distribution difference between the first characterization of each source domain sample with each classification label in the specific network layer and the second characterization of each target domain sample with the corresponding classification label in the specific network layer.
Further, in one embodiment, determining the domain adaptation loss comprises:
obtaining a first representation of each source domain sample with any first classification label at the specific network layer;
obtaining a second representation of each target domain sample with the first classification label at the specific network layer;
determining the same-class distance corresponding to the first classification according to the distribution difference of the first characterization and the second characterization;
and determining that the domain adaptation loss is proportional to the sum of the same-class distances corresponding to the various classifications.
Further, in an example, the specific network layer is a predicted value output layer in the classifier, the first characterization is a first predicted value, and the second characterization is a second predicted value; in such a case, determining the same-class distance corresponding to the first classification according to the distribution difference between the first characterization and the second characterization may specifically include:
determining a first mean value of the first predicted values of the source domain samples with the first classification labels;
determining a second mean value of the second predicted values of the target domain samples with the first classification label;
and determining the same-class distance corresponding to the first classification according to the difference between the first average value and the second average value.
In another example, the first token is a first vector token; the second characterization is a second vector characterization; in such a case, determining the same-class distance corresponding to the first classification according to the distribution difference between the first characterization and the second characterization may specifically include:
determining a first average vector of respective first vector representations of respective source domain samples having a first class label;
determining a second average vector of respective second vector representations of respective target domain samples having the first classification label;
and determining the same-class distance corresponding to the first classification according to the norm distance of the first average vector and the second average vector.
In one embodiment, determining the domain adaptation loss may further comprise:
obtaining a third representation of each target domain sample at the particular network layer with a second class label, the second class label being different from the first class label;
determining an inter-class distance between the first class and the second class according to the distribution difference of the first characterization and the third characterization;
the domain adaptation loss is determined to be inversely proportional to a sum of the inter-class distances between the respective different classes.
According to a second aspect, there is provided a method of evaluating a user operated event, the method comprising:
acquiring a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
obtaining an event prediction model obtained by training according to the method of the first aspect, wherein the event prediction model comprises a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor and a trained classifier;
performing feature extraction on the first event sequence by adopting the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
obtaining a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
predicting, with the classifier, an event class of a current operational event in the first sequence of events based on the sequence feature vector.
In one embodiment, the sequence feature vector for the first sequence of events is obtained as follows: carrying out weighted combination on the target domain feature representation and the shared feature representation by using a weight distribution factor to obtain the sequence feature vector;
the method may further include outputting the weight assignment factor to indicate an impact of the target domain feature extractor and the shared feature extractor on the prediction result.
Further, in one embodiment, the weight distribution factor is determined by a training process of the event prediction model.
According to a third aspect, there is provided an apparatus for training an event prediction model, the apparatus comprising:
a sample set obtaining unit configured to obtain a training sample set, wherein the training sample set includes a first number of source domain samples and a second number of target domain samples, the first number is greater than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events;
a processing unit configured to input, as a current sample, each sample into an event prediction model including at least a source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
the prediction unit is configured to predict the event category of the current sample based on the sample feature vector of the current sample by using the classifier to obtain a prediction result;
a first loss determination unit configured to determine a classification loss according to the prediction result of each sample and the corresponding classification label;
a second loss determination unit configured to determine a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
a total loss determination unit configured to determine a total loss according to the classification loss and the domain adaptation loss;
an updating unit configured to update the event prediction model in a direction in which the total loss decreases.
According to a fourth aspect, there is provided an apparatus for evaluating a user operation event, the apparatus comprising:
the event sequence acquiring unit is configured to acquire a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
a model obtaining unit, configured to obtain an event prediction model obtained by training the apparatus of the third aspect, where the event prediction model includes a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor, and a trained classifier;
the feature extraction unit is configured to perform feature extraction on the first event sequence by using the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
the vector acquisition unit is configured to obtain a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
a prediction unit configured to predict an event class of a current operation event in the first sequence of events based on the sequence feature vector using the classifier.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first and second aspects.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the methods of the first and second aspects.
According to the methods and apparatus provided in the embodiments of this specification, when the number of target domain samples is small, source domain samples with abundant data are leveraged through transfer learning in a differentiated but unified training process, yielding an event prediction model applicable to both the source domain and the target domain. Specifically, the event prediction model includes a source domain model part, a shared model part, and a target domain model part. During training, because the source domain samples are abundant, the source domain model part can quickly establish suitable model parameters. The shared model part processes both the source domain samples and the target domain samples, so model parameters learned from the source domain data can be transferred to the target domain part. Combined with the domain adaptation loss defined over the source domain and target domain characterizations, the target domain model part can obtain characterizations similar to those of the source domain, so that a suitable event prediction model is trained from only a small amount of target domain data. The event prediction model can then be used to evaluate operational events of either the source domain or the target domain.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a method of training an event prediction model, according to one embodiment;
FIG. 3 illustrates a schematic structural diagram of an event prediction model according to one embodiment;
FIG. 4 shows a schematic structural diagram of a two-layer feature extractor according to one embodiment;
FIG. 5 shows a schematic structural diagram of a two-layer feature extractor according to another embodiment;
FIG. 6 illustrates a method of evaluating a user-operated event according to one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus to train an event prediction model according to one embodiment;
FIG. 8 shows a schematic block diagram of an apparatus to evaluate an operational event according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As mentioned above, in order to evaluate operational events, feature extraction and characterization of those events are important. To avoid the drawbacks of manual feature engineering, feature characterization and event evaluation are learned through modeling and model training. However, as is known to those skilled in the art, model training relies on a large amount of labeled data. In some domains where labeled data is sparse, such model training and learning is difficult.
In view of the above problem, the embodiments of this specification adopt a transfer learning approach: model training is performed with labeled data from a similar domain that has a richer data volume, so that the trained model can also be used in the domain with less data. Generally, the domain with richer data is referred to as the source domain, and the domain to be analyzed and learned, but with less data, is referred to as the target domain.
For example, in one scenario, users' interaction events on a customer service platform need to be analyzed. Suppose a hotline customer service platform has been running for a long time and has accumulated a large amount of data, while the online customer service platform to be analyzed has only recently launched and therefore has little data, and the data of the two platforms are somewhat similar. The hotline customer service platform can then serve as the source domain and the online customer service platform as the target domain. As another example, in another scenario, the operational events of users in different regions of a service platform need to be analyzed. Suppose the service has been available in east China for a long time with abundant accumulated data, while in north China, the region to be analyzed, the service has only been open for a short time and data is scarce. East China can then serve as the source domain and north China as the target domain.
Because source domain data is abundant, conventional transfer learning usually first trains a model on the source domain data, then fits the target domain data to the source domain data by means such as generative adversarial training, and obtains a model suitable for the target domain through multi-step, multi-stage training. Unlike such conventional transfer learning, in the embodiments disclosed in this specification, the source domain data and the target domain data are trained jointly but in a differentiated manner, so that an event prediction model suitable for both the source domain and the target domain is obtained quickly and efficiently.
Fig. 1 shows a schematic illustration of an implementation scenario according to an embodiment. As shown in fig. 1, historical data from a source domain and a target domain is collected as a training sample set to train an event prediction model. More specifically, the training sample set includes a large number of source domain samples and a relatively small number of target domain samples, each sample including an event sequence of historical events of a corresponding domain.
The event prediction model can be divided into a source domain part, a shared part and a target domain part.
In the training process, the source domain samples are input into the source domain part and the sharing part for comprehensive processing, the target domain samples are input into the sharing part and the target domain part for comprehensive processing, the prediction loss is obtained according to the comprehensive processing result of the two domain samples, and the whole event prediction model is trained according to the prediction loss.
In this process, because the source domain samples are abundant, the source domain part can quickly establish suitable model parameters. The shared part processes both the source domain samples and the target domain samples, which has the effect of transferring the model parameters learned from the source domain data to the target domain part, so that a model suitable for the target domain can be trained from only a small amount of target domain data.
After the event prediction model is obtained through training in the training mode, the model can be used for analyzing and evaluating an event sequence to be evaluated in a target domain. Specifically, the target domain event sequence to be evaluated may be input to the sharing part and the target domain part of the event prediction model, and an event evaluation result for the event sequence, such as an event classification result, may be output according to a comprehensive processing result of the two parts, and more specifically, may be a risk classification result.
The following describes the training process and model structure of the above event prediction model in detail.
FIG. 2 illustrates a flow diagram of a method of training an event prediction model, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 2, the training process includes at least the following steps.
First, in step 201, a training sample set is obtained, wherein the training sample set includes a first number of source domain samples and a second number of target domain samples, and the first number is greater than the second number.
It will be appreciated that the source domain and the target domain may depend on the business scenario to be analyzed. Generally, the source domain is a domain with rich data, and the target domain is a domain to be analyzed but with sparse data. For example, in one example, the source domain is a hotline service platform and the target domain is an online service platform; alternatively, in another example, the source domain is east China data and the target domain is North China data.
Since the data source of the source domain is richer, the number of source domain samples in the training sample set, i.e. the first number, is generally much larger than the second number of target domain samples. Typically, the first number is on the order of 5 to 15 times the second number. More specifically, in one example, the number of source domain samples is 10 times the number of target domain samples.
For training the event prediction model, each training sample comprises an event sequence containing a predetermined number (e.g., 10 or 20) of historical events ⟨E1, E2, …, En⟩, together with a classification label annotated for that event sequence. The classification label may apply to the entire event sequence, or to the last event En in the sequence; for example, it may be a classification category of fraudulent or non-fraudulent event, a classification category of event risk level, and so on.
More specifically, each source domain sample comprises a source domain event sequence S consisting of a plurality of source domain events, i.e., S = ⟨E^S_1, E^S_2, …, E^S_n⟩, where each historical event E^S_i in the sequence S is a source domain event; and each target domain sample comprises a target domain event sequence T consisting of a plurality of target domain events, i.e., T = ⟨E^T_1, E^T_2, …, E^T_m⟩, where each historical event E^T_j in the sequence T is a target domain event.
Each event may have a plurality of items of attribute information related to the event. For comprehensiveness of event evaluation, fine-grained comprehensive attribute information of each event can be acquired for subsequent processing. These attribute information may include a behavior type of a user operation behavior (e.g., a login operation, a recharge operation, a payment operation, a transaction operation, etc.), a behavior time, device information used by the user operation (e.g., a device model, a MAC address, an IP address, etc.), information on software used, such as a browser type, an app version, etc. If the operational behavior is a transaction behavior, the attribute information may also include a related amount of the transaction behavior, a payment channel, a transaction object, and so on. In one embodiment, the event attribute information may also include operation results of historical operation events, such as operation success, failure, timeouts, and the like.
In general, the attribute information data of the source domain event and the target domain event have a certain similarity. In one embodiment, the source domain event and the target domain event have identical attribute fields, except that there is a difference in the distribution of attribute values for some of the attribute fields. For example, the source domain event and the target domain event each contain an attribute ABCDE, where attribute A is the user's age, attribute B is the model of the device being used, and so on. If the source domain event and the target domain event are from different user populations, then the two types of events differ in the attribute value distribution of attribute A and attribute B.
In one embodiment, the source domain event and the target domain event have partially identical attribute fields and also have partially unique attribute fields. In particular, the source domain event may have a first plurality of attributes, such as the attribute ABCDE, and the target domain event may have a second plurality of attributes, such as the attribute CDEFG, where the first and second plurality of attributes intersect, such as CDE. More specifically, in an example where the source domain event is a service event in east China and the target domain event is a service event in north China, an attribute intersection (e.g., CDE) of the source domain event and the target domain event may be an attribute common to the two service events, such as user equipment information, event occurrence time, and the like; attributes unique to source domain events (e.g., attribute AB) may relate to service content provided only in the eastern region of china, while attributes unique to target domain events (e.g., attribute FG) may relate to service content provided only in the northern region of china.
Thus, the attribute information of each source domain event in the source domain event sequence is collected to form a sample characteristic, and a source domain sample is formed by combining the classification label of the source domain event sequence. Similarly, the attribute information of each target domain event in the target domain event sequence is collected to form a sample characteristic, and a target domain sample is formed by combining the classification label of the target domain event sequence. The first number of source domain samples and the second number of target domain samples together constitute a training sample set.
Next, in step 202, the samples in the training sample set are sequentially input to the event prediction model as current samples.
FIG. 3 illustrates a structural schematic of an event prediction model according to one embodiment. As shown in fig. 3, the event prediction model includes at least a source domain feature extractor 31, a shared feature extractor 32, a target domain feature extractor 33, a source domain attention layer 34, a target domain attention layer 35, and a classifier 36. The following steps in the training process are described with reference to the block diagram of fig. 3.
For the current sample input to the event prediction model, as shown in step 203 in fig. 2, it is necessary to distinguish the sample as a source domain sample or a target domain sample.
If the current sample is a source domain sample, it is input to the source domain feature extractor 31 and the shared feature extractor 32 at step 204. Specifically, a source domain feature extractor 31 is adopted to perform feature extraction on the source domain sample to obtain a source domain feature representation; performing feature extraction on the source domain sample by using a shared feature extractor 32 to obtain a first feature representation; through the source domain attention layer 34, a sample feature vector of the source domain sample is obtained according to the source domain feature representation and the first feature representation.
If the current sample is a target domain sample, it is input to the shared features extractor 32 and the target domain features extractor 33 at step 205. Specifically, a target domain feature extractor 33 is adopted to perform feature extraction on the target domain sample to obtain a target domain feature representation; performing feature extraction on the target domain sample by using a shared feature extractor 32 to obtain a second feature representation; and obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation through the target domain attention layer 35.
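To make the routing described in steps 204 and 205 concrete, the following is a minimal PyTorch-style sketch of how a sample might flow through the domain-specific extractor, the shared extractor, the corresponding attention layer, and the classifier. All class and argument names are illustrative placeholders, not the implementation disclosed in this specification.

```python
# Minimal sketch of the sample routing in steps 204/205 (names are placeholders).
import torch.nn as nn

class EventPredictionModel(nn.Module):
    def __init__(self, src_extractor, tgt_extractor, shared_extractor,
                 src_attention, tgt_attention, classifier):
        super().__init__()
        self.src_extractor = src_extractor        # source domain feature extractor (31)
        self.tgt_extractor = tgt_extractor        # target domain feature extractor (33)
        self.shared_extractor = shared_extractor  # shared feature extractor (32)
        self.src_attention = src_attention        # source domain attention layer (34)
        self.tgt_attention = tgt_attention        # target domain attention layer (35)
        self.classifier = classifier              # classifier (36)

    def forward(self, event_seq, domain):
        if domain == "source":
            ys = self.src_extractor(event_seq)     # source domain feature representation Ys
            y1 = self.shared_extractor(event_seq)  # first feature representation Y1
            v = self.src_attention(ys, y1)         # sample feature vector V of the source domain sample
        else:
            yt = self.tgt_extractor(event_seq)     # target domain feature representation Yt
            y2 = self.shared_extractor(event_seq)  # second feature representation Y2
            v = self.tgt_attention(yt, y2)         # sample feature vector V of the target domain sample
        return self.classifier(v), v               # prediction result and the sample feature vector
```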
In one embodiment, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 each perform feature extraction in their corresponding feature spaces.
Further, in one example, the source domain event and the target domain event have identical attribute fields, such as the attribute ABCDE. In this case, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 each perform feature extraction in a feature space corresponding to the attribute ABCDE, and only model parameters to be adopted when performing extraction computation may be different.
In another example, as previously described, the source domain event has a partially common attribute with the target domain event, and in addition thereto has a partially unique attribute. Specifically, the source domain event has a first plurality of attributes, such as attribute ABCDE, the target domain event has a second plurality of attributes, such as attribute CDEFG, and there is an intersection between the first plurality of attributes and the second plurality of attributes, such as CDE. In such a case, the source domain feature extractor 31 may perform feature extraction in a first feature space corresponding to the first plurality of attributes; the target domain feature extractor 33 may perform feature extraction in a second feature space corresponding to the second plurality of attributes. The shared feature extractor 32 may perform feature extraction in a shared feature space, where the shared feature space corresponds to a union of the first plurality of attributes and the second plurality of attributes, for example, the union is ABCDEFG.
More specifically, in step 204, when the shared feature extractor 32 is used to perform feature extraction on the source domain samples, the following processing manner may be adopted. First, the attribute value (e.g., ABCDE) of the first multiple-item attribute (e.g., ABCDE) of each source domain event in the source domain sample is filled into the field (e.g., the first 5 fields) corresponding to the first multiple-item attribute in the shared feature space (corresponding to ABCDEFG), and the remaining fields are filled with a default value (e.g., 0), resulting in a first attribute representation (e.g., ABCDE00) of the source domain event in the shared feature space. The first attribute representation is then subjected to a feature extraction operation by the shared feature extractor 32.
Similarly, in step 205, when the shared feature extractor 32 is used to perform feature extraction on the target domain sample, the following processing manner may be adopted. And filling the attribute value (e.g. CDEFG) of the second multiple attributes (e.g. CDEFG) of each target domain event in the target domain sample into the fields (e.g. the last 5 fields) corresponding to the second multiple attributes in the shared feature space (corresponding to ABCDEFG), and filling the rest fields with a default value (e.g. 0) to obtain a second attribute representation (e.g. 00CDEFG) of the target domain event in the shared feature space. The second attribute representation is then subjected to a feature extraction operation by the shared feature extractor 32.
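As an illustration of this padding into the shared feature space, here is a small sketch under the ABCDE / CDEFG example above; the dictionary representation, field order, and zero default are assumptions made only for the example.

```python
# Illustrative padding of per-event attributes into the shared feature space (union ABCDEFG).
SHARED_FIELDS = ["A", "B", "C", "D", "E", "F", "G"]  # union of source (ABCDE) and target (CDEFG) attributes

def to_shared_space(event_attrs: dict, default=0.0):
    """Map an event's attributes onto the shared feature space, default-filling missing fields."""
    return [event_attrs.get(field, default) for field in SHARED_FIELDS]

# A source domain event with attributes ABCDE  ->  [a, b, c, d, e, 0, 0]
# A target domain event with attributes CDEFG  ->  [0, 0, c, d, e, f, g]
```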
In terms of the processing procedure adopted by the feature extraction operation, in one embodiment, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 may be feature extractors with different parameters and the same structure, and perform feature extraction by using the same extraction algorithm. For example, the three feature extractors 31,32, and 33 may be implemented by using deep neural networks DNN with the same number of layers and the same algorithm.
More specifically, in one embodiment, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 employ two-level feature extractors with the same structure for feature extraction.
FIG. 4 shows a schematic structural diagram of a two-layer feature extractor according to one embodiment. As shown in fig. 4, the dual-layer feature extractor includes at least an encoding layer 41, a first embedding layer 42, and a second embedding layer 43.
When a current sample (a source domain sample or a target domain sample) is input into the two-layer feature extractor shown in FIG. 4, the coding layer 41 processes each event Ei (a source domain event or a target domain event) in the current event sequence ⟨E1, E2, …, En⟩ corresponding to the current sample, encoding the plurality of items of attribute information of that event into a corresponding plurality of encoding vectors.
The encoding process of the attribute information may correspond to a feature space of the feature extractor, that is, encode the attribute information of the input event into a plurality of encoding vectors corresponding to dimensions of the feature space.
For example, when applied to a source domain feature extractor, for each source domain event, the encoding layer 41 encodes attribute information for a first plurality of attributes of the source domain event into a first plurality of encoding vectors; when applied to the target domain feature extractor, the encoding layer 41 encodes, for each target domain event, attribute information for a second plurality of attributes of the target domain event into a second plurality of encoding vectors. When applied to a shared feature extractor, for each event, whether target domain or source domain, the encoding layer 41 encodes attribute information for the event as a plurality of encoding vectors corresponding to dimensions of a shared feature space.
The coding layer 41 may use a variety of encoding schemes.
In one embodiment, a mapping table or lookup table is pre-constructed in the coding layer 41, recording the mapping between the different values of each attribute and their encoding vectors. For each attribute of an input event, the mapping table is looked up according to the specific attribute value to obtain the corresponding encoding vector.
In one embodiment, the coding layer 41 may encode an item of attribute information into an encoding vector using one-hot encoding. One-hot encoding is suitable for attribute information with a limited number of possible values.
For attribute information without a limited set of values, such as attribute fields containing text descriptions, the coding layer 41 may, in one embodiment, use a more complex neural network to perform word embedding and obtain the encoding vector corresponding to the attribute information.
With any of these encoding schemes, the coding layer 41 encodes the plurality of items of attribute information of each event Ei into a corresponding plurality of encoding vectors.
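The following sketch shows one possible form of the coding layer, using a lookup-table (embedding) per attribute field; vocabulary sizes, dimensions, and tensor layout are assumptions for illustration only.

```python
# Sketch of a coding layer: one embedding (lookup) table per attribute field.
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    def __init__(self, vocab_sizes, dim):
        super().__init__()
        # one lookup table per attribute; vocab_sizes[i] is the number of values of attribute i
        self.tables = nn.ModuleList([nn.Embedding(v, dim) for v in vocab_sizes])

    def forward(self, attr_ids):  # attr_ids: LongTensor [batch, seq_len, num_attrs]
        # one encoding vector per attribute of each event: [batch, seq_len, num_attrs, dim]
        return torch.stack([table(attr_ids[..., i]) for i, table in enumerate(self.tables)], dim=2)
```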
Then, the first embedding layer 42 performs a first combination on the plurality of encoding vectors of each event, so as to obtain each event vector corresponding to each event.
In one embodiment, the first combination comprises a weighted linear combination of the plurality of code vectors.
According to one embodiment, the first combination further comprises an N-th order inter-vector combination operation involving the multiplication of N encoding vectors, where N ≥ 2.
It is understood that feature vectors in conventional neural networks are generally combined linearly. However, when an event contains multiple items of attribute information, those attributes are sometimes not completely independent but have certain dependency or association relationships, and a simple linear combination is not sufficient to discover and exploit such relationships. Thus, in one embodiment, the framework of a factorization machine (FM) is used to introduce higher-order inter-vector combination operations in the first embedding layer 42.
An N-th order inter-vector combination operation involves the multiplication of N encoding vectors and can therefore characterize the association among those N vectors. The order N is a preset hyperparameter and may be set to 2, 3, or 4, for example.
For example, in one specific example, the first embedding layer 42 may, on the basis of a linear combination, also apply 2nd-order and 3rd-order inter-vector combination operations to the encoding vectors f_i of event E_i, obtaining the event vector A_i shown in formula (1) below:

A_i = Σ_i w_i·f_i + Σ_{i<j} w_ij·(f_i ⊙ f_j) + Σ_{i<j<k} w_ijk·(f_i ⊙ f_j ⊙ f_k)    (1)

In formula (1), the first term Σ_i w_i·f_i is a linear combination of the n encoding vectors; the second term Σ_{i<j} w_ij·(f_i ⊙ f_j) is a 2nd-order inter-vector combination operation involving the multiplication of 2 encoding vectors; and the third term Σ_{i<j<k} w_ijk·(f_i ⊙ f_j ⊙ f_k) is a 3rd-order inter-vector combination operation involving the multiplication of 3 encoding vectors. The multiplication of encoding vectors in the higher-order operations may be performed bit-wise (element-wise), so the result is still a vector. It should also be understood that the weighting coefficients of the terms in formula (1), including the linear weight coefficients w_i, the 2nd-order weight coefficients w_ij, and the 3rd-order weight coefficients w_ijk, are all determined by training of the neural network.
In one embodiment, the first combination shown in formula (1) may be modified, for example by omitting the linear combination term or omitting some of the higher-order inter-vector combination terms, to obtain further variants of the first combination.
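A simplified sketch of a first embedding layer following formula (1) is given below; it keeps the linear, 2nd-order, and 3rd-order terms with element-wise vector products, but the scalar weight parameterization and initialization are assumptions of the sketch rather than the disclosed implementation.

```python
# Sketch of the first combination of formula (1): linear + 2nd-order + 3rd-order terms.
import itertools
import torch
import torch.nn as nn

class FirstEmbeddingLayer(nn.Module):
    def __init__(self, num_attrs):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(num_attrs))                        # linear weights w_i
        self.w2 = nn.Parameter(torch.randn(num_attrs, num_attrs))             # 2nd-order weights w_ij
        self.w3 = nn.Parameter(torch.randn(num_attrs, num_attrs, num_attrs))  # 3rd-order weights w_ijk

    def forward(self, f):  # f: [batch, num_attrs, dim], the encoding vectors of one event
        n = f.size(1)
        a = torch.einsum("i,bid->bd", self.w1, f)            # linear combination of the encoding vectors
        for i, j in itertools.combinations(range(n), 2):     # 2nd-order terms: products of 2 vectors
            a = a + self.w2[i, j] * f[:, i] * f[:, j]
        for i, j, k in itertools.combinations(range(n), 3):  # 3rd-order terms: products of 3 vectors
            a = a + self.w3[i, j, k] * f[:, i] * f[:, j] * f[:, k]
        return a                                             # event vector A_i
```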
After the first embedding layer 42 has obtained the event vector Ai corresponding to each event Ei in the event sequence ⟨E1, E2, …, En⟩, the second embedding layer 43 performs a second combination on these event vectors to obtain the feature representation corresponding to the current event sequence.
In one embodiment, the second embedding layer 43 performs a linear weighted superposition of the event vectors A1, A2, …, An to obtain the feature representation Y corresponding to the current event sequence.
In another embodiment, the second embedding layer 43 uses a time-series-based neural network to process the event vectors sequentially. Specifically, the time-series-based neural network may be a recurrent neural network (RNN) or a long short-term memory network (LSTM). In this case, the event vectors A1, A2, …, An are input into the RNN or LSTM in the order in which the events occurred. The RNN or LSTM then iteratively processes this sequence of event vectors in turn, yielding the feature representation Y of the event sequence. More specifically, the RNN or LSTM may take the hidden vector obtained after processing the last event vector An as the feature representation of the sequence.
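A minimal sketch of this time-series variant of the second embedding layer is shown below; it feeds the event vectors into an LSTM in chronological order and uses the final hidden state as the sequence representation Y. Layer sizes are illustrative.

```python
# Sketch of a second embedding layer based on an LSTM over the event vectors A1..An.
import torch.nn as nn

class SecondEmbeddingLSTM(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, event_vectors):           # [batch, seq_len, dim], ordered by event time
        _, (h_n, _) = self.lstm(event_vectors)  # hidden state after processing the last event vector An
        return h_n[-1]                          # feature representation Y of the event sequence
```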
In yet another embodiment, the second embedding layer 43 may also apply to the input event vectors A1, A2, …, An a combination operation that includes both linear combination and higher-order inter-vector combination, thereby obtaining the feature representation Y of the current event sequence. Specifically, the order M of the higher-order combination operation in the second embedding layer 43 may be preset, where M ≥ 2; the order M and the order N in the first embedding layer 42 are independent hyperparameters and may be the same as or different from each other. Thus, in the second embedding layer 43, the event vectors A1, A2, …, An output by the first embedding layer 42 may be subjected to a linear combination operation and to inter-vector combination operations up to order M, and the sequence feature representation Y is obtained by summing the results of these combination operations. The specific computation is similar to that described above for the first embedding layer 42 and is not repeated here.
In this way, according to the embodiment shown in fig. 4, the second embedding layer 43 directly combines the event vectors corresponding to the events in the input event sequence to obtain the sequence feature representation Y.
Fig. 5 shows a schematic structural diagram of a two-layer feature extractor according to another embodiment. The coding layer 51 and the first embedding layer 52 in fig. 5 correspond to those shown in fig. 4; the difference is that the second embedding layer 53 processes its input differently. In fig. 5, the second embedding layer 53 gives special treatment to the last event En in the event sequence. This is because, when the event prediction model is used to evaluate an event, the event to be evaluated and the preceding historical events are input into the model as a sequence, so the event to be evaluated is the last event in the input sequence. Accordingly, in the training samples used for model training, the classification label also tends to be annotated for the last event in the event sequence. Thus, the last event in the event sequence, whether as the object to be evaluated or as the object of annotation, has different properties from the other events.
In view of this, in the embodiment of fig. 5, the second embedding layer 53 first performs a third combination on the event vectors A1, A2, …, An-1 corresponding to the events other than the last event in the event sequence, obtaining a combined vector. The third combination may be performed in the same manner as the second combination described in connection with the second embedding layer of fig. 4. Then, the event vector An corresponding to the last event is combined with the combined vector in a fourth combination to obtain the final feature representation Y of the event sequence. The fourth combination may be a linear weighted combination or a direct concatenation.
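The following short sketch illustrates the fig. 5 variant; the mean over the first n-1 event vectors stands in for the third combination, and concatenation for the fourth combination, both of which are merely one of the options the text allows.

```python
# Sketch of the fig. 5 variant: combine A1..A(n-1), then join with the last event vector An.
import torch

def combine_with_last_event(event_vectors):      # [batch, seq_len, dim]
    context = event_vectors[:, :-1].mean(dim=1)  # third combination over A1..A(n-1) (assumed: mean)
    last = event_vectors[:, -1]                  # An, the event to be evaluated / annotated
    return torch.cat([context, last], dim=-1)    # fourth combination (assumed: concatenation) -> Y
```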
The above describes the feature extraction process of the source domain feature extractor 31, the shared feature extractor 32 and the target domain feature extractor 33 in fig. 3 by taking the two-layer feature extractor of fig. 4 and 5 as an example. It is understood that the source domain feature extractor 31, the shared feature extractor 32 and the target domain feature extractor 33 may also perform feature extraction in other manners, but the three extractors are required to be feature extractors with the same structure and algorithm.
Thus, as shown in step 204 in fig. 2, when the source domain sample is input to the source domain feature extractor 31 and the shared feature extractor 32, the source domain feature extractor 31 performs feature extraction on the source domain event sequence corresponding to the source domain sample to obtain a source domain feature representation Ys; the shared feature extractor 32 also performs feature extraction on the source domain event sequence to obtain a first feature representation Y1. Then, the source domain attention layer 34 obtains a sample feature vector V of the source domain samples from the source domain feature representation Ys and the first feature representation Y1. Specifically, the source domain attention layer 34 may perform a weighted combination of the source domain feature representation Ys and the first feature representation Y1 by using a first weight distribution factor, so as to obtain a sample feature vector V, where the first weight distribution factor may be preset or may be determined through training. In other embodiments, the source domain attention layer 34 may also combine the source domain feature representation Ys and the first feature representation Y1 in other manners, such as stitching, linear transformation, and the like, to obtain the sample feature vector V of the source domain sample.
On the other hand, as shown in step 205 in fig. 2, when the target domain sample is input into the target domain feature extractor 33 and the shared feature extractor 32, the target domain feature extractor 33 performs feature extraction on the target domain event sequence corresponding to the target domain sample to obtain a target domain feature representation Yt; the shared feature extractor 32 also performs feature extraction on the target domain event sequence to obtain a second feature representation Y2. Then, the target domain attention layer 35 obtains a sample feature vector V of the target domain sample from the target domain feature representation Yt and the second feature representation Y2. Similarly, the target domain attention layer 35 may perform a weighted combination of the target domain feature representation Yt and the second feature representation Y2 using a second weight distribution factor to obtain the sample feature vector V, where the second weight distribution factor may be preset or determined through training. Alternatively, the target domain attention layer 35 may combine the target domain feature representation Yt and the second feature representation Y2 in other ways to obtain the sample feature vector V of the target domain sample.
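The weighted combination performed by the attention layers can be sketched as below; representing the weight distribution factor as a single learned sigmoid gate is an assumption of the sketch, since the text only requires a weighted combination with a factor that may be preset or trained.

```python
# Sketch of a domain attention layer mixing the domain-specific and shared representations.
import torch
import torch.nn as nn

class DomainAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))  # learnable weight distribution factor

    def forward(self, domain_repr, shared_repr):
        alpha = torch.sigmoid(self.logit)          # factor constrained to (0, 1)
        return alpha * domain_repr + (1 - alpha) * shared_repr  # sample feature vector V
```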
Thus, for the current sample input to the event prediction model, the sample feature vector V of the current sample is obtained through the source domain feature extractor 31, the shared feature extractor 32, the target domain feature extractor 33, and the corresponding attention layer.
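As an illustration of the weighted combination performed by the attention layers, the following sketch (in PyTorch; the class name DomainAttention and the use of a learnable scalar are illustrative assumptions, not elements prescribed by the embodiment) mixes a domain-specific feature representation with the shared feature representation using a single weight distribution factor:

```python
import torch
import torch.nn as nn

class DomainAttention(nn.Module):
    """Weighted combination of a domain-specific feature representation with the
    shared feature representation, via one learnable weight distribution factor."""
    def __init__(self):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.zeros(1))  # trainable mixing score

    def forward(self, domain_repr: torch.Tensor, shared_repr: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.raw_weight)  # weight distribution factor, kept in (0, 1)
        return w * domain_repr + (1.0 - w) * shared_repr

source_attention = DomainAttention()  # combines Ys with Y1 for a source domain sample
target_attention = DomainAttention()  # combines Yt with Y2 for a target domain sample
```

The sigmoid keeps the factor between 0 and 1, so the result remains a convex mixture of the two representations; as noted above, a preset constant could be used instead of a trainable parameter.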
With continued reference to fig. 2 and 3. Next, in step 206, the sample feature vector V is input to the classifier 36 in the event prediction model. The classifier 36 predicts the event type of the current sample according to the sample feature vector V to obtain a prediction result.
In particular, the classifier 36 may further process the sample feature vector using a multi-layer perceptron (MLP), and finally apply an operation such as softmax to obtain a prediction result for the current sample. The prediction result may take the form of a predicted classification category, or of the probabilities that the current sample belongs to the respective classifications.
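One possible form of such a classifier, sketched in PyTorch under the assumption of a binary fraud/non-fraud task (the hidden size and the number of classes are illustrative defaults, not values fixed by the embodiment), is:

```python
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    """MLP over the sample feature vector V, followed by softmax."""
    def __init__(self, feat_dim: int = 64, hidden_dim: int = 32, num_classes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        logits = self.mlp(v)                  # unnormalized class scores
        return torch.softmax(logits, dim=-1)  # probability of each event category
```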
The processing of steps 203 to 206 described above may be employed for each sample in the training sample set. Thus, the prediction results of the respective samples can be obtained.
Then, in step 207, a classification loss is determined based on the prediction result of each sample and the corresponding classification label. Specifically, the classification loss may be determined from the comparison between the prediction results and the classification labels using loss functions of various forms, such as cross entropy or the L2 error. The classification loss is denoted C Loss.
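For instance, with cross entropy as the loss function, the classification loss could be computed as in the following sketch, which assumes the classifier's pre-softmax logits and integer classification labels are available for a batch of samples:

```python
import torch
import torch.nn.functional as F

def classification_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """C Loss: cross entropy between the prediction results and the classification labels.
    `logits` has shape (batch, num_classes); `labels` holds integer class ids."""
    return F.cross_entropy(logits, labels)
```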
In addition, at step 208, a Domain Adaptation (Domain Adaptation) Loss, denoted as DA Loss, is determined based on the first characterization of each source Domain sample at a particular network layer of the event prediction model and the second characterization of each target Domain sample at the particular network layer.
The domain adaptation loss, DA Loss, serves as a measure of the difference between the characterizations that samples from the two domains obtain inside the model. In one embodiment, a particular network layer may be selected in the event prediction model, such as the source domain/target domain attention layer or a certain layer in the classifier, and the characterizations of the source domain samples and the target domain samples at that layer are collected. In particular, in one example, the characterizations may be vector characterizations. In this way, the first characterizations of a certain number of source domain samples at that network layer form a first matrix, the second characterizations of the same number of target domain samples form a second matrix, and the domain adaptation loss DA Loss may be determined according to a similarity or distance between the first matrix and the second matrix. Thus, when the event prediction model is trained with loss reduction as the target, the source domain samples and the target domain samples are driven to obtain similar characterizations.
Further, in one embodiment, the characterization metric of the samples is refined based on the event category such that the source domain samples and the target domain samples of the same event category have similar characterizations. That is, in step 208, a domain adaptation loss is determined based on a difference in distribution between a first characterization of each source domain sample having each class label at a particular network layer and a second characterization of each target domain sample having a corresponding class label at the particular network layer.
More specifically, in one embodiment, the homogeneous distance may be defined according to the characterization difference between the source domain sample and the target domain sample in the same event category, and the domain adaptation Loss DA Loss may be determined based on the homogeneous distance.
As previously mentioned, each sample in the training sample set has a classification label, such as a label of a fraudulent or non-fraudulent event (binary classification), or a label of the risk level of an event (possibly multi-class). For any first classification label c1, the first characterization $f^{s}_{i}$ of each source domain sample i carrying the label c1 at the particular network layer can be obtained; the second characterization $f^{t}_{j}$ of each target domain sample j carrying the same label c1 at that network layer can likewise be obtained. The homogeneous (same-class) distance corresponding to the first classification c1 can then be determined according to the distribution difference of the first and second characterizations.

In a specific example, the homogeneous distance d(c1) corresponding to the classification c1 can be defined as:

$$d(c_1)=\left\|\frac{1}{n^{s}_{c_1}}\sum_{i=1}^{n^{s}_{c_1}} f^{s}_{i}-\frac{1}{n^{t}_{c_1}}\sum_{j=1}^{n^{t}_{c_1}} f^{t}_{j}\right\| \qquad (2)$$
where the superscript s denotes the source domain and t denotes the target domain; $n^{s}_{c_1}$ is the number of source domain samples with classification label c1, and $f^{s}_{i}$ is the first characterization of each such source domain sample i; $n^{t}_{c_1}$ is the number of target domain samples with classification label c1, and $f^{t}_{j}$ is the second characterization of each such target domain sample j.
In a specific example, the specific network layer is a predictor output layer in the classifier, such as a softmax layer. In this case, the first and second representations are predicted values, for example, probability values belonging to corresponding event categories.
In this case, in formula (2), $\frac{1}{n^{s}_{c_1}}\sum_{i} f^{s}_{i}$ is the mean of the predicted values of the source domain samples i carrying the first classification label c1, referred to as the first mean; $\frac{1}{n^{t}_{c_1}}\sum_{j} f^{t}_{j}$ is the mean of the predicted values of the target domain samples j carrying the label c1, referred to as the second mean. The homogeneous distance d(c1) for the first classification c1 is then the absolute value of the difference between the first mean and the second mean.
In another specific example, the specific network layer is an intermediate layer in the event prediction model, for example, the attention layer 34/35, or a certain layer in the classifier; the intermediate layer outputs a vector representation. In such a case, the first and second representations are both vector representations, referred to as first vector representation and second vector representation.
Accordingly, in formula (2), $\frac{1}{n^{s}_{c_1}}\sum_{i} f^{s}_{i}$ is the first average vector of the first vector characterizations of the source domain samples i carrying the first classification label c1, and $\frac{1}{n^{t}_{c_1}}\sum_{j} f^{t}_{j}$ is the second average vector of the second vector characterizations of the target domain samples j carrying the label c1. The homogeneous distance d(c1) for the first classification c1 is then the norm distance between the first average vector and the second average vector.
Thus, the same-class distance reflects the characterization difference between the source domain samples and the target domain samples under the same classification category. Having defined the same-class distance for each class in this way, the domain adaptation loss can be determined as the sum of the same-class distances corresponding to the respective classes.
For example, in one example, the domain adaptation loss DA loss is determined according to the following equation (3):
$$\text{DA Loss}=\sum_{c_i} d(c_i) \qquad (3)$$
Thus, when the event prediction model is trained with loss reduction as the target, the same-class distances are reduced; that is, under the same classification category, the source domain samples and the target domain samples obtain more similar characterizations, so that the distance between the feature representations of the two domains' samples under the same classification is shortened.
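A minimal sketch of formulas (2) and (3), assuming the characterizations of a batch of source domain samples and a batch of target domain samples are stacked into tensors together with their labels, and that every class appears in both batches, could look as follows:

```python
import torch

def homogeneous_distance(src_repr, src_labels, tgt_repr, tgt_labels, c):
    """d(c) of formula (2): distance between the mean characterization of the source
    domain samples labelled c and that of the target domain samples labelled c."""
    src_mean = src_repr[src_labels == c].mean(dim=0)
    tgt_mean = tgt_repr[tgt_labels == c].mean(dim=0)
    return torch.norm(src_mean - tgt_mean)  # absolute difference for scalars, norm for vectors

def da_loss_same_class(src_repr, src_labels, tgt_repr, tgt_labels, classes):
    """DA Loss of formula (3): sum of the same-class distances over all classes."""
    return sum(homogeneous_distance(src_repr, src_labels, tgt_repr, tgt_labels, c)
               for c in classes)
```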
In one embodiment, on the basis of the above, the inter-class distance is further defined.
Specifically, for any first classification label c1, a first characterization of each source domain sample i with the label c1 at a certain network layer can be similarly obtained; in addition, a third characterization of each target domain sample j at the particular network layer with another classification label c2 is also obtained. The inter-class distance between the first class c1 and the second class c2 may then be determined from the distribution difference of the first and third characterization described above.
In a specific example, the inter-class distance d (c1, c2) for class c1 and class c2 may be defined as:
$$d(c_1,c_2)=\left\|\frac{1}{n^{s}_{c_1}}\sum_{i=1}^{n^{s}_{c_1}} f^{s}_{i}-\frac{1}{n^{t}_{c_2}}\sum_{j=1}^{n^{t}_{c_2}} f^{t}_{j}\right\| \qquad (4)$$
where $n^{s}_{c_1}$ is the number of source domain samples with classification label c1, and $f^{s}_{i}$ is the first characterization of each such source domain sample i; $n^{t}_{c_2}$ is the number of target domain samples with classification label c2, and $f^{t}_{j}$ is the third characterization of each such target domain sample j.
It should be noted that, in the formula (4), the inter-class distance d (c1, c2) is the sample feature distance of the two classes c1 and c2 when the source domain sample belongs to the first class c1 and the target domain sample belongs to the second class c 2; if c1 and c2 are swapped, different distance values may be obtained, i.e., d (c2, c1) may be different from d (c1, c 2).
Accordingly, in one embodiment, the domain adaptation loss may be determined as being inversely proportional to the sum of the inter-class distances between the various classes.
For example, in one example, the domain adaptation loss DA loss is determined according to the following equation (5):
$$\text{DA Loss}=\frac{\sum_{c_i} d(c_i)}{\sum_{c_i\neq c_j} d(c_i,c_j)} \qquad (5)$$
According to the above formula (5), the domain adaptation loss is proportional to the sum of the same-class distances and inversely proportional to the sum of the inter-class distances between different classes. Thus, when the event prediction model is trained with loss reduction as the target, the same-class distances are reduced and the inter-class distances are enlarged; that is, the feature representations of the two domains' samples under the same classification become more similar, while their feature representations under different classifications are pushed apart. In this way, inter-domain transfer learning is performed at the fine granularity of the individual classifications.
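Building on the previous sketch, formulas (4) and (5) could be computed as below; the small constant eps is an added safeguard against division by zero and is an assumption rather than part of the described scheme:

```python
import torch

def interclass_distance(src_repr, src_labels, tgt_repr, tgt_labels, c1, c2):
    """d(c1, c2) of formula (4): distance between the mean characterization of the
    source domain samples labelled c1 and that of the target domain samples labelled c2."""
    src_mean = src_repr[src_labels == c1].mean(dim=0)
    tgt_mean = tgt_repr[tgt_labels == c2].mean(dim=0)
    return torch.norm(src_mean - tgt_mean)

def da_loss_fine_grained(src_repr, src_labels, tgt_repr, tgt_labels, classes, eps=1e-8):
    """Formula (5): sum of same-class distances divided by the sum of
    inter-class distances between different classes."""
    same = sum(interclass_distance(src_repr, src_labels, tgt_repr, tgt_labels, c, c)
               for c in classes)
    inter = sum(interclass_distance(src_repr, src_labels, tgt_repr, tgt_labels, c1, c2)
                for c1 in classes for c2 in classes if c1 != c2)
    return same / (inter + eps)
```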
The classification loss is determined in step 207 and the domain adaptation loss is determined in step 208, whereupon the total loss is determined in the following step 209 based on the classification loss and the domain adaptation loss. In one embodiment, respective weights α and β are set for the classification loss and the domain adaptation loss, so that the total loss L can be expressed as:
$$L=\alpha L_1+\beta L_2 \qquad (6)$$

where $L_1$ is the classification loss C Loss and $L_2$ is the domain adaptation loss DA Loss.
Thus, at step 210, the event prediction model is updated in the direction of the total loss reduction. Specifically, the model parameters of each module in the event prediction model can be adjusted by means of back propagation and gradient descent, so that the event prediction model is trained and updated.
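For illustration, one training step combining the two losses according to formula (6) might be sketched as follows; the model interface (returning, per domain, the classification logits and the characterizations at the selected network layer), the batch fields and the weight values are hypothetical stand-ins, and classification_loss and da_loss_fine_grained refer to the sketches above:

```python
import torch

def training_step(model, optimizer, batch, classes, alpha=1.0, beta=0.1):
    """One update of the event prediction model in the direction of total-loss reduction."""
    logits_s, repr_s = model(batch["source_sequences"], domain="source")
    logits_t, repr_t = model(batch["target_sequences"], domain="target")
    l1 = classification_loss(logits_s, batch["source_labels"]) + \
         classification_loss(logits_t, batch["target_labels"])          # C Loss
    l2 = da_loss_fine_grained(repr_s, batch["source_labels"],
                              repr_t, batch["target_labels"], classes)  # DA Loss
    total = alpha * l1 + beta * l2                                      # formula (6)
    optimizer.zero_grad()
    total.backward()  # back propagation
    optimizer.step()  # gradient-descent update of the module parameters
    return total.item()
```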
In summary, based on the training process shown in fig. 2 and the network structure shown in fig. 3, even with few target domain samples, transfer learning can be exploited to perform differentiated yet unified training using the abundant source domain samples. Specifically, the event prediction model includes a source domain model part, a shared model part and a target domain model part. During training, because the source domain samples are abundant, the source domain model part can quickly establish applicable model parameters. The shared model part processes both the source domain samples and the target domain samples, so model parameters learned from the source domain data are transferred to the target domain part. Combined with the domain adaptation loss defined over the source domain and target domain characterizations, the target domain model part is driven to obtain feature characterizations similar to those of the source domain, so that a model suitable for both the source domain and the target domain is trained from only a small amount of target domain data.
On the basis of training to obtain an event prediction model, the event prediction model can be used for evaluating and predicting the event of the target domain.
FIG. 6 illustrates a method of evaluating a user-operated event according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 6, the method includes at least the following steps.
At step 61, a first event sequence of the target domain is obtained, the first event sequence comprising a current operation event to be evaluated and at least one historical operation event. In general, the at least one historical operation event may be obtained by tracing back, from the current operation event to be evaluated, a predetermined number of preceding events or the events within a predetermined preceding time period. The historical operation events and the current operation event are arranged in time order to obtain the first event sequence. Here, the first event sequence is an event sequence of the target domain; that is, every operation event in it, including the current operation event and the historical operation events, is a target domain event.
In step 62, on the other hand, an event prediction model trained according to the method of FIG. 2 is obtained. As shown in fig. 3, the event prediction model at least includes a trained source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier.
Then, in step 63, a target domain feature extractor is used to perform feature extraction on the first event sequence to obtain a target domain feature representation; and a shared feature extractor is adopted to extract the features of the first event sequence to obtain shared feature representation. The structure and feature extraction manner of each feature extractor refer to the foregoing description of fig. 4 and 5, and are not repeated.
Then, in step 64, a sequence feature vector of the first event sequence is obtained according to the target domain feature representation and the shared feature representation. Specifically, in an embodiment, the target domain feature representation obtained by the target domain feature extractor and the shared feature representation obtained by the shared feature extractor may be weighted and combined by using a weight distribution factor to obtain a sequence feature vector. In one example, the weight distribution factor may be a preset hyper-parameter. In another example, the weight assignment factor may be a model parameter in the target domain attention layer in fig. 3, determined through a training process on the model of fig. 3.
Upon obtaining the sequence feature vector of the first event sequence, in step 65, the event class of the current operation event in the first event sequence is predicted based on the sequence feature vector by using a classifier in an event prediction model. In this way, the evaluation of the event category of the current operation event is realized.
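A sketch of this evaluation flow (steps 63 to 65) is given below; the attribute names target_extractor, shared_extractor, target_attention and classifier describe an assumed model interface for illustration, not the actual module names of the embodiment:

```python
import torch

def evaluate_current_event(model, event_sequence):
    """Predict the event category of the current (most recent) operation event
    in a target domain event sequence."""
    with torch.no_grad():
        yt = model.target_extractor(event_sequence)  # target domain feature representation
        y2 = model.shared_extractor(event_sequence)  # shared feature representation
        v = model.target_attention(yt, y2)           # sequence feature vector
        probs = model.classifier(v)                  # probability of each event category
    return probs.argmax(dim=-1)                      # predicted event category
```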
For example, the current operational event may be an event that occurs at the target domain requesting a transfer. The event may be evaluated, for example, whether the event is a fraud (cash-out) event or the risk level of the event, by the evaluation process of fig. 6. In this manner, subsequent protection decisions may be made with respect to the current operational event, such as whether to allow the transfer, whether to back up the transfer event, and so forth.
In one embodiment, the relevant weight parameters in the event prediction model can also be output, so that the evaluation result of the event category is more interpretable. For example, in the case where the feature extractor adopts two-layer feature extraction, the weights of the respective attributes within an event and the weights of the respective events within the sequence used in obtaining the above sequence feature vector may be output. Furthermore, the weight distribution factor used in step 64, i.e. the relative weights of the target domain feature extractor and the shared feature extractor, may also be output. In this way, an analyst can understand how factors at different levels, including the domain, the event and the attribute, influence the event evaluation result of step 65.
FIG. 6 above illustrates a process for predicting a current operational event of a target domain using a trained event prediction model. It can be seen that in this process, only the target domain feature extractor and the shared feature extractor are used, and the source domain feature extractor is not necessarily used. The source domain feature extractor is mainly used for helping the target domain feature extractor to establish applicable model parameters more quickly in the training process.
Of course, the above event prediction model may also be used to predict the event to be evaluated in the source domain. In such a case, similar to the case of training, a source domain event sequence including a source domain event to be evaluated may be input to the source domain feature extractor and the shared feature extractor without using the target domain feature extractor, and finally, the event category may be predicted by the classifier.
By combining the above, through the training process shown in fig. 2, an event prediction model can be obtained through training based on a small number of target domain samples, and the event prediction model can accurately evaluate and predict both the event to be evaluated in the source domain and the event to be evaluated in the target domain.
According to an embodiment of another aspect, an apparatus for training an event prediction model is provided, which may be deployed in any device, platform or cluster of devices having computing and processing capabilities. FIG. 7 shows a schematic block diagram of an apparatus to train an event prediction model according to one embodiment. As shown in fig. 7, the training apparatus 700 includes:
a sample set obtaining unit 71, configured to obtain a training sample set, where the training sample set includes a first number of source domain samples and a second number of target domain samples, the first number is greater than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events;
a processing unit 72 configured to input the respective samples as current samples into an event prediction model, the event prediction model comprising at least a source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier, wherein,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
a predicting unit 73 configured to predict, by using the classifier, an event class of the current sample based on the sample feature vector of the current sample, and obtain a prediction result;
a first loss determining unit 74 configured to determine a classification loss according to the prediction result of each sample and the corresponding classification label;
a second loss determination unit 75 configured to determine a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
a total loss determination unit 76 configured to determine a total loss based on the classification loss and the domain adaptation loss;
an updating unit 77 configured to update the event prediction model in a direction of the total loss reduction.
In one embodiment, each source domain event has a first plurality of attributes and each target domain event has a second plurality of attributes, wherein there is an intersection between the first plurality of attributes and the second plurality of attributes;
in such a case, the source domain feature extractor is configured to perform feature extraction in a first feature space corresponding to the first plurality of attributes; the target domain feature extractor is used for extracting features in a second feature space corresponding to the second multiple attributes;
and the shared feature extractor is used for extracting features in a shared feature space, wherein the shared feature space corresponds to a union of the first multiple attributes and the second multiple attributes.
In a further embodiment, the feature extraction process of the shared feature extractor on the source domain samples may include:
filling attribute values of a first plurality of attributes of each source domain event in the source domain sample into fields corresponding to the first plurality of attributes in the shared feature space, filling other fields with default values to obtain a first attribute representation of the source domain event in the shared feature space, and extracting features according to the first attribute representation;
the process of the shared feature extractor for extracting the features of the target domain samples may include:
and filling the attribute values of the second multiple attributes of each target domain event in the target domain sample into fields corresponding to the second multiple attributes in the shared feature space, filling the other fields with default values to obtain a second attribute representation of the target domain event in the shared feature space, and extracting features according to the second attribute representation.
According to one implementation, the source domain feature extractor, the target domain feature extractor and the shared feature extractor are two-layer feature extractors with the same structure but different parameters, each comprising an encoding layer, a first embedding layer and a second embedding layer; wherein:
the encoding layer is used for encoding a plurality of items of attribute information of each event in a current event sequence corresponding to an input current sample into a plurality of corresponding encoding vectors;
the first embedding layer is used for carrying out first combination on the plurality of coding vectors of each event to obtain each event vector corresponding to each event;
and the second embedded layer is used for carrying out second combination on the event vectors to obtain the feature representation corresponding to the current event sequence.
Further, in one embodiment, the first combining by the first embedding layer comprises an inter-vector combining operation of order N involving multiplication of N encoded vectors, where N ≥ 2.
In an embodiment, the second embedding layer includes a time-series-based neural network, and is configured to sequentially perform iterative processing on the event vectors to obtain the feature representation corresponding to the current event sequence.
In another embodiment, the second combination employed by the second embedding layer comprises an M-th order inter-vector combining operation involving multiplication of M event vectors, where M ≥ 2.
According to one embodiment, in the processing unit 72, the sample feature vector of the source domain samples is obtained by: carrying out weighted combination on the source domain feature representation and the first feature representation by utilizing a first weight distribution factor to obtain a sample feature vector of the source domain sample;
obtaining a sample feature vector of the target domain sample by the following method: and carrying out weighted combination on the target domain feature representation and the second feature representation by using a second weight distribution factor to obtain a sample feature vector of the target domain sample.
In one embodiment, the second loss determination unit 75 is configured to:
and determining the domain adaptation loss according to the distribution difference between the first characterization of each source domain sample with each classification label in the specific network layer and the second characterization of each target domain sample with the corresponding classification label in the specific network layer.
Further, in one embodiment, the second loss determination unit 75 determines the domain adaptation loss as follows:
obtaining a first representation of each source domain sample with any first classification label at the specific network layer;
obtaining a second representation of each target domain sample with the first classification label at the specific network layer;
determining the same-class distance corresponding to the first classification according to the distribution difference of the first characterization and the second characterization;
and determining the domain adaptation loss as being proportional to the sum of the homogeneous distances corresponding to the various classes.
Further, in an example, the specific network layer is a predicted value output layer in the classifier, the first characteristic is a first predicted value, and the second characteristic is a second predicted value; in such a case, the second loss determination unit 75 determines the homogeneous distance corresponding to the first classification as follows:
determining a first mean value of the first predicted values of the source domain samples with the first classification labels;
determining a second mean value of the second predicted values of the target domain samples with the first classification label;
and determining the same-class distance corresponding to the first classification according to the difference between the first average value and the second average value.
In another example, the first token is a first vector token; the second characterization is a second vector characterization; in such a case, the second loss determination unit 75 determines the homogeneous distance corresponding to the first classification as follows:
determining a first average vector of respective first vector representations of respective source domain samples having a first class label;
determining a second average vector of respective second vector representations of respective target domain samples having the first classification label;
and determining the same-class distance corresponding to the first classification according to the norm distance of the first average vector and the second average vector.
In one embodiment, the second loss determination unit 75 is further configured to:
obtaining a third representation of each target domain sample at the particular network layer with a second class label, the second class label being different from the first class label;
determining an inter-class distance between the first class and the second class according to the distribution difference of the first characterization and the third characterization;
the domain adaptation loss is determined to be inversely proportional to a sum of the inter-class distances between the respective different classes.
According to an embodiment of yet another aspect, an apparatus for evaluating an operational event is provided, which may be deployed in any computing, processing capable device, platform, or cluster of devices. FIG. 8 shows a schematic block diagram of an apparatus to evaluate an operational event according to one embodiment. As shown in fig. 8, the evaluation device 800 includes:
an event sequence acquiring unit 81 configured to acquire a first event sequence, where the first event sequence includes a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
a model obtaining unit 82 configured to obtain an event prediction model obtained by training according to the apparatus of fig. 7, wherein the event prediction model includes a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor, and a trained classifier;
a feature extraction unit 83 configured to perform feature extraction on the first event sequence by using the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
a vector obtaining unit 84, configured to obtain a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
a prediction unit 85 configured to predict, by using the classifier, an event class of a current operation event in the first event sequence based on the sequence feature vector.
In one embodiment, the vector obtaining unit 84 is specifically configured to: carrying out weighted combination on the target domain feature representation and the shared feature representation by using a weight distribution factor to obtain the sequence feature vector;
the apparatus 800 may further include a weight output unit (not shown) configured to output the weight distribution factor to indicate an influence of the target domain feature extractor and the shared feature extractor on the prediction result.
Further, in one embodiment, the weight distribution factor is determined by a training process of the event prediction model.
With the above apparatus, an event prediction model applicable to both the source domain and the target domain is trained using the abundant source domain sample data together with the relatively scarce target domain sample data, and the event to be evaluated in the target domain can then be accurately and effectively evaluated using this event prediction model.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 and 6.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2 and 6.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (20)

1. A method of training an event prediction model, the method comprising:
obtaining a training sample set, wherein the training sample set comprises a first number of source domain samples and a second number of target domain samples, the first number is larger than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events; the source domain event and the target domain event are both user operation events;
inputting each sample as a current sample into an event prediction model, wherein the event prediction model at least comprises a source domain feature extractor, a target domain feature extractor, a shared feature extractor and a classifier,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
predicting the event category of the current sample by using the classifier based on the sample feature vector of the current sample to obtain a prediction result;
determining classification loss according to the prediction result of each sample and the corresponding classification label;
determining a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
determining a total loss according to the classification loss and the domain adaptive loss;
updating the event prediction model in the direction of the total loss reduction.
2. The method of claim 1, wherein the source domain event has a first plurality of attributes; the target domain event has a second plurality of attributes, wherein there is an intersection of the first plurality of attributes and the second plurality of attributes;
the source domain feature extractor is used for extracting features in a first feature space corresponding to the first plurality of attributes;
the target domain feature extractor is used for extracting features in a second feature space corresponding to the second multiple attributes;
the shared feature extractor is configured to perform feature extraction in a shared feature space, where the shared feature space corresponds to a union of the first plurality of attributes and the second plurality of attributes.
3. The method of claim 2, wherein,
performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation, including:
filling attribute values of a first plurality of attributes of each source domain event in the source domain sample into fields corresponding to the first plurality of attributes in the shared feature space, filling other fields with default values to obtain a first attribute representation of the source domain event in the shared feature space, and extracting features according to the first attribute representation;
performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation, including:
and filling the attribute values of the second multiple attributes of each target domain event in the target domain sample into fields corresponding to the second multiple attributes in the shared feature space, filling the other fields with default values to obtain a second attribute representation of the target domain event in the shared feature space, and extracting features according to the second attribute representation.
4. The method of claim 1, wherein the source domain feature extractor, the target domain feature extractor and the shared feature extractor are two-layer feature extractors having the same structure but respective parameters, each two-layer feature extractor comprising an encoding layer, a first embedding layer and a second embedding layer; wherein:
the encoding layer is used for encoding a plurality of items of attribute information of each event in a current event sequence corresponding to an input current sample into a plurality of corresponding encoding vectors;
the first embedding layer is used for carrying out first combination on the plurality of coding vectors of each event to obtain each event vector corresponding to each event;
and the second embedded layer is used for carrying out second combination on the event vectors to obtain the feature representation corresponding to the current event sequence.
5. The method of claim 4, wherein the first combining comprises an inter-vector combining operation of order N involving multiplication of N encoded vectors, where N ≥ 2.
6. The method of claim 4, wherein the second embedding layer comprises a time-series-based neural network for iteratively processing the event vectors in sequence to obtain the feature representation corresponding to the current event series.
7. The method of claim 4, wherein the second combining comprises an M-th order inter-vector combining operation involving multiplication of M event vectors, where M ≥ 2.
8. The method of claim 1, wherein,
obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation comprises: performing weighted combination on the source domain feature representation and the first feature representation by using a first weight distribution factor to obtain the sample feature vector of the source domain sample;
obtaining a sample feature vector of a target domain sample according to the target domain feature representation and the second feature representation comprises performing weighted combination on the target domain feature representation and the second feature representation by using a second weight distribution factor to obtain the sample feature vector of the target domain sample.
9. The method of claim 1, wherein determining a domain adaptation loss based on a first characterization of each source domain sample at a particular network layer of the event prediction model and a second characterization of each target domain sample at the particular network layer comprises:
and determining the domain adaptation loss according to the distribution difference between the first characterization of each source domain sample with each classification label in the specific network layer and the second characterization of each target domain sample with the corresponding classification label in the specific network layer.
10. The method of claim 9, wherein the determining a domain adaptation loss comprises:
obtaining a first representation of each source domain sample with any first classification label at the specific network layer;
obtaining a second representation of each target domain sample with the first classification label at the specific network layer;
determining the same-class distance corresponding to the first classification according to the distribution difference of the first characterization and the second characterization;
and determining the domain adaptation loss as being proportional to the sum of the homogeneous distances corresponding to the various classes.
11. The method of claim 10, wherein the particular network layer is a predictor output layer in the classifier, the first characterization is a first predictor, and the second characterization is a second predictor;
according to the distribution difference of the first characterization and the second characterization, determining the same-class distance corresponding to the first classification, including:
determining a first mean value of the first predicted values of the source domain samples with the first classification labels;
determining a second mean value of the second predicted values of the target domain samples with the first classification label;
and determining the same-class distance corresponding to the first classification according to the difference between the first average value and the second average value.
12. The method of claim 10, wherein the first characterization is a first vector characterization; the second characterization is a second vector characterization;
according to the distribution difference of the first characterization and the second characterization, determining the same-class distance corresponding to the first classification, including:
determining a first average vector of respective first vector representations of respective source domain samples having a first class label;
determining a second average vector of respective second vector representations of respective target domain samples having the first classification label;
and determining the same-class distance corresponding to the first classification according to the norm distance of the first average vector and the second average vector.
13. The method of claim 10, wherein the determining a domain adaptation loss further comprises:
obtaining a third representation of each target domain sample at the particular network layer with a second class label, the second class label being different from the first class label;
determining an inter-class distance between the first class and the second class according to the distribution difference of the first characterization and the third characterization;
the domain adaptation loss is determined to be inversely proportional to a sum of the inter-class distances between the respective different classes.
14. A method of evaluating a user-operated event, the method comprising:
acquiring a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
obtaining an event prediction model trained according to the method of claim 1, wherein the event prediction model comprises a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor and a trained classifier;
performing feature extraction on the first event sequence by adopting the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
obtaining a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
predicting, with the classifier, an event class of a current operational event in the first sequence of events based on the sequence feature vector.
15. The method of claim 14, wherein deriving a sequence feature vector for the first sequence of events from the target domain feature representation and the shared feature representation comprises:
carrying out weighted combination on the target domain feature representation and the shared feature representation by using a weight distribution factor to obtain the sequence feature vector;
the method further includes outputting the weight assignment factor to indicate an impact of the target domain feature extractor and the shared feature extractor on the prediction result.
16. The method of claim 15, wherein the weight assignment factor is determined by a training process of the event prediction model.
17. An apparatus to train an event prediction model, the apparatus comprising:
a sample set obtaining unit configured to obtain a training sample set, wherein the training sample set includes a first number of source domain samples and a second number of target domain samples, the first number is greater than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events; the source domain event and the target domain event are both user operation events;
a processing unit configured to input, as a current sample, each sample into an event prediction model including at least a source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
the prediction unit is configured to predict the event category of the current sample based on the sample feature vector of the current sample by using the classifier to obtain a prediction result;
a first loss determination unit configured to determine a classification loss according to the prediction result of each sample and the corresponding classification label;
a second loss determination unit configured to determine a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
a total loss determination unit configured to determine a total loss according to the classification loss and the domain adaptation loss;
an updating unit configured to update the event prediction model in a direction in which the total loss decreases.
18. An apparatus for evaluating a user-operated event, the apparatus comprising:
the event sequence acquiring unit is configured to acquire a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
a model obtaining unit configured to obtain an event prediction model trained according to the apparatus of claim 17, wherein the event prediction model comprises a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor, and a trained classifier;
the feature extraction unit is configured to perform feature extraction on the first event sequence by using the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
the vector acquisition unit is configured to obtain a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
a prediction unit configured to predict an event class of a current operation event in the first sequence of events based on the sequence feature vector using the classifier.
19. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-16.
20. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-16.
CN201910916976.6A 2019-09-26 2019-09-26 Training event prediction model, and method and device for evaluating operation event Active CN110659744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910916976.6A CN110659744B (en) 2019-09-26 2019-09-26 Training event prediction model, and method and device for evaluating operation event


Publications (2)

Publication Number Publication Date
CN110659744A CN110659744A (en) 2020-01-07
CN110659744B true CN110659744B (en) 2021-06-04

Family

ID=69039262


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11776323B2 (en) 2022-02-15 2023-10-03 Ford Global Technologies, Llc Biometric task network

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275350B (en) * 2020-02-08 2021-06-04 支付宝(杭州)信息技术有限公司 Method and device for updating event evaluation model
CN111511016B (en) * 2020-04-09 2021-10-08 腾讯科技(深圳)有限公司 Method, device, server and storage medium for determining homologous wireless network
CN112036169B (en) * 2020-09-02 2023-06-20 深圳前海微众银行股份有限公司 Event recognition model optimization method, device, equipment and readable storage medium
CN112101484B (en) * 2020-11-10 2021-02-12 中国科学院自动化研究所 Incremental event identification method, system and device based on knowledge consolidation
CN113780610A (en) * 2020-12-02 2021-12-10 北京沃东天骏信息技术有限公司 Customer service portrait construction method and device
CN112288042B (en) * 2020-12-18 2021-04-02 蚂蚁智信(杭州)信息技术有限公司 Updating method and device of behavior prediction system, storage medium and computing equipment
CN112634048B (en) * 2020-12-30 2023-06-13 第四范式(北京)技术有限公司 Training method and device for money backwashing model
CN112785157B (en) * 2021-01-22 2022-07-22 支付宝(杭州)信息技术有限公司 Risk identification system updating method and device and risk identification method and device
CN112988186B (en) * 2021-02-19 2022-07-19 支付宝(杭州)信息技术有限公司 Updating method and device of abnormality detection system
CN112949752B (en) * 2021-03-25 2022-09-06 支付宝(杭州)信息技术有限公司 Training method and device of business prediction system
CN113129053B (en) * 2021-03-29 2024-05-21 北京沃东天骏信息技术有限公司 Information recommendation model training method, information recommendation method and storage medium
US20220343139A1 (en) * 2021-04-15 2022-10-27 Peyman PASSBAN Methods and systems for training a neural network model for mixed domain and multi-domain tasks
CN113762501A (en) * 2021-04-20 2021-12-07 京东城市(北京)数字科技有限公司 Prediction model training method, device, equipment and storage medium
CN114399344B (en) * 2022-03-24 2022-07-08 北京骑胜科技有限公司 Data processing method and data processing device


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776693B2 (en) * 2017-01-31 2020-09-15 Xerox Corporation Method and system for learning transferable feature representations from a source domain for a target domain
US11880761B2 (en) * 2017-07-28 2024-01-23 Microsoft Technology Licensing, Llc Domain addition systems and methods for a language understanding system
US10643602B2 (en) * 2018-03-16 2020-05-05 Microsoft Technology Licensing, Llc Adversarial teacher-student learning for unsupervised domain adaptation

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670588A (en) * 2017-10-16 2019-04-23 Youku Network Technology (Beijing) Co Ltd Neural network prediction method and device
CN107958286A (en) * 2017-11-23 2018-04-24 Tsinghua University A deep transfer learning method for domain adaptation networks
WO2019113501A1 (en) * 2017-12-07 2019-06-13 Fractal Industries, Inc. Transfer learning and domain adaptation using distributable data models
CN108229589A (en) * 2018-02-09 2018-06-29 Tianjin Normal University A ground-based cloud image classification method based on transfer learning
CN108898218A (en) * 2018-05-24 2018-11-27 Alibaba Group Holding Ltd Training method and device for a neural network model, and computer equipment
CN109034186A (en) * 2018-06-11 2018-12-18 Northeastern University at Qinhuangdao Method for establishing a DA-RBM classifier model
CN108846128A (en) * 2018-06-30 2018-11-20 Hefei University of Technology A cross-domain text classification method based on an adaptive noise encoder
CN109284662A (en) * 2018-07-12 2019-01-29 Harbin Engineering University A transfer learning method for underwater acoustic signal classification
CN109189921A (en) * 2018-08-07 2019-01-11 Alibaba Group Holding Ltd Training method and device for a comment evaluation model
CN109359557A (en) * 2018-09-25 2019-02-19 Northeastern University A ship detection method for SAR remote sensing images based on transfer learning
CN109523018A (en) * 2019-01-08 2019-03-26 Chongqing University of Posts and Telecommunications An image classification method based on deep transfer learning
CN110032646A (en) * 2019-05-08 2019-07-19 Shanxi University of Finance and Economics A cross-domain text sentiment classification method based on multi-source domain adaptation joint learning

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Beyond Sharing Weights for Deep Domain Adaptation; Artem Rozantsev et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2019-04-01; Vol. 41, No. 4; pp. 801-814 *
Domain Separation Networks; Konstantinos Bousmalis et al.; Computer Vision and Pattern Recognition; 2016-08-22; pp. 1-15 *
Introducing shared-hidden-layer autoencoders for transfer learning and; Jun Deng et al.; 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2014-07-14; pp. 4818-4822 *
Open Set Domain Adaptation; Pau Panareda Busto et al.; Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2017-12-31; pp. 754-763 *
Simultaneous Deep Transfer Across Domains and Tasks; Eric Tzeng et al.; Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2015-12-31; pp. 4068-4076 *
Cross-Domain Sentiment Classification Based on a Wasserstein-Distance Hierarchical Attention Model; Du Yongping et al.; Pattern Recognition and Artificial Intelligence; 2019-05-31; Vol. 32, No. 5; pp. 446-454 *
Research on Domain Adaptation Based on Deep Learning; Ma Yuting; Wanfang Online; 2018-11-15; pp. 1-71 *
Cross-Domain Classification of High-Resolution Remote Sensing Images Based on Deep Adversarial Domain Adaptation; Teng Wenxiu et al.; Laser & Optoelectronics Progress; 2019-06-30; Vol. 56, No. 11; pp. I12801-1 to I12801-11 *
Research on the Application of Transfer Learning in Image Classification; Wu Guoqin; China Masters' Theses Full-text Database, Information Science and Technology; 2017-08-15; No. 08; pp. I138-432 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11776323B2 (en) 2022-02-15 2023-10-03 Ford Global Technologies, Llc Biometric task network

Also Published As

Publication number Publication date
CN110659744A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
CN110659744B (en) Training event prediction model, and method and device for evaluating operation event
CN111814977B (en) Method and device for training event prediction model
CN108875776B (en) Model training method and device, service recommendation method and device, and electronic device
AU2019202925A1 (en) Selecting threads for concurrent processing of data
CN110705688B (en) Neural network system, method and device for performing risk assessment on operation event
WO2019118639A1 (en) Residual binary neural network
CN114548300B (en) Method and device for explaining service processing result of service processing model
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN111159241B (en) Click conversion estimation method and device
CN117113350B (en) Path self-adaption-based malicious software detection method, system and equipment
JP2017174004A (en) Sentence meaning classification calculation device, model learning device, method, and program
CN110717037B (en) Method and device for classifying users
CN112801784A (en) Bitcoin address mining method and device for digital currency exchange
CN116506302A (en) Network alignment method based on counterfactual inference
CN116128339A (en) Client credit evaluation method and device, storage medium and electronic equipment
Maleki et al. Improvement of credit scoring by LSTM autoencoder model
CN114861739A (en) Characteristic channel selectable multi-component system degradation prediction method and system
CN114238280A (en) Method and device for constructing financial sensitive information standard library and electronic equipment
Shahoud et al. Incorporating unsupervised deep learning into meta learning for energy time series forecasting
Muhammad et al. Modelling short‐scale variability and uncertainty during mineral resource estimation using a novel fuzzy estimation technique
KR102679889B1 (en) Method and Apparatus for Learning of Active learning algorithm with long-range observation
CN113255891B (en) Method, neural network model and device for processing event characteristics
CN117274616B (en) Multi-feature fusion deep learning service QoS prediction system and prediction method
CN116610783B (en) Service optimization method based on artificial intelligent decision and digital online page system
CN117372146A (en) Credit and quota raising method and device based on causal inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant