CN110659744B - Training event prediction model, and method and device for evaluating operation event - Google Patents


Info

Publication number
CN110659744B
Authority
CN
China
Prior art keywords
event
feature
sample
domain
target domain
Prior art date
Legal status
Active
Application number
CN201910916976.6A
Other languages
Chinese (zh)
Other versions
CN110659744A (en)
Inventor
宋博文
朱勇椿
陈帅
顾曦
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201910916976.6A
Publication of CN110659744A
Application granted
Publication of CN110659744B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification provide methods and apparatus for training an event prediction model and performing event evaluation. In the training method, a training sample set is first obtained, comprising a large number of source domain samples and a small number of target domain samples. Each sample is input into an event prediction model that comprises a source domain extractor, a target domain extractor, and a shared extractor: source domain samples are processed by the source domain extractor and the shared extractor, while target domain samples are processed by the target domain extractor and the shared extractor. The classification category of the current sample is then predicted from the resulting sample feature vector, and a classification loss is obtained from the prediction. In addition, a domain adaptation loss is determined based on a first characterization of each source domain sample at a specific network layer of the model and a second characterization of each target domain sample at the same layer. The event prediction model is then updated and trained in the direction that reduces the total loss, which consists of the classification loss and the domain adaptation loss.

Description

Training event prediction model, and method and device for evaluating operation event
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to training an event prediction model using machine learning, and a method and apparatus for evaluating an operational event using the model.
Background
In many scenarios, user operation behaviors or operation events need to be analyzed and processed. For example, in order to identify high-risk operation behaviors that may threaten network security or user information security, such as account theft, traffic attacks, and fraudulent transactions, the risk level of user operation behaviors can be evaluated for risk prevention and control.
To assess the risk of an operational behavior, the analysis may be based on the characteristics of the behavior itself. Beyond that, the user's behavior sequence can be taken into account more fully. A behavior sequence is the series of events, such as clicks, visits, and purchases, generated by a user in daily operation and use. It can be represented as a time-ordered sequence of events, captures fine-grained characteristics such as the user's habits and preferences, and allows a more comprehensive analysis of the user's operation history and operation patterns. However, both operational events and behavior sequence data face the problem of feature extraction and characterization, that is, extracting representative combined features from a huge feature space to characterize the risk of operational events. This feature extraction work is often performed empirically by business personnel. Manual feature engineering, however, is extremely labor- and time-consuming, its effectiveness depends heavily on manual business experience and efficiency, and it also carries a risk of security leakage.
In some schemes, feature extraction is also accomplished through machine learning: a large number of relevant features are input into a model, the model is trained on labeled data, and the extraction and combination of features are learned automatically. This places high demands on model design. In addition, such model training is difficult in domains where labeled data is scarce.
Accordingly, improved approaches are desired for more accurately and efficiently analyzing operational events to facilitate risk prevention and control.
Disclosure of Invention
One or more embodiments of the present disclosure describe methods and apparatus for training an event prediction model and evaluating operational events, in which source domain samples with rich data and target domain samples with relatively sparse data are used to train an event prediction model applicable to both the source domain and the target domain, thereby improving the accuracy and efficiency of event classification and prediction.
According to a first aspect, there is provided a method of training an event prediction model, the method comprising:
obtaining a training sample set, wherein the training sample set comprises a first number of source domain samples and a second number of target domain samples, the first number is larger than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events;
inputting each sample as a current sample into an event prediction model, wherein the event prediction model at least comprises a source domain feature extractor, a target domain feature extractor, a shared feature extractor and a classifier, and when the current sample is a source domain sample, the source domain feature extractor is adopted to perform feature extraction on the source domain sample to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
predicting the event category of the current sample by using the classifier based on the sample feature vector of the current sample to obtain a prediction result;
determining classification loss according to the prediction result of each sample and the corresponding classification label;
determining a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
determining a total loss according to the classification loss and the domain adaptive loss;
updating the event prediction model in the direction of the total loss reduction.
In one embodiment, each source domain event has a first plurality of attributes; each target domain event having a second plurality of attributes, wherein there is an intersection of the first plurality of attributes and the second plurality of attributes;
in such a case, the source domain feature extractor is configured to perform feature extraction in a first feature space corresponding to the first plurality of attributes; the target domain feature extractor is used for extracting features in a second feature space corresponding to the second multiple attributes;
and the shared feature extractor is used for extracting features in a shared feature space, wherein the shared feature space corresponds to a union of the first multiple attributes and the second multiple attributes.
In a further embodiment, the feature extraction process of the shared feature extractor on the source domain samples may include:
filling attribute values of a first plurality of attributes of each source domain event in the source domain sample into fields corresponding to the first plurality of attributes in the shared feature space, filling other fields with default values to obtain a first attribute representation of the source domain event in the shared feature space, and extracting features according to the first attribute representation;
the process of the shared feature extractor for extracting the features of the target domain samples may include:
and filling the attribute values of the second multiple attributes of each target domain event in the target domain sample into fields corresponding to the second multiple attributes in the shared feature space, filling the other fields with default values to obtain a second attribute representation of the target domain event in the shared feature space, and extracting features according to the second attribute representation.
According to one implementation, the source domain feature extractor, the target domain feature extractor, and the shared feature extractor are double-layer feature extractors with the same structure but different parameters, each comprising a coding layer, a first embedding layer, and a second embedding layer, wherein:
the encoding layer is used for encoding a plurality of items of attribute information of each event in a current event sequence corresponding to an input current sample into a plurality of corresponding encoding vectors;
the first embedding layer is used for carrying out first combination on the plurality of coding vectors of each event to obtain each event vector corresponding to each event;
and the second embedded layer is used for carrying out second combination on the event vectors to obtain the feature representation corresponding to the current event sequence.
Further, in one embodiment, the first combination performed by the first embedding layer comprises an N-th order inter-vector combination operation involving the multiplication of N encoding vectors, where N ≥ 2.
In an embodiment, the second embedding layer includes a time-series-based neural network, and is configured to sequentially perform iterative processing on the event vectors to obtain the feature representation corresponding to the current event sequence.
In another embodiment, the second combination employed by the second embedding layer comprises an M-th order inter-vector combination operation involving the multiplication of M event vectors, where M ≥ 2.
According to one embodiment, a sample feature vector of source domain samples is obtained by: carrying out weighted combination on the source domain feature representation and the first feature representation by utilizing a first weight distribution factor to obtain a sample feature vector of the source domain sample;
obtaining a sample feature vector of the target domain sample by the following method: and carrying out weighted combination on the target domain feature representation and the second feature representation by using a second weight distribution factor to obtain a sample feature vector of the target domain sample.
In one embodiment, the domain adaptation loss is determined as follows:
and determining the domain adaptation loss according to the distribution difference between the first characterization of each source domain sample with each classification label in the specific network layer and the second characterization of each target domain sample with the corresponding classification label in the specific network layer.
Further, in one embodiment, determining the domain adaptation loss comprises:
obtaining a first representation of each source domain sample with any first classification label at the specific network layer;
obtaining a second representation of each target domain sample with the first classification label at the specific network layer;
determining the same-class distance corresponding to the first classification according to the distribution difference of the first characterization and the second characterization;
and determining that the domain adaptation loss is proportional to the sum of the same-class distances corresponding to the various classifications.
Further, in an example, the specific network layer is a predicted value output layer in the classifier, the first characterization is a first predicted value, and the second characterization is a second predicted value; in such a case, determining the same-class distance corresponding to the first classification according to the distribution difference between the first characterization and the second characterization may specifically include:
determining a first mean value of the first predicted values of the source domain samples with the first classification labels;
determining a second mean value of the second predicted values of the target domain samples with the first classification label;
and determining the same-class distance corresponding to the first classification according to the difference between the first average value and the second average value.
In another example, the first token is a first vector token; the second characterization is a second vector characterization; in such a case, determining the same-class distance corresponding to the first classification according to the distribution difference between the first characterization and the second characterization may specifically include:
determining a first average vector of respective first vector representations of respective source domain samples having a first class label;
determining a second average vector of respective second vector representations of respective target domain samples having the first classification label;
and determining the same-class distance corresponding to the first classification according to the norm distance of the first average vector and the second average vector.
In one embodiment, determining the domain adaptation loss may further comprise:
obtaining a third representation of each target domain sample at the particular network layer with a second class label, the second class label being different from the first class label;
determining an inter-class distance between the first class and the second class according to the distribution difference of the first characterization and the third characterization;
the domain adaptation loss is determined to be inversely proportional to a sum of the inter-class distances between the respective different classes.
According to a second aspect, there is provided a method of evaluating a user operated event, the method comprising:
acquiring a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
obtaining an event prediction model obtained by training according to the method of the first aspect, wherein the event prediction model comprises a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor and a trained classifier;
performing feature extraction on the first event sequence by adopting the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
obtaining a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
predicting, with the classifier, an event class of a current operational event in the first sequence of events based on the sequence feature vector.
In one embodiment, the sequence feature vector for the first sequence of events is obtained as follows: carrying out weighted combination on the target domain feature representation and the shared feature representation by using a weight distribution factor to obtain the sequence feature vector;
the method may further include outputting the weight assignment factor to indicate an impact of the target domain feature extractor and the shared feature extractor on the prediction result.
Further, in one embodiment, the weight distribution factor is determined by a training process of the event prediction model.
According to a third aspect, there is provided an apparatus for training an event prediction model, the apparatus comprising:
a sample set obtaining unit configured to obtain a training sample set, wherein the training sample set includes a first number of source domain samples and a second number of target domain samples, the first number is greater than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events;
a processing unit configured to input, as a current sample, each sample into an event prediction model including at least a source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
the prediction unit is configured to predict the event category of the current sample based on the sample feature vector of the current sample by using the classifier to obtain a prediction result;
a first loss determination unit configured to determine a classification loss according to the prediction result of each sample and the corresponding classification label;
a second loss determination unit configured to determine a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
a total loss determination unit configured to determine a total loss according to the classification loss and the domain adaptation loss;
an updating unit configured to update the event prediction model in a direction in which the total loss decreases.
According to a fourth aspect, there is provided an apparatus for evaluating a user operation event, the apparatus comprising:
the event sequence acquiring unit is configured to acquire a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
a model obtaining unit, configured to obtain an event prediction model obtained by training the apparatus of the third aspect, where the event prediction model includes a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor, and a trained classifier;
the feature extraction unit is configured to perform feature extraction on the first event sequence by using the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
the vector acquisition unit is configured to obtain a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
a prediction unit configured to predict an event class of a current operation event in the first sequence of events based on the sequence feature vector using the classifier.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first and second aspects.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the methods of the first and second aspects.
According to the methods and apparatus provided in the embodiments of this specification, when the number of target domain samples is small, source domain samples with abundant data are leveraged through transfer learning in a differentiated but unified training process, yielding an event prediction model applicable to both the source domain and the target domain. Specifically, the event prediction model includes a source domain model part, a shared model part, and a target domain model part. During training, because the source domain samples are abundant, the source domain model part can quickly establish suitable model parameters. The shared model part processes both the source domain samples and the target domain samples, so model parameters learned from the source domain data can be transferred to the target domain part. Combined with the domain adaptation loss defined over the source domain and target domain characterizations, the target domain model part can obtain characterizations similar to those of the source domain, so that a suitable event prediction model is trained from only a small amount of target domain data. The event prediction model can then be used to evaluate operational events of either the source domain or the target domain.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a method of training an event prediction model, according to one embodiment;
FIG. 3 illustrates a schematic structural diagram of an event prediction model according to one embodiment;
FIG. 4 shows a schematic structural diagram of a two-layer feature extractor according to one embodiment;
FIG. 5 shows a schematic structural diagram of a two-layer feature extractor according to another embodiment;
FIG. 6 illustrates a method of evaluating a user-operated event according to one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus to train an event prediction model according to one embodiment;
FIG. 8 shows a schematic block diagram of an apparatus to evaluate an operational event according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As mentioned above, in order to evaluate operational events, feature extraction and characterization of those events are important. To avoid the drawbacks of manual feature engineering, feature characterization and event evaluation are learned through modeling and model training. However, as is known to those skilled in the art, model training relies on a large amount of labeled data. In some domains where labeled data is sparse, such model training and learning is difficult.
In view of the above problem, the embodiments of this specification adopt a transfer learning approach: model training is performed with labeled data from a similar domain that has a richer data volume, so that the trained model can also be used in the domain with less data. Generally, the domain with richer data is referred to as the source domain, and the domain to be analyzed and learned, but with less data, is referred to as the target domain.
For example, in one scenario, users' interaction events on a customer service platform need to be analyzed. Suppose a hotline customer service platform has been running for a long time and has accumulated a large amount of data, while the online customer service platform to be analyzed has only recently launched and therefore has little data, and the data of the two platforms are somewhat similar. The hotline customer service platform can then serve as the source domain and the online customer service platform as the target domain. As another example, in another scenario, the operational events of users in different regions of a service platform need to be analyzed. Suppose the service has been available in east China for a long time with abundant accumulated data, while in north China, the region to be analyzed, the service has only been open for a short time and data is scarce. East China can then serve as the source domain and north China as the target domain.
Because source domain data is abundant, conventional transfer learning usually first trains a model on the source domain data, then fits the target domain data to the source domain data by means such as generative adversarial training, and obtains a model suitable for the target domain through multi-step, multi-stage training. Unlike such conventional transfer learning, in the embodiments disclosed in this specification, the source domain data and the target domain data are trained jointly but in a differentiated manner, so that an event prediction model suitable for both the source domain and the target domain is obtained quickly and efficiently.
Fig. 1 shows a schematic illustration of an implementation scenario according to an embodiment. As shown in fig. 1, historical data from a source domain and a target domain is collected as a training sample set to train an event prediction model. More specifically, the training sample set includes a large number of source domain samples and a relatively small number of target domain samples, each sample including an event sequence of historical events of a corresponding domain.
The event prediction model can be divided into a source domain part, a shared part and a target domain part.
In the training process, the source domain samples are input into the source domain part and the sharing part for comprehensive processing, the target domain samples are input into the sharing part and the target domain part for comprehensive processing, the prediction loss is obtained according to the comprehensive processing result of the two domain samples, and the whole event prediction model is trained according to the prediction loss.
In this process, because the source domain samples are abundant, the source domain part can quickly establish suitable model parameters. The shared part processes both the source domain samples and the target domain samples, which has the effect of transferring the model parameters learned from the source domain data to the target domain part, so that a model suitable for the target domain can be trained from only a small amount of target domain data.
After the event prediction model is obtained through training in the training mode, the model can be used for analyzing and evaluating an event sequence to be evaluated in a target domain. Specifically, the target domain event sequence to be evaluated may be input to the sharing part and the target domain part of the event prediction model, and an event evaluation result for the event sequence, such as an event classification result, may be output according to a comprehensive processing result of the two parts, and more specifically, may be a risk classification result.
The following describes the training process and model structure of the above event prediction model in detail.
FIG. 2 illustrates a flow diagram of a method of training an event prediction model, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 2, the training process includes at least the following steps.
First, in step 201, a training sample set is obtained, wherein the training sample set includes a first number of source domain samples and a second number of target domain samples, and the first number is greater than the second number.
It will be appreciated that the source domain and the target domain may depend on the business scenario to be analyzed. Generally, the source domain is a domain with rich data, and the target domain is a domain to be analyzed but with sparse data. For example, in one example, the source domain is a hotline service platform and the target domain is an online service platform; alternatively, in another example, the source domain is east China data and the target domain is North China data.
Since the data source of the source domain is richer, the number of source domain samples in the training sample set, i.e. the first number, is generally much larger than the second number of target domain samples. Typically, the first number is on the order of 5 to 15 times the second number. More specifically, in one example, the number of source domain samples is 10 times the number of target domain samples.
For training the event prediction model, each training sample comprises an event sequence containing a predetermined number (e.g., 10 or 20) of historical events ⟨E1, E2, …, En⟩, together with a classification label annotated for that event sequence. The classification label may apply to the entire event sequence, or to the last event En in the sequence; for example, it may be a classification category of fraudulent or non-fraudulent event, a classification category of event risk level, and so on.
More specifically, each source domain sample comprises a source domain event sequence S consisting of a plurality of source domain events, i.e., S = ⟨E^S_1, E^S_2, …, E^S_n⟩, where each historical event E^S_i in the sequence S is a source domain event; and each target domain sample comprises a target domain event sequence T consisting of a plurality of target domain events, i.e., T = ⟨E^T_1, E^T_2, …, E^T_m⟩, where each historical event E^T_j in the sequence T is a target domain event.
Each event may have a plurality of items of attribute information related to the event. For comprehensiveness of event evaluation, fine-grained comprehensive attribute information of each event can be acquired for subsequent processing. These attribute information may include a behavior type of a user operation behavior (e.g., a login operation, a recharge operation, a payment operation, a transaction operation, etc.), a behavior time, device information used by the user operation (e.g., a device model, a MAC address, an IP address, etc.), information on software used, such as a browser type, an app version, etc. If the operational behavior is a transaction behavior, the attribute information may also include a related amount of the transaction behavior, a payment channel, a transaction object, and so on. In one embodiment, the event attribute information may also include operation results of historical operation events, such as operation success, failure, timeouts, and the like.
In general, the attribute information data of the source domain event and the target domain event have a certain similarity. In one embodiment, the source domain event and the target domain event have identical attribute fields, except that there is a difference in the distribution of attribute values for some of the attribute fields. For example, the source domain event and the target domain event each contain an attribute ABCDE, where attribute A is the user's age, attribute B is the model of the device being used, and so on. If the source domain event and the target domain event are from different user populations, then the two types of events differ in the attribute value distribution of attribute A and attribute B.
In one embodiment, the source domain event and the target domain event have partially identical attribute fields and also have partially unique attribute fields. In particular, the source domain event may have a first plurality of attributes, such as the attribute ABCDE, and the target domain event may have a second plurality of attributes, such as the attribute CDEFG, where the first and second plurality of attributes intersect, such as CDE. More specifically, in an example where the source domain event is a service event in east China and the target domain event is a service event in north China, an attribute intersection (e.g., CDE) of the source domain event and the target domain event may be an attribute common to the two service events, such as user equipment information, event occurrence time, and the like; attributes unique to source domain events (e.g., attribute AB) may relate to service content provided only in the eastern region of china, while attributes unique to target domain events (e.g., attribute FG) may relate to service content provided only in the northern region of china.
Thus, the attribute information of each source domain event in the source domain event sequence is collected to form a sample characteristic, and a source domain sample is formed by combining the classification label of the source domain event sequence. Similarly, the attribute information of each target domain event in the target domain event sequence is collected to form a sample characteristic, and a target domain sample is formed by combining the classification label of the target domain event sequence. The first number of source domain samples and the second number of target domain samples together constitute a training sample set.
Next, in step 202, the samples in the training sample set are sequentially input to the event prediction model as current samples.
FIG. 3 illustrates a structural schematic of an event prediction model according to one embodiment. As shown in fig. 3, the event prediction model includes at least a source domain feature extractor 31, a shared feature extractor 32, a target domain feature extractor 33, a source domain attention layer 34, a target domain attention layer 35, and a classifier 36. The following steps in the training process are described with reference to the block diagram of fig. 3.
For the current sample input to the event prediction model, as shown in step 203 in fig. 2, it is necessary to distinguish the sample as a source domain sample or a target domain sample.
If the current sample is a source domain sample, it is input to the source domain feature extractor 31 and the shared feature extractor 32 at step 204. Specifically, a source domain feature extractor 31 is adopted to perform feature extraction on the source domain sample to obtain a source domain feature representation; performing feature extraction on the source domain sample by using a shared feature extractor 32 to obtain a first feature representation; through the source domain attention layer 34, a sample feature vector of the source domain sample is obtained according to the source domain feature representation and the first feature representation.
If the current sample is a target domain sample, it is input to the shared features extractor 32 and the target domain features extractor 33 at step 205. Specifically, a target domain feature extractor 33 is adopted to perform feature extraction on the target domain sample to obtain a target domain feature representation; performing feature extraction on the target domain sample by using a shared feature extractor 32 to obtain a second feature representation; and obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation through the target domain attention layer 35.
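To make the routing described in steps 204 and 205 concrete, the following is a minimal PyTorch-style sketch of how a sample might flow through the domain-specific extractor, the shared extractor, the corresponding attention layer, and the classifier. All class and argument names are illustrative placeholders, not the implementation disclosed in this specification.

```python
# Minimal sketch of the sample routing in steps 204/205 (names are placeholders).
import torch.nn as nn

class EventPredictionModel(nn.Module):
    def __init__(self, src_extractor, tgt_extractor, shared_extractor,
                 src_attention, tgt_attention, classifier):
        super().__init__()
        self.src_extractor = src_extractor        # source domain feature extractor (31)
        self.tgt_extractor = tgt_extractor        # target domain feature extractor (33)
        self.shared_extractor = shared_extractor  # shared feature extractor (32)
        self.src_attention = src_attention        # source domain attention layer (34)
        self.tgt_attention = tgt_attention        # target domain attention layer (35)
        self.classifier = classifier              # classifier (36)

    def forward(self, event_seq, domain):
        if domain == "source":
            ys = self.src_extractor(event_seq)     # source domain feature representation Ys
            y1 = self.shared_extractor(event_seq)  # first feature representation Y1
            v = self.src_attention(ys, y1)         # sample feature vector V of the source domain sample
        else:
            yt = self.tgt_extractor(event_seq)     # target domain feature representation Yt
            y2 = self.shared_extractor(event_seq)  # second feature representation Y2
            v = self.tgt_attention(yt, y2)         # sample feature vector V of the target domain sample
        return self.classifier(v), v               # prediction result and the sample feature vector
```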
In one embodiment, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 each perform feature extraction in their corresponding feature spaces.
Further, in one example, the source domain event and the target domain event have identical attribute fields, such as the attribute ABCDE. In this case, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 each perform feature extraction in a feature space corresponding to the attribute ABCDE, and only model parameters to be adopted when performing extraction computation may be different.
In another example, as previously described, the source domain event has a partially common attribute with the target domain event, and in addition thereto has a partially unique attribute. Specifically, the source domain event has a first plurality of attributes, such as attribute ABCDE, the target domain event has a second plurality of attributes, such as attribute CDEFG, and there is an intersection between the first plurality of attributes and the second plurality of attributes, such as CDE. In such a case, the source domain feature extractor 31 may perform feature extraction in a first feature space corresponding to the first plurality of attributes; the target domain feature extractor 33 may perform feature extraction in a second feature space corresponding to the second plurality of attributes. The shared feature extractor 32 may perform feature extraction in a shared feature space, where the shared feature space corresponds to a union of the first plurality of attributes and the second plurality of attributes, for example, the union is ABCDEFG.
More specifically, in step 204, when the shared feature extractor 32 is used to perform feature extraction on the source domain samples, the following processing manner may be adopted. First, the attribute value (e.g., ABCDE) of the first multiple-item attribute (e.g., ABCDE) of each source domain event in the source domain sample is filled into the field (e.g., the first 5 fields) corresponding to the first multiple-item attribute in the shared feature space (corresponding to ABCDEFG), and the remaining fields are filled with a default value (e.g., 0), resulting in a first attribute representation (e.g., ABCDE00) of the source domain event in the shared feature space. The first attribute representation is then subjected to a feature extraction operation by the shared feature extractor 32.
Similarly, in step 205, when the shared feature extractor 32 is used to perform feature extraction on the target domain sample, the following processing manner may be adopted. And filling the attribute value (e.g. CDEFG) of the second multiple attributes (e.g. CDEFG) of each target domain event in the target domain sample into the fields (e.g. the last 5 fields) corresponding to the second multiple attributes in the shared feature space (corresponding to ABCDEFG), and filling the rest fields with a default value (e.g. 0) to obtain a second attribute representation (e.g. 00CDEFG) of the target domain event in the shared feature space. The second attribute representation is then subjected to a feature extraction operation by the shared feature extractor 32.
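As an illustration of this padding into the shared feature space, here is a small sketch under the ABCDE / CDEFG example above; the dictionary representation, field order, and zero default are assumptions made only for the example.

```python
# Illustrative padding of per-event attributes into the shared feature space (union ABCDEFG).
SHARED_FIELDS = ["A", "B", "C", "D", "E", "F", "G"]  # union of source (ABCDE) and target (CDEFG) attributes

def to_shared_space(event_attrs: dict, default=0.0):
    """Map an event's attributes onto the shared feature space, default-filling missing fields."""
    return [event_attrs.get(field, default) for field in SHARED_FIELDS]

# A source domain event with attributes ABCDE  ->  [a, b, c, d, e, 0, 0]
# A target domain event with attributes CDEFG  ->  [0, 0, c, d, e, f, g]
```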
In terms of the processing procedure adopted by the feature extraction operation, in one embodiment, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 may be feature extractors with different parameters and the same structure, and perform feature extraction by using the same extraction algorithm. For example, the three feature extractors 31,32, and 33 may be implemented by using deep neural networks DNN with the same number of layers and the same algorithm.
More specifically, in one embodiment, the source domain feature extractor 31, the shared feature extractor 32, and the target domain feature extractor 33 employ two-level feature extractors with the same structure for feature extraction.
FIG. 4 shows a schematic structural diagram of a two-layer feature extractor according to one embodiment. As shown in fig. 4, the dual-layer feature extractor includes at least an encoding layer 41, a first embedding layer 42, and a second embedding layer 43.
When a current sample (a source domain sample or a target domain sample) is input into the two-layer feature extractor shown in FIG. 4, the coding layer 41 processes each event Ei (a source domain event or a target domain event) in the current event sequence ⟨E1, E2, …, En⟩ corresponding to the current sample, encoding the plurality of items of attribute information of that event into a corresponding plurality of encoding vectors.
The encoding process of the attribute information may correspond to a feature space of the feature extractor, that is, encode the attribute information of the input event into a plurality of encoding vectors corresponding to dimensions of the feature space.
For example, when applied to a source domain feature extractor, for each source domain event, the encoding layer 41 encodes attribute information for a first plurality of attributes of the source domain event into a first plurality of encoding vectors; when applied to the target domain feature extractor, the encoding layer 41 encodes, for each target domain event, attribute information for a second plurality of attributes of the target domain event into a second plurality of encoding vectors. When applied to a shared feature extractor, for each event, whether target domain or source domain, the encoding layer 41 encodes attribute information for the event as a plurality of encoding vectors corresponding to dimensions of a shared feature space.
The coding layer 41 may use a variety of encoding schemes.
In one embodiment, a mapping table or lookup table is pre-constructed in the coding layer 41, recording the mapping between the different values of each attribute and their encoding vectors. For each attribute of an input event, the mapping table is looked up according to the specific attribute value to obtain the corresponding encoding vector.
In one embodiment, the coding layer 41 may encode an item of attribute information into an encoding vector using one-hot encoding. One-hot encoding is suitable for attribute information with a limited number of possible values.
For attribute information without a limited set of values, such as attribute fields containing text descriptions, the coding layer 41 may, in one embodiment, use a more complex neural network to perform word embedding and obtain the encoding vector corresponding to the attribute information.
With any of these encoding schemes, the coding layer 41 encodes the plurality of items of attribute information of each event Ei into a corresponding plurality of encoding vectors.
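The following sketch shows one possible form of the coding layer, using a lookup-table (embedding) per attribute field; vocabulary sizes, dimensions, and tensor layout are assumptions for illustration only.

```python
# Sketch of a coding layer: one embedding (lookup) table per attribute field.
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    def __init__(self, vocab_sizes, dim):
        super().__init__()
        # one lookup table per attribute; vocab_sizes[i] is the number of values of attribute i
        self.tables = nn.ModuleList([nn.Embedding(v, dim) for v in vocab_sizes])

    def forward(self, attr_ids):  # attr_ids: LongTensor [batch, seq_len, num_attrs]
        # one encoding vector per attribute of each event: [batch, seq_len, num_attrs, dim]
        return torch.stack([table(attr_ids[..., i]) for i, table in enumerate(self.tables)], dim=2)
```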
Then, the first embedding layer 42 performs a first combination on the plurality of encoding vectors of each event, so as to obtain each event vector corresponding to each event.
In one embodiment, the first combination comprises a weighted linear combination of the plurality of code vectors.
According to one embodiment, the first combination further comprises an N-th order inter-vector combination operation involving the multiplication of N encoding vectors, where N ≥ 2.
It is understood that feature vectors in conventional neural networks are generally combined linearly. However, when an event contains multiple items of attribute information, those attributes are sometimes not completely independent but have certain dependency or association relationships, and a simple linear combination is not sufficient to discover and exploit such relationships. Thus, in one embodiment, the framework of a factorization machine (FM) is used to introduce higher-order inter-vector combination operations in the first embedding layer 42.
An N-th order inter-vector combination operation involves the multiplication of N encoding vectors and can therefore characterize the association among those N vectors. The order N is a preset hyperparameter and may be set to 2, 3, or 4, for example.
For example, in one specific example, the first embedding layer 42 may, on the basis of a linear combination, also apply 2nd-order and 3rd-order inter-vector combination operations to the encoding vectors f_i of event E_i, obtaining the event vector A_i shown in formula (1) below:

A_i = Σ_i w_i·f_i + Σ_{i<j} w_ij·(f_i ⊙ f_j) + Σ_{i<j<k} w_ijk·(f_i ⊙ f_j ⊙ f_k)    (1)

In formula (1), the first term Σ_i w_i·f_i is a linear combination of the n encoding vectors; the second term Σ_{i<j} w_ij·(f_i ⊙ f_j) is a 2nd-order inter-vector combination operation involving the multiplication of 2 encoding vectors; and the third term Σ_{i<j<k} w_ijk·(f_i ⊙ f_j ⊙ f_k) is a 3rd-order inter-vector combination operation involving the multiplication of 3 encoding vectors. The multiplication of encoding vectors in the higher-order operations may be performed bit-wise (element-wise), so the result is still a vector. It should also be understood that the weighting coefficients of the terms in formula (1), including the linear weight coefficients w_i, the 2nd-order weight coefficients w_ij, and the 3rd-order weight coefficients w_ijk, are all determined by training of the neural network.
In one embodiment, the first combination shown in formula (1) may be modified, for example by omitting the linear combination term or omitting some of the higher-order inter-vector combination terms, to obtain further variants of the first combination.
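A simplified sketch of a first embedding layer following formula (1) is given below; it keeps the linear, 2nd-order, and 3rd-order terms with element-wise vector products, but the scalar weight parameterization and initialization are assumptions of the sketch rather than the disclosed implementation.

```python
# Sketch of the first combination of formula (1): linear + 2nd-order + 3rd-order terms.
import itertools
import torch
import torch.nn as nn

class FirstEmbeddingLayer(nn.Module):
    def __init__(self, num_attrs):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(num_attrs))                        # linear weights w_i
        self.w2 = nn.Parameter(torch.randn(num_attrs, num_attrs))             # 2nd-order weights w_ij
        self.w3 = nn.Parameter(torch.randn(num_attrs, num_attrs, num_attrs))  # 3rd-order weights w_ijk

    def forward(self, f):  # f: [batch, num_attrs, dim], the encoding vectors of one event
        n = f.size(1)
        a = torch.einsum("i,bid->bd", self.w1, f)            # linear combination of the encoding vectors
        for i, j in itertools.combinations(range(n), 2):     # 2nd-order terms: products of 2 vectors
            a = a + self.w2[i, j] * f[:, i] * f[:, j]
        for i, j, k in itertools.combinations(range(n), 3):  # 3rd-order terms: products of 3 vectors
            a = a + self.w3[i, j, k] * f[:, i] * f[:, j] * f[:, k]
        return a                                             # event vector A_i
```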
After the first embedding layer 42 has obtained the event vector Ai corresponding to each event Ei in the event sequence ⟨E1, E2, …, En⟩, the second embedding layer 43 performs a second combination on these event vectors to obtain the feature representation corresponding to the current event sequence.
In one embodiment, the second embedding layer 43 performs a linear weighted superposition of the event vectors A1, A2, …, An to obtain the feature representation Y corresponding to the current event sequence.
In another embodiment, the second embedding layer 43 uses a time-series-based neural network to process the event vectors sequentially. Specifically, the time-series-based neural network may be a recurrent neural network (RNN) or a long short-term memory network (LSTM). In this case, the event vectors A1, A2, …, An are input into the RNN or LSTM in the order in which the events occurred. The RNN or LSTM then iteratively processes this sequence of event vectors in turn, yielding the feature representation Y of the event sequence. More specifically, the RNN or LSTM may take the hidden vector obtained after processing the last event vector An as the feature representation of the sequence.
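A minimal sketch of this time-series variant of the second embedding layer is shown below; it feeds the event vectors into an LSTM in chronological order and uses the final hidden state as the sequence representation Y. Layer sizes are illustrative.

```python
# Sketch of a second embedding layer based on an LSTM over the event vectors A1..An.
import torch.nn as nn

class SecondEmbeddingLSTM(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, event_vectors):           # [batch, seq_len, dim], ordered by event time
        _, (h_n, _) = self.lstm(event_vectors)  # hidden state after processing the last event vector An
        return h_n[-1]                          # feature representation Y of the event sequence
```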
In yet another embodiment, the second embedding layer 43 may also apply to the input event vectors A1, A2, …, An a combination operation that includes both linear combination and higher-order inter-vector combination, thereby obtaining the feature representation Y of the current event sequence. Specifically, the order M of the higher-order combination operation in the second embedding layer 43 may be preset, where M ≥ 2; the order M and the order N in the first embedding layer 42 are independent hyperparameters and may be the same as or different from each other. Thus, in the second embedding layer 43, the event vectors A1, A2, …, An output by the first embedding layer 42 may be subjected to a linear combination operation and to inter-vector combination operations up to order M, and the sequence feature representation Y is obtained by summing the results of these combination operations. The specific computation is similar to that described above for the first embedding layer 42 and is not repeated here.
In this way, according to the embodiment shown in fig. 4, the second embedding layer 43 directly combines the event vectors corresponding to the events in the input event sequence to obtain the sequence feature representation Y.
Fig. 5 shows a schematic structural diagram of a two-layer feature extractor according to another embodiment. The coding layer 51 and the first embedding layer 52 in fig. 5 correspond to those shown in fig. 4; the difference is that the second embedding layer 53 processes its input differently. In fig. 5, the second embedding layer 53 gives special treatment to the last event En in the event sequence. This is because, when the event prediction model is used to evaluate an event, the event to be evaluated and the preceding historical events are input into the model as a sequence, so the event to be evaluated is the last event in the input sequence. Accordingly, in the training samples used for model training, the classification label also tends to be annotated for the last event in the event sequence. Thus, the last event in the event sequence, whether as the object to be evaluated or as the object of annotation, has different properties from the other events.
In view of this, in the embodiment of fig. 5, the second embedding layer 53 first performs a third combination on the event vectors A1, A2, …, An-1 corresponding to the events other than the last event in the event sequence, obtaining a combined vector. The third combination may be performed in the same manner as the second combination described in connection with the second embedding layer of fig. 4. Then, the event vector An corresponding to the last event is combined with the combined vector in a fourth combination to obtain the final feature representation Y of the event sequence. The fourth combination may be a linear weighted combination or a direct concatenation.
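The following short sketch illustrates the fig. 5 variant; the mean over the first n-1 event vectors stands in for the third combination, and concatenation for the fourth combination, both of which are merely one of the options the text allows.

```python
# Sketch of the fig. 5 variant: combine A1..A(n-1), then join with the last event vector An.
import torch

def combine_with_last_event(event_vectors):      # [batch, seq_len, dim]
    context = event_vectors[:, :-1].mean(dim=1)  # third combination over A1..A(n-1) (assumed: mean)
    last = event_vectors[:, -1]                  # An, the event to be evaluated / annotated
    return torch.cat([context, last], dim=-1)    # fourth combination (assumed: concatenation) -> Y
```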
The above describes the feature extraction process of the source domain feature extractor 31, the shared feature extractor 32 and the target domain feature extractor 33 in fig. 3 by taking the two-layer feature extractor of fig. 4 and 5 as an example. It is understood that the source domain feature extractor 31, the shared feature extractor 32 and the target domain feature extractor 33 may also perform feature extraction in other manners, but the three extractors are required to be feature extractors with the same structure and algorithm.
Thus, as shown in step 204 in fig. 2, when the source domain sample is input to the source domain feature extractor 31 and the shared feature extractor 32, the source domain feature extractor 31 performs feature extraction on the source domain event sequence corresponding to the source domain sample to obtain a source domain feature representation Ys; the shared feature extractor 32 also performs feature extraction on the source domain event sequence to obtain a first feature representation Y1. Then, the source domain attention layer 34 obtains a sample feature vector V of the source domain samples from the source domain feature representation Ys and the first feature representation Y1. Specifically, the source domain attention layer 34 may perform a weighted combination of the source domain feature representation Ys and the first feature representation Y1 by using a first weight distribution factor, so as to obtain a sample feature vector V, where the first weight distribution factor may be preset or may be determined through training. In other embodiments, the source domain attention layer 34 may also combine the source domain feature representation Ys and the first feature representation Y1 in other manners, such as stitching, linear transformation, and the like, to obtain the sample feature vector V of the source domain sample.
On the other hand, as shown in step 205 in fig. 2, when the target domain sample is input into the target domain feature extractor 33 and the shared feature extractor 32, the target domain feature extractor 33 performs feature extraction on the target domain event sequence corresponding to the target domain sample to obtain a target domain feature representation Yt; the shared feature extractor 32 also performs feature extraction on the target domain event sequence to obtain a second feature representation Y2. Then, the target domain attention layer 35 obtains a sample feature vector V of the target domain sample from the target domain feature representation Yt and the second feature representation Y2. Similarly, the target domain attention layer 35 may perform a weighted combination of the target domain feature representation Yt and the second feature representation Y2 using a second weight distribution factor to obtain the sample feature vector V, where the second weight distribution factor may be preset or determined through training. Alternatively, the target domain attention layer 35 may combine the target domain feature representation Yt and the second feature representation Y2 in other ways to obtain the sample feature vector V of the target domain sample.
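The weighted combination performed by the attention layers can be sketched as below; representing the weight distribution factor as a single learned sigmoid gate is an assumption of the sketch, since the text only requires a weighted combination with a factor that may be preset or trained.

```python
# Sketch of a domain attention layer mixing the domain-specific and shared representations.
import torch
import torch.nn as nn

class DomainAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))  # learnable weight distribution factor

    def forward(self, domain_repr, shared_repr):
        alpha = torch.sigmoid(self.logit)          # factor constrained to (0, 1)
        return alpha * domain_repr + (1 - alpha) * shared_repr  # sample feature vector V
```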
Thus, for the current sample input to the event prediction model, the sample feature vector V of the current sample is obtained through the source domain feature extractor 31, the shared feature extractor 32, the target domain feature extractor 33, and the corresponding attention layer.
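As an illustration of the weighted combination performed by the attention layers, the following sketch (in PyTorch; the class name DomainAttention and the use of a learnable scalar are illustrative assumptions, not elements prescribed by the embodiment) mixes a domain-specific feature representation with the shared feature representation using a single weight distribution factor:

```python
import torch
import torch.nn as nn

class DomainAttention(nn.Module):
    """Weighted combination of a domain-specific feature representation with the
    shared feature representation, via one learnable weight distribution factor."""
    def __init__(self):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.zeros(1))  # trainable mixing score

    def forward(self, domain_repr: torch.Tensor, shared_repr: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.raw_weight)  # weight distribution factor, kept in (0, 1)
        return w * domain_repr + (1.0 - w) * shared_repr

source_attention = DomainAttention()  # combines Ys with Y1 for a source domain sample
target_attention = DomainAttention()  # combines Yt with Y2 for a target domain sample
```

The sigmoid keeps the factor between 0 and 1, so the result remains a convex mixture of the two representations; as noted above, a preset constant could be used instead of a trainable parameter.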
With continued reference to fig. 2 and 3. Next, in step 206, the sample feature vector V is input to the classifier 36 in the event prediction model. The classifier 36 predicts the event type of the current sample according to the sample feature vector V to obtain a prediction result.
In particular, the classifier 36 may further process the sample feature vector using a multi-layer perceptron (MLP), and finally apply an operation such as softmax to obtain a prediction result for the current sample. The prediction result may take the form of a predicted classification category, or of the probabilities that the current sample belongs to the respective classifications.
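One possible form of such a classifier, sketched in PyTorch under the assumption of a binary fraud/non-fraud task (the hidden size and the number of classes are illustrative defaults, not values fixed by the embodiment), is:

```python
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    """MLP over the sample feature vector V, followed by softmax."""
    def __init__(self, feat_dim: int = 64, hidden_dim: int = 32, num_classes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        logits = self.mlp(v)                  # unnormalized class scores
        return torch.softmax(logits, dim=-1)  # probability of each event category
```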
The processing of steps 203 to 206 described above may be employed for each sample in the training sample set. Thus, the prediction results of the respective samples can be obtained.
Then, in step 207, a classification loss is determined based on the prediction result of each sample and the corresponding classification label. Specifically, the classification loss may be determined from the comparison between the prediction results and the classification labels using loss functions of various forms, such as cross entropy or the L2 error. The classification loss is denoted C Loss.
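For instance, with cross entropy as the loss function, the classification loss could be computed as in the following sketch, which assumes the classifier's pre-softmax logits and integer classification labels are available for a batch of samples:

```python
import torch
import torch.nn.functional as F

def classification_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """C Loss: cross entropy between the prediction results and the classification labels.
    `logits` has shape (batch, num_classes); `labels` holds integer class ids."""
    return F.cross_entropy(logits, labels)
```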
In addition, at step 208, a Domain Adaptation (Domain Adaptation) Loss, denoted as DA Loss, is determined based on the first characterization of each source Domain sample at a particular network layer of the event prediction model and the second characterization of each target Domain sample at the particular network layer.
The domain adaptation loss, DA Loss, serves as a measure of the difference between the characterizations that samples from the two domains obtain inside the model. In one embodiment, a particular network layer may be selected in the event prediction model, such as the source domain/target domain attention layer or a certain layer in the classifier, and the characterizations of the source domain samples and the target domain samples at that layer are collected. In particular, in one example, the characterizations may be vector characterizations. In this way, the first characterizations of a certain number of source domain samples at that network layer form a first matrix, the second characterizations of the same number of target domain samples form a second matrix, and the domain adaptation loss DA Loss may be determined according to a similarity or distance between the first matrix and the second matrix. Thus, when the event prediction model is trained with loss reduction as the target, the source domain samples and the target domain samples are driven to obtain similar characterizations.
Further, in one embodiment, the characterization metric of the samples is refined based on the event category such that the source domain samples and the target domain samples of the same event category have similar characterizations. That is, in step 208, a domain adaptation loss is determined based on a difference in distribution between a first characterization of each source domain sample having each class label at a particular network layer and a second characterization of each target domain sample having a corresponding class label at the particular network layer.
More specifically, in one embodiment, the homogeneous distance may be defined according to the characterization difference between the source domain sample and the target domain sample in the same event category, and the domain adaptation Loss DA Loss may be determined based on the homogeneous distance.
As previously mentioned, each sample in the training sample set has a classification label, such as a label of a fraudulent or non-fraudulent event (binary classification), or a label of the risk level of an event (possibly multi-class). For any first classification label c1, the first characterization $f^{s}_{i}$ of each source domain sample i carrying the label c1 at the particular network layer can be obtained; the second characterization $f^{t}_{j}$ of each target domain sample j carrying the same label c1 at that network layer can likewise be obtained. The homogeneous (same-class) distance corresponding to the first classification c1 can then be determined according to the distribution difference of the first and second characterizations.

In a specific example, the homogeneous distance d(c1) corresponding to the classification c1 can be defined as:

$$d(c_1)=\left\|\frac{1}{n^{s}_{c_1}}\sum_{i=1}^{n^{s}_{c_1}} f^{s}_{i}-\frac{1}{n^{t}_{c_1}}\sum_{j=1}^{n^{t}_{c_1}} f^{t}_{j}\right\| \qquad (2)$$
where the superscript s denotes the source domain and t denotes the target domain; $n^{s}_{c_1}$ is the number of source domain samples with classification label c1, and $f^{s}_{i}$ is the first characterization of each such source domain sample i; $n^{t}_{c_1}$ is the number of target domain samples with classification label c1, and $f^{t}_{j}$ is the second characterization of each such target domain sample j.
In a specific example, the specific network layer is a predictor output layer in the classifier, such as a softmax layer. In this case, the first and second representations are predicted values, for example, probability values belonging to corresponding event categories.
In this case, in formula (2), $\frac{1}{n^{s}_{c_1}}\sum_{i} f^{s}_{i}$ is the mean of the predicted values of the source domain samples i carrying the first classification label c1, referred to as the first mean; $\frac{1}{n^{t}_{c_1}}\sum_{j} f^{t}_{j}$ is the mean of the predicted values of the target domain samples j carrying the label c1, referred to as the second mean. The homogeneous distance d(c1) for the first classification c1 is then the absolute value of the difference between the first mean and the second mean.
In another specific example, the specific network layer is an intermediate layer in the event prediction model, for example, the attention layer 34/35, or a certain layer in the classifier; the intermediate layer outputs a vector representation. In such a case, the first and second representations are both vector representations, referred to as first vector representation and second vector representation.
Accordingly, in formula (2), $\frac{1}{n^{s}_{c_1}}\sum_{i} f^{s}_{i}$ is the first average vector of the first vector characterizations of the source domain samples i carrying the first classification label c1, and $\frac{1}{n^{t}_{c_1}}\sum_{j} f^{t}_{j}$ is the second average vector of the second vector characterizations of the target domain samples j carrying the label c1. The homogeneous distance d(c1) for the first classification c1 is then the norm distance between the first average vector and the second average vector.
Thus, the same-class distance reflects the characterization difference between the source domain samples and the target domain samples under the same classification category. Having defined the same-class distance for each class in this way, the domain adaptation loss can be determined as the sum of the same-class distances corresponding to the respective classes.
For example, in one example, the domain adaptation loss DA loss is determined according to the following equation (3):
$$\text{DA Loss}=\sum_{c_i} d(c_i) \qquad (3)$$
Thus, when the event prediction model is trained with loss reduction as the target, the same-class distances are reduced; that is, under the same classification category, the source domain samples and the target domain samples obtain more similar characterizations, so that the distance between the feature representations of the two domains' samples under the same classification is shortened.
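A minimal sketch of formulas (2) and (3), assuming the characterizations of a batch of source domain samples and a batch of target domain samples are stacked into tensors together with their labels, and that every class appears in both batches, could look as follows:

```python
import torch

def homogeneous_distance(src_repr, src_labels, tgt_repr, tgt_labels, c):
    """d(c) of formula (2): distance between the mean characterization of the source
    domain samples labelled c and that of the target domain samples labelled c."""
    src_mean = src_repr[src_labels == c].mean(dim=0)
    tgt_mean = tgt_repr[tgt_labels == c].mean(dim=0)
    return torch.norm(src_mean - tgt_mean)  # absolute difference for scalars, norm for vectors

def da_loss_same_class(src_repr, src_labels, tgt_repr, tgt_labels, classes):
    """DA Loss of formula (3): sum of the same-class distances over all classes."""
    return sum(homogeneous_distance(src_repr, src_labels, tgt_repr, tgt_labels, c)
               for c in classes)
```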
In one embodiment, on the basis of the above, the inter-class distance is further defined.
Specifically, for any first classification label c1, a first characterization of each source domain sample i with the label c1 at a certain network layer can be similarly obtained; in addition, a third characterization of each target domain sample j at the particular network layer with another classification label c2 is also obtained. The inter-class distance between the first class c1 and the second class c2 may then be determined from the distribution difference of the first and third characterization described above.
In a specific example, the inter-class distance d (c1, c2) for class c1 and class c2 may be defined as:
$$d(c_1,c_2)=\left\|\frac{1}{n^{s}_{c_1}}\sum_{i=1}^{n^{s}_{c_1}} f^{s}_{i}-\frac{1}{n^{t}_{c_2}}\sum_{j=1}^{n^{t}_{c_2}} f^{t}_{j}\right\| \qquad (4)$$
where $n^{s}_{c_1}$ is the number of source domain samples with classification label c1, and $f^{s}_{i}$ is the first characterization of each such source domain sample i; $n^{t}_{c_2}$ is the number of target domain samples with classification label c2, and $f^{t}_{j}$ is the third characterization of each such target domain sample j.
It should be noted that, in the formula (4), the inter-class distance d (c1, c2) is the sample feature distance of the two classes c1 and c2 when the source domain sample belongs to the first class c1 and the target domain sample belongs to the second class c 2; if c1 and c2 are swapped, different distance values may be obtained, i.e., d (c2, c1) may be different from d (c1, c 2).
Accordingly, in one embodiment, the domain adaptation loss may be determined as being inversely proportional to the sum of the inter-class distances between the various classes.
For example, in one example, the domain adaptation loss DA loss is determined according to the following equation (5):
$$\text{DA Loss}=\frac{\sum_{c_i} d(c_i)}{\sum_{c_i\neq c_j} d(c_i,c_j)} \qquad (5)$$
According to the above formula (5), the domain adaptation loss is proportional to the sum of the same-class distances and inversely proportional to the sum of the inter-class distances between different classes. Thus, when the event prediction model is trained with loss reduction as the target, the same-class distances are reduced and the inter-class distances are enlarged; that is, the feature representations of the two domains' samples under the same classification become more similar, while their feature representations under different classifications are pushed apart. In this way, inter-domain transfer learning is performed at the fine granularity of the individual classifications.
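Building on the previous sketch, formulas (4) and (5) could be computed as below; the small constant eps is an added safeguard against division by zero and is an assumption rather than part of the described scheme:

```python
import torch

def interclass_distance(src_repr, src_labels, tgt_repr, tgt_labels, c1, c2):
    """d(c1, c2) of formula (4): distance between the mean characterization of the
    source domain samples labelled c1 and that of the target domain samples labelled c2."""
    src_mean = src_repr[src_labels == c1].mean(dim=0)
    tgt_mean = tgt_repr[tgt_labels == c2].mean(dim=0)
    return torch.norm(src_mean - tgt_mean)

def da_loss_fine_grained(src_repr, src_labels, tgt_repr, tgt_labels, classes, eps=1e-8):
    """Formula (5): sum of same-class distances divided by the sum of
    inter-class distances between different classes."""
    same = sum(interclass_distance(src_repr, src_labels, tgt_repr, tgt_labels, c, c)
               for c in classes)
    inter = sum(interclass_distance(src_repr, src_labels, tgt_repr, tgt_labels, c1, c2)
                for c1 in classes for c2 in classes if c1 != c2)
    return same / (inter + eps)
```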
The classification loss is determined in step 207 and the domain adaptation loss is determined in step 208, whereupon the total loss is determined in the following step 209 based on the classification loss and the domain adaptation loss. In one embodiment, respective weights α and β are set for the classification loss and the domain adaptation loss, so that the total loss L can be expressed as:
$$L=\alpha L_1+\beta L_2 \qquad (6)$$

where $L_1$ is the classification loss C Loss and $L_2$ is the domain adaptation loss DA Loss.
Thus, at step 210, the event prediction model is updated in the direction of the total loss reduction. Specifically, the model parameters of each module in the event prediction model can be adjusted by means of back propagation and gradient descent, so that the event prediction model is trained and updated.
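For illustration, one training step combining the two losses according to formula (6) might be sketched as follows; the model interface (returning, per domain, the classification logits and the characterizations at the selected network layer), the batch fields and the weight values are hypothetical stand-ins, and classification_loss and da_loss_fine_grained refer to the sketches above:

```python
import torch

def training_step(model, optimizer, batch, classes, alpha=1.0, beta=0.1):
    """One update of the event prediction model in the direction of total-loss reduction."""
    logits_s, repr_s = model(batch["source_sequences"], domain="source")
    logits_t, repr_t = model(batch["target_sequences"], domain="target")
    l1 = classification_loss(logits_s, batch["source_labels"]) + \
         classification_loss(logits_t, batch["target_labels"])          # C Loss
    l2 = da_loss_fine_grained(repr_s, batch["source_labels"],
                              repr_t, batch["target_labels"], classes)  # DA Loss
    total = alpha * l1 + beta * l2                                      # formula (6)
    optimizer.zero_grad()
    total.backward()  # back propagation
    optimizer.step()  # gradient-descent update of the module parameters
    return total.item()
```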
In summary, based on the training process shown in fig. 2 and the network structure shown in fig. 3, even with few target domain samples, transfer learning can be exploited to perform differentiated yet unified training using the abundant source domain samples. Specifically, the event prediction model includes a source domain model part, a shared model part and a target domain model part. During training, because the source domain samples are abundant, the source domain model part can quickly establish applicable model parameters. The shared model part processes both the source domain samples and the target domain samples, so model parameters learned from the source domain data are transferred to the target domain part. Combined with the domain adaptation loss defined over the source domain and target domain characterizations, the target domain model part is driven to obtain feature characterizations similar to those of the source domain, so that a model suitable for both the source domain and the target domain is trained from only a small amount of target domain data.
On the basis of training to obtain an event prediction model, the event prediction model can be used for evaluating and predicting the event of the target domain.
FIG. 6 illustrates a method of evaluating a user-operated event according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 6, the method includes at least the following steps.
At step 61, a first event sequence of the target domain is obtained, the first event sequence comprising a current operation event to be evaluated and at least one historical operation event. In general, the at least one historical operation event may be obtained by tracing back, from the current operation event to be evaluated, a predetermined number of preceding events or the events within a predetermined preceding time period. The historical operation events and the current operation event are arranged in time order to obtain the first event sequence. Here, the first event sequence is an event sequence of the target domain; that is, every operation event in it, including the current operation event and the historical operation events, is a target domain event.
In step 62, on the other hand, an event prediction model trained according to the method of FIG. 2 is obtained. As shown in fig. 3, the event prediction model at least includes a trained source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier.
Then, in step 63, a target domain feature extractor is used to perform feature extraction on the first event sequence to obtain a target domain feature representation; and a shared feature extractor is adopted to extract the features of the first event sequence to obtain shared feature representation. The structure and feature extraction manner of each feature extractor refer to the foregoing description of fig. 4 and 5, and are not repeated.
Then, in step 64, a sequence feature vector of the first event sequence is obtained according to the target domain feature representation and the shared feature representation. Specifically, in an embodiment, the target domain feature representation obtained by the target domain feature extractor and the shared feature representation obtained by the shared feature extractor may be weighted and combined by using a weight distribution factor to obtain a sequence feature vector. In one example, the weight distribution factor may be a preset hyper-parameter. In another example, the weight assignment factor may be a model parameter in the target domain attention layer in fig. 3, determined through a training process on the model of fig. 3.
Upon obtaining the sequence feature vector of the first event sequence, in step 65, the event class of the current operation event in the first event sequence is predicted based on the sequence feature vector by using a classifier in an event prediction model. In this way, the evaluation of the event category of the current operation event is realized.
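A sketch of this evaluation flow (steps 63 to 65) is given below; the attribute names target_extractor, shared_extractor, target_attention and classifier describe an assumed model interface for illustration, not the actual module names of the embodiment:

```python
import torch

def evaluate_current_event(model, event_sequence):
    """Predict the event category of the current (most recent) operation event
    in a target domain event sequence."""
    with torch.no_grad():
        yt = model.target_extractor(event_sequence)  # target domain feature representation
        y2 = model.shared_extractor(event_sequence)  # shared feature representation
        v = model.target_attention(yt, y2)           # sequence feature vector
        probs = model.classifier(v)                  # probability of each event category
    return probs.argmax(dim=-1)                      # predicted event category
```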
For example, the current operational event may be an event that occurs at the target domain requesting a transfer. The event may be evaluated, for example, whether the event is a fraud (cash-out) event or the risk level of the event, by the evaluation process of fig. 6. In this manner, subsequent protection decisions may be made with respect to the current operational event, such as whether to allow the transfer, whether to back up the transfer event, and so forth.
In one embodiment, the relevant weight parameters in the event prediction model can also be output, so that the evaluation result of the event category is more interpretable. For example, in the case where the feature extractor adopts two-layer feature extraction, the weights of the respective attributes within an event and the weights of the respective events within the sequence used in obtaining the above sequence feature vector may be output. Furthermore, the weight distribution factor used in step 64, i.e. the relative weights of the target domain feature extractor and the shared feature extractor, may also be output. In this way, an analyst can understand how factors at different levels, including the domain, the event and the attribute, influence the event evaluation result of step 65.
FIG. 6 above illustrates a process for predicting a current operational event of a target domain using a trained event prediction model. It can be seen that in this process, only the target domain feature extractor and the shared feature extractor are used, and the source domain feature extractor is not necessarily used. The source domain feature extractor is mainly used for helping the target domain feature extractor to establish applicable model parameters more quickly in the training process.
Of course, the above event prediction model may also be used to predict the event to be evaluated in the source domain. In such a case, similar to the case of training, a source domain event sequence including a source domain event to be evaluated may be input to the source domain feature extractor and the shared feature extractor without using the target domain feature extractor, and finally, the event category may be predicted by the classifier.
By combining the above, through the training process shown in fig. 2, an event prediction model can be obtained through training based on a small number of target domain samples, and the event prediction model can accurately evaluate and predict both the event to be evaluated in the source domain and the event to be evaluated in the target domain.
According to an embodiment of another aspect, an apparatus for training an event prediction model is provided, which may be deployed in any device, platform or cluster of devices having computing and processing capabilities. FIG. 7 shows a schematic block diagram of an apparatus to train an event prediction model according to one embodiment. As shown in fig. 7, the training apparatus 700 includes:
a sample set obtaining unit 71, configured to obtain a training sample set, where the training sample set includes a first number of source domain samples and a second number of target domain samples, the first number is greater than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events;
a processing unit 72 configured to input the respective samples as current samples into an event prediction model, the event prediction model comprising at least a source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier, wherein,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
a predicting unit 73 configured to predict, by using the classifier, an event class of the current sample based on the sample feature vector of the current sample, and obtain a prediction result;
a first loss determining unit 74 configured to determine a classification loss according to the prediction result of each sample and the corresponding classification label;
a second loss determination unit 75 configured to determine a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
a total loss determination unit 76 configured to determine a total loss based on the classification loss and the domain adaptation loss;
an updating unit 77 configured to update the event prediction model in a direction of the total loss reduction.
In one embodiment, each source domain event has a first plurality of attributes and each target domain event has a second plurality of attributes, wherein there is an intersection between the first plurality of attributes and the second plurality of attributes;
in such a case, the source domain feature extractor is configured to perform feature extraction in a first feature space corresponding to the first plurality of attributes; the target domain feature extractor is used for extracting features in a second feature space corresponding to the second multiple attributes;
and the shared feature extractor is used for extracting features in a shared feature space, wherein the shared feature space corresponds to a union of the first multiple attributes and the second multiple attributes.
In a further embodiment, the feature extraction process of the shared feature extractor on the source domain samples may include:
filling attribute values of a first plurality of attributes of each source domain event in the source domain sample into fields corresponding to the first plurality of attributes in the shared feature space, filling other fields with default values to obtain a first attribute representation of the source domain event in the shared feature space, and extracting features according to the first attribute representation;
the process of the shared feature extractor for extracting the features of the target domain samples may include:
and filling the attribute values of the second multiple attributes of each target domain event in the target domain sample into fields corresponding to the second multiple attributes in the shared feature space, filling the other fields with default values to obtain a second attribute representation of the target domain event in the shared feature space, and extracting features according to the second attribute representation.
According to one implementation, the source domain feature extractor, the target domain feature extractor and the shared feature extractor are two-layer feature extractors with the same structure but different parameters, each comprising an encoding layer, a first embedding layer and a second embedding layer; wherein:
the encoding layer is used for encoding a plurality of items of attribute information of each event in a current event sequence corresponding to an input current sample into a plurality of corresponding encoding vectors;
the first embedding layer is used for carrying out first combination on the plurality of coding vectors of each event to obtain each event vector corresponding to each event;
and the second embedded layer is used for carrying out second combination on the event vectors to obtain the feature representation corresponding to the current event sequence.
Further, in one embodiment, the first combining by the first embedding layer comprises an inter-vector combining operation of order N involving multiplication of N encoded vectors, where N ≥ 2.
In an embodiment, the second embedding layer includes a time-series-based neural network, and is configured to sequentially perform iterative processing on the event vectors to obtain the feature representation corresponding to the current event sequence.
In another embodiment, the second combination employed by the second embedding layer comprises an M-th order inter-vector combining operation involving multiplication of M event vectors, where M ≥ 2.
According to one embodiment, in the processing unit 72, the sample feature vector of the source domain samples is obtained by: carrying out weighted combination on the source domain feature representation and the first feature representation by utilizing a first weight distribution factor to obtain a sample feature vector of the source domain sample;
obtaining a sample feature vector of the target domain sample by the following method: and carrying out weighted combination on the target domain feature representation and the second feature representation by using a second weight distribution factor to obtain a sample feature vector of the target domain sample.
In one embodiment, the second loss determination unit 75 is configured to:
and determining the domain adaptation loss according to the distribution difference between the first characterization of each source domain sample with each classification label in the specific network layer and the second characterization of each target domain sample with the corresponding classification label in the specific network layer.
Further, in one embodiment, the second loss determination unit 75 determines the domain adaptation loss as follows:
obtaining a first representation of each source domain sample with any first classification label at the specific network layer;
obtaining a second representation of each target domain sample with the first classification label at the specific network layer;
determining the same-class distance corresponding to the first classification according to the distribution difference of the first characterization and the second characterization;
and determining the domain adaptation loss as being proportional to the sum of the homogeneous distances corresponding to the various classes.
Further, in an example, the specific network layer is a predicted value output layer in the classifier, the first characteristic is a first predicted value, and the second characteristic is a second predicted value; in such a case, the second loss determination unit 75 determines the homogeneous distance corresponding to the first classification as follows:
determining a first mean value of the first predicted values of the source domain samples with the first classification labels;
determining a second mean value of the second predicted values of the target domain samples with the first classification label;
and determining the same-class distance corresponding to the first classification according to the difference between the first average value and the second average value.
In another example, the first token is a first vector token; the second characterization is a second vector characterization; in such a case, the second loss determination unit 75 determines the homogeneous distance corresponding to the first classification as follows:
determining a first average vector of respective first vector representations of respective source domain samples having a first class label;
determining a second average vector of respective second vector representations of respective target domain samples having the first classification label;
and determining the same-class distance corresponding to the first classification according to the norm distance of the first average vector and the second average vector.
In one embodiment, the second loss determination unit 75 is further configured to:
obtaining a third representation of each target domain sample at the particular network layer with a second class label, the second class label being different from the first class label;
determining an inter-class distance between the first class and the second class according to the distribution difference of the first characterization and the third characterization;
the domain adaptation loss is determined to be inversely proportional to a sum of the inter-class distances between the respective different classes.
According to an embodiment of yet another aspect, an apparatus for evaluating an operational event is provided, which may be deployed in any computing, processing capable device, platform, or cluster of devices. FIG. 8 shows a schematic block diagram of an apparatus to evaluate an operational event according to one embodiment. As shown in fig. 8, the evaluation device 800 includes:
an event sequence acquiring unit 81 configured to acquire a first event sequence, where the first event sequence includes a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
a model obtaining unit 82 configured to obtain an event prediction model obtained by training according to the apparatus of fig. 7, wherein the event prediction model includes a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor, and a trained classifier;
a feature extraction unit 83 configured to perform feature extraction on the first event sequence by using the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
a vector obtaining unit 84, configured to obtain a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
a prediction unit 85 configured to predict, by using the classifier, an event class of a current operation event in the first event sequence based on the sequence feature vector.
In one embodiment, the vector obtaining unit 84 is specifically configured to: carrying out weighted combination on the target domain feature representation and the shared feature representation by using a weight distribution factor to obtain the sequence feature vector;
the apparatus 800 may further include a weight output unit (not shown) configured to output the weight distribution factor to indicate an influence of the target domain feature extractor and the shared feature extractor on the prediction result.
Further, in one embodiment, the weight distribution factor is determined by a training process of the event prediction model.
With the above apparatus, an event prediction model applicable to both the source domain and the target domain is trained using the abundant source domain sample data together with the relatively scarce target domain sample data, and the event to be evaluated in the target domain can then be accurately and effectively evaluated using this event prediction model.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 and 6.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2 and 6.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (20)

1. A method of training an event prediction model, the method comprising:
obtaining a training sample set, wherein the training sample set comprises a first number of source domain samples and a second number of target domain samples, the first number is larger than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events; the source domain event and the target domain event are both user operation events;
inputting each sample as a current sample into an event prediction model, wherein the event prediction model at least comprises a source domain feature extractor, a target domain feature extractor, a shared feature extractor and a classifier,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
predicting the event category of the current sample by using the classifier based on the sample feature vector of the current sample to obtain a prediction result;
determining classification loss according to the prediction result of each sample and the corresponding classification label;
determining a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
determining a total loss according to the classification loss and the domain adaptive loss;
updating the event prediction model in the direction of the total loss reduction.
2. The method of claim 1, wherein the source domain event has a first plurality of attributes; the target domain event has a second plurality of attributes, wherein there is an intersection of the first plurality of attributes and the second plurality of attributes;
the source domain feature extractor is used for extracting features in a first feature space corresponding to the first plurality of attributes;
the target domain feature extractor is used for extracting features in a second feature space corresponding to the second multiple attributes;
the shared feature extractor is configured to perform feature extraction in a shared feature space, where the shared feature space corresponds to a union of the first plurality of attributes and the second plurality of attributes.
3. The method of claim 2, wherein,
performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation, including:
filling attribute values of a first plurality of attributes of each source domain event in the source domain sample into fields corresponding to the first plurality of attributes in the shared feature space, filling other fields with default values to obtain a first attribute representation of the source domain event in the shared feature space, and extracting features according to the first attribute representation;
performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation, including:
and filling the attribute values of the second multiple attributes of each target domain event in the target domain sample into fields corresponding to the second multiple attributes in the shared feature space, filling the other fields with default values to obtain a second attribute representation of the target domain event in the shared feature space, and extracting features according to the second attribute representation.
4. The method of claim 1, wherein the source domain feature extractor, the target domain feature extractor and the shared feature extractor are two-layer feature extractors having the same structure but respective parameters, each two-layer feature extractor comprising an encoding layer, a first embedding layer and a second embedding layer; wherein:
the encoding layer is used for encoding a plurality of items of attribute information of each event in a current event sequence corresponding to an input current sample into a plurality of corresponding encoding vectors;
the first embedding layer is used for carrying out first combination on the plurality of coding vectors of each event to obtain each event vector corresponding to each event;
and the second embedded layer is used for carrying out second combination on the event vectors to obtain the feature representation corresponding to the current event sequence.
5. The method of claim 4, wherein the first combining comprises an inter-vector combining operation of order N involving multiplication of N encoded vectors, where N ≥ 2.
6. The method of claim 4, wherein the second embedding layer comprises a time-series-based neural network for iteratively processing the event vectors in sequence to obtain the feature representation corresponding to the current event series.
7. The method of claim 4, wherein the second combining comprises an M-th order inter-vector combining operation involving multiplication of M event vectors, where M ≥ 2.
8. The method of claim 1, wherein,
obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation comprises: performing weighted combination on the source domain feature representation and the first feature representation by using a first weight distribution factor to obtain the sample feature vector of the source domain sample;
obtaining a sample feature vector of a target domain sample according to the target domain feature representation and the second feature representation comprises performing weighted combination on the target domain feature representation and the second feature representation by using a second weight distribution factor to obtain the sample feature vector of the target domain sample.
9. The method of claim 1, wherein determining a domain adaptation loss based on a first characterization of each source domain sample at a particular network layer of the event prediction model and a second characterization of each target domain sample at the particular network layer comprises:
and determining the domain adaptation loss according to the distribution difference between the first characterization of each source domain sample with each classification label in the specific network layer and the second characterization of each target domain sample with the corresponding classification label in the specific network layer.
10. The method of claim 9, wherein the determining a domain adaptation loss comprises:
obtaining a first representation of each source domain sample with any first classification label at the specific network layer;
obtaining a second representation of each target domain sample with the first classification label at the specific network layer;
determining the same-class distance corresponding to the first classification according to the distribution difference of the first characterization and the second characterization;
and determining the domain adaptation loss as being proportional to the sum of the homogeneous distances corresponding to the various classes.
11. The method of claim 10, wherein the particular network layer is a predictor output layer in the classifier, the first characterization is a first predictor, and the second characterization is a second predictor;
according to the distribution difference of the first characterization and the second characterization, determining the same-class distance corresponding to the first classification, including:
determining a first mean value of the first predicted values of the source domain samples with the first classification labels;
determining a second mean value of the second predicted values of the target domain samples with the first classification label;
and determining the same-class distance corresponding to the first classification according to the difference between the first average value and the second average value.
12. The method of claim 10, wherein the first characterization is a first vector characterization; the second characterization is a second vector characterization;
according to the distribution difference of the first characterization and the second characterization, determining the same-class distance corresponding to the first classification, including:
determining a first average vector of respective first vector representations of respective source domain samples having a first class label;
determining a second average vector of respective second vector representations of respective target domain samples having the first classification label;
and determining the same-class distance corresponding to the first classification according to the norm distance of the first average vector and the second average vector.
13. The method of claim 10, wherein the determining a domain adaptation loss further comprises:
obtaining a third representation of each target domain sample at the particular network layer with a second class label, the second class label being different from the first class label;
determining an inter-class distance between the first class and the second class according to the distribution difference of the first characterization and the third characterization;
the domain adaptation loss is determined to be inversely proportional to a sum of the inter-class distances between the respective different classes.
14. A method of evaluating a user-operated event, the method comprising:
acquiring a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
obtaining an event prediction model trained according to the method of claim 1, wherein the event prediction model comprises a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor and a trained classifier;
performing feature extraction on the first event sequence by adopting the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
obtaining a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
predicting, with the classifier, an event class of a current operational event in the first sequence of events based on the sequence feature vector.
15. The method of claim 14, wherein deriving a sequence feature vector for the first sequence of events from the target domain feature representation and the shared feature representation comprises:
carrying out weighted combination on the target domain feature representation and the shared feature representation by using a weight distribution factor to obtain the sequence feature vector;
the method further includes outputting the weight assignment factor to indicate an impact of the target domain feature extractor and the shared feature extractor on the prediction result.
16. The method of claim 15, wherein the weight assignment factor is determined by a training process of the event prediction model.
17. An apparatus to train an event prediction model, the apparatus comprising:
a sample set obtaining unit configured to obtain a training sample set, wherein the training sample set includes a first number of source domain samples and a second number of target domain samples, the first number is greater than the second number, and each sample has a corresponding classification label; each source domain sample comprises a source domain event sequence formed by a plurality of source domain events, and each target domain sample comprises a target domain event sequence formed by a plurality of target domain events; the source domain event and the target domain event are both user operation events;
a processing unit configured to input, as a current sample, each sample into an event prediction model including at least a source domain feature extractor, a target domain feature extractor, a shared feature extractor, and a classifier,
when the current sample is a source domain sample, performing feature extraction on the source domain sample by using the source domain feature extractor to obtain a source domain feature representation; performing feature extraction on the source domain sample by using the shared feature extractor to obtain a first feature representation; obtaining a sample feature vector of a source domain sample according to the source domain feature representation and the first feature representation;
when the current sample is a target domain sample, performing feature extraction on the target domain sample by using the target domain feature extractor to obtain target domain feature representation; performing feature extraction on the target domain sample by using the shared feature extractor to obtain a second feature representation; obtaining a sample feature vector of the target domain sample according to the target domain feature representation and the second feature representation;
the prediction unit is configured to predict the event category of the current sample based on the sample feature vector of the current sample by using the classifier to obtain a prediction result;
a first loss determination unit configured to determine a classification loss according to the prediction result of each sample and the corresponding classification label;
a second loss determination unit configured to determine a domain adaptation loss according to a first characterization of each source domain sample in a specific network layer of the event prediction model and a second characterization of each target domain sample in the specific network layer;
a total loss determination unit configured to determine a total loss according to the classification loss and the domain adaptation loss;
an updating unit configured to update the event prediction model in a direction in which the total loss decreases.
18. An apparatus for evaluating a user-operated event, the apparatus comprising:
the event sequence acquiring unit is configured to acquire a first event sequence, wherein the first event sequence comprises a current operation event to be evaluated and at least one historical operation event, and each operation event is a target domain event;
a model obtaining unit configured to obtain an event prediction model trained according to the apparatus of claim 17, wherein the event prediction model comprises a trained source domain feature extractor, a trained target domain feature extractor, a trained shared feature extractor, and a trained classifier;
the feature extraction unit is configured to perform feature extraction on the first event sequence by using the target domain feature extractor to obtain a target domain feature representation; extracting the characteristics of the first event sequence by adopting the shared characteristic extractor to obtain shared characteristic representation;
the vector acquisition unit is configured to obtain a sequence feature vector of the first event sequence according to the target domain feature representation and the shared feature representation;
a prediction unit configured to predict an event class of a current operation event in the first sequence of events based on the sequence feature vector using the classifier.
19. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-16.
20. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-16.
CN201910916976.6A 2019-09-26 2019-09-26 Training event prediction model, and method and device for evaluating operation event Active CN110659744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910916976.6A CN110659744B (en) 2019-09-26 2019-09-26 Training event prediction model, and method and device for evaluating operation event


Publications (2)

Publication Number Publication Date
CN110659744A CN110659744A (en) 2020-01-07
CN110659744B true CN110659744B (en) 2021-06-04

Family

ID=69039262


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11776323B2 (en) 2022-02-15 2023-10-03 Ford Global Technologies, Llc Biometric task network

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275350B (en) * 2020-02-08 2021-06-04 支付宝(杭州)信息技术有限公司 Method and device for updating event evaluation model
CN111511016B (en) * 2020-04-09 2021-10-08 腾讯科技(深圳)有限公司 Method, device, server and storage medium for determining homologous wireless network
CN112036169B (en) * 2020-09-02 2023-06-20 深圳前海微众银行股份有限公司 Event recognition model optimization method, device, equipment and readable storage medium
CN112101484B (en) * 2020-11-10 2021-02-12 中国科学院自动化研究所 Incremental event identification method, system and device based on knowledge consolidation
CN113780610A (en) * 2020-12-02 2021-12-10 北京沃东天骏信息技术有限公司 Customer service portrait construction method and device
CN112288042B (en) * 2020-12-18 2021-04-02 蚂蚁智信(杭州)信息技术有限公司 Updating method and device of behavior prediction system, storage medium and computing equipment
CN112634048B (en) * 2020-12-30 2023-06-13 第四范式(北京)技术有限公司 Training method and device for money backwashing model
CN112785157B (en) * 2021-01-22 2022-07-22 支付宝(杭州)信息技术有限公司 Risk identification system updating method and device and risk identification method and device
CN112988186B (en) * 2021-02-19 2022-07-19 支付宝(杭州)信息技术有限公司 Updating method and device of abnormality detection system
CN112949752B (en) * 2021-03-25 2022-09-06 支付宝(杭州)信息技术有限公司 Training method and device of business prediction system
CN113129053B (en) * 2021-03-29 2024-05-21 北京沃东天骏信息技术有限公司 Information recommendation model training method, information recommendation method and storage medium
US20220343139A1 (en) * 2021-04-15 2022-10-27 Peyman PASSBAN Methods and systems for training a neural network model for mixed domain and multi-domain tasks
CN113762501A (en) * 2021-04-20 2021-12-07 京东城市(北京)数字科技有限公司 Prediction model training method, device, equipment and storage medium
CN114399344B (en) * 2022-03-24 2022-07-08 北京骑胜科技有限公司 Data processing method and data processing device


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776693B2 (en) * 2017-01-31 2020-09-15 Xerox Corporation Method and system for learning transferable feature representations from a source domain for a target domain
US11880761B2 (en) * 2017-07-28 2024-01-23 Microsoft Technology Licensing, Llc Domain addition systems and methods for a language understanding system
US10643602B2 (en) * 2018-03-16 2020-05-05 Microsoft Technology Licensing, Llc Adversarial teacher-student learning for unsupervised domain adaptation

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670588A (en) * 2017-10-16 2019-04-23 Youku Network Technology (Beijing) Co Ltd Neural network prediction method and device
CN107958286A (en) * 2017-11-23 2018-04-24 Tsinghua University A deep transfer learning method for domain adaptation networks
WO2019113501A1 (en) * 2017-12-07 2019-06-13 Fractal Industries, Inc. Transfer learning and domain adaptation using distributable data models
CN108229589A (en) * 2018-02-09 2018-06-29 Tianjin Normal University A ground-based cloud image classification method based on transfer learning
CN108898218A (en) * 2018-05-24 2018-11-27 Alibaba Group Holding Ltd Training method and device for a neural network model, and computer equipment
CN109034186A (en) * 2018-06-11 2018-12-18 Northeastern University at Qinhuangdao Method for establishing a DA-RBM classifier model
CN108846128A (en) * 2018-06-30 2018-11-20 Hefei University of Technology A cross-domain text classification method based on an adaptive noise encoder
CN109284662A (en) * 2018-07-12 2019-01-29 Harbin Engineering University A transfer learning method for underwater acoustic signal classification
CN109189921A (en) * 2018-08-07 2019-01-11 Alibaba Group Holding Ltd Training method and device for a comment evaluation model
CN109359557A (en) * 2018-09-25 2019-02-19 Northeastern University A ship detection method for SAR remote sensing images based on transfer learning
CN109523018A (en) * 2019-01-08 2019-03-26 Chongqing University of Posts and Telecommunications An image classification method based on deep transfer learning
CN110032646A (en) * 2019-05-08 2019-07-19 Shanxi University of Finance and Economics A cross-domain text sentiment classification method based on multi-source domain adaptation joint learning

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Beyond Sharing Weights for Deep Domain Adaptation; Artem Rozantsev et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2019-04-01; Vol. 41, No. 4; pp. 801-814 *
Domain Separation Networks; Konstantinos Bousmalis et al.; Computer Vision and Pattern Recognition; 2016-08-22; pp. 1-15 *
Introducing shared-hidden-layer autoencoders for transfer learning and; Jun Deng et al.; 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2014-07-14; pp. 4818-4822 *
Open Set Domain Adaptation; Pau Panareda Busto et al.; Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2017-12-31; pp. 754-763 *
Simultaneous Deep Transfer Across Domains and Tasks; Eric Tzeng et al.; Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2015-12-31; pp. 4068-4076 *
Cross-Domain Sentiment Classification Based on a Wasserstein-Distance Hierarchical Attention Model; Du Yongping et al.; Pattern Recognition and Artificial Intelligence; 2019-05-31; Vol. 32, No. 5; pp. 446-454 *
Research on Domain Adaptation Based on Deep Learning; Ma Yuting; Wanfang Online; 2018-11-15; pp. 1-71 *
Cross-Domain Classification of High-Resolution Remote Sensing Images Based on Deep Adversarial Domain Adaptation; Teng Wenxiu et al.; Laser & Optoelectronics Progress; 2019-06-30; Vol. 56, No. 11; pp. I12801-1 to I12801-11 *
Research on the Application of Transfer Learning in Image Classification; Wu Guoqin; China Masters' Theses Full-text Database, Information Science and Technology; 2017-08-15; No. 08; pp. I138-432 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11776323B2 (en) 2022-02-15 2023-10-03 Ford Global Technologies, Llc Biometric task network

Also Published As

Publication number Publication date
CN110659744A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
CN110659744B (en) Training event prediction model, and method and device for evaluating operation event
CN111814977B (en) Method and device for training event prediction model
CN108875776B (en) Model training method and device, service recommendation method and device, and electronic device
AU2019202925A1 (en) Selecting threads for concurrent processing of data
CN110705688B (en) Neural network system, method and device for performing risk assessment on operation event
WO2019118639A1 (en) Residual binary neural network
CN114548300B (en) Method and device for explaining service processing result of service processing model
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN111159241B (en) Click conversion estimation method and device
CN117113350B (en) Path self-adaption-based malicious software detection method, system and equipment
JP2017174004A (en) Sentence meaning classification calculation device, model learning device, method, and program
CN110717037B (en) Method and device for classifying users
CN112801784A (en) Bitcoin address mining method and device for digital currency exchange
CN116506302A (en) Network alignment method based on counterfactual inference
CN116128339A (en) Client credit evaluation method and device, storage medium and electronic equipment
Maleki et al. Improvement of credit scoring by LSTM autoencoder model
CN114861739A (en) Characteristic channel selectable multi-component system degradation prediction method and system
CN114238280A (en) Method and device for constructing financial sensitive information standard library and electronic equipment
Shahoud et al. Incorporating unsupervised deep learning into meta learning for energy time series forecasting
Muhammad et al. Modelling short‐scale variability and uncertainty during mineral resource estimation using a novel fuzzy estimation technique
KR102679889B1 (en) Method and Apparatus for Learning of Active learning algorithm with long-range observation
CN113255891B (en) Method, neural network model and device for processing event characteristics
CN117274616B (en) Multi-feature fusion deep learning service QoS prediction system and prediction method
CN116610783B (en) Service optimization method based on artificial intelligent decision and digital online page system
CN117372146A (en) Credit and quota raising method and device based on causal inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant