CN112949752B - Training method and device of business prediction system

Training method and device of business prediction system

Info

Publication number
CN112949752B
Authority
CN
China
Prior art keywords
domain
target domain
source domain
source
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110322500.7A
Other languages
Chinese (zh)
Other versions
CN112949752A (en
Inventor
申书恒
郑霖
傅欣艺
刘蓓
王维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110322500.7A
Publication of CN112949752A
Application granted
Publication of CN112949752B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Abstract

Embodiments of this specification provide a training method for a business prediction system. In the first stage of the method, the rich label information of a source domain is used to train a strong feature extractor shared by the source domain and a target domain, and the trained extractor is then applied to the target domain. In the second stage, a business prediction model for target domain objects is trained in a supervised manner, using the strong features that the trained feature extractor extracts from the target domain samples together with the original features and business labels of those samples. The strong feature extractor trained in the first stage and the business prediction model trained in the second stage together form a business prediction system applied to the target domain.

Description

Training method and device of business prediction system
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning, and in particular, to a method and an apparatus for training a business prediction system.
Background
With the rise of machine learning, more and more service platforms analyze and evaluate the service objects of the platform by training the machine learning model. For example, an e-commerce platform, a social platform and the like perform risk assessment on operation events in the platform by training a risk assessment model, and identify high-risk operation behaviors which may threaten network security or user information security, such as account stealing, traffic attack, fraudulent transactions and the like, so as to perform prevention and control in time.
Model training usually depends on a large amount of labeled data. In fields where labeled data is scarce, however, training the model is difficult. For example, a newly launched service platform has accumulated only a small amount of service data, and models trained directly on such data often perform poorly.
Therefore, a scheme is needed to train a machine learning model with good performance under the condition of rare labeled data, so as to perform more accurate and effective analysis and evaluation on the business object.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and an apparatus for training a business prediction model, and a business prediction model trained by the method or the apparatus can perform more accurate and effective analysis and evaluation on a business object.
According to a first aspect, a training method for a business prediction system is provided, including the following. A training sample set is obtained, which comprises a plurality of source domain samples and a plurality of target domain samples; each sample comprises feature values corresponding to a plurality of common features, which form a common feature part. The common feature part of each sample is taken as a current feature part and input into a transfer learning system comprising a feature characterizer, a domain discriminator and a source domain business predictor. When the current feature part belongs to a source domain sample, the feature characterizer characterizes the current feature part to obtain a source domain feature representation, and the source domain feature representation is input into the domain discriminator and the source domain business predictor respectively, yielding a source domain discrimination result and a source domain business prediction result. When the current feature part belongs to a target domain sample, the feature characterizer characterizes the current feature part to obtain a target domain feature representation, and the target domain feature representation is input into the domain discriminator to obtain a target domain discrimination result.
A domain discrimination loss is determined according to the source domain discrimination result and a source domain identifier, and the target domain discrimination result and a target domain identifier; a source domain prediction loss is determined according to the source domain business prediction result and the source domain business label in the corresponding source domain sample; and the transfer learning system is trained based on the domain discrimination loss and the source domain prediction loss. The feature characterizer in the trained transfer learning system is then used to process the plurality of common feature parts corresponding to the plurality of target domain samples, yielding a plurality of strong feature representations. A business prediction model for target domain business objects is trained using the plurality of target domain samples and the corresponding plurality of strong feature representations; the trained business prediction model and the feature characterizer in the trained transfer learning system form the business prediction system.
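The two losses named in the first aspect can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the domain identifiers are encoded as 1 for the source domain and 0 for the target domain, that the discriminator outputs the probability of the source domain, that the business task is binary, and that both losses are cross-entropy losses. The function names are hypothetical, and how the two losses are combined during training is not specified here, so the sketch only computes them.

```python
import math

SOURCE_ID, TARGET_ID = 1.0, 0.0  # assumed encoding of the domain identifiers


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def domain_discrimination_loss(source_probs, target_probs):
    """Average loss of the discriminator's P(source) outputs against the domain identifiers."""
    losses = [binary_cross_entropy(p, SOURCE_ID) for p in source_probs]
    losses += [binary_cross_entropy(p, TARGET_ID) for p in target_probs]
    return sum(losses) / len(losses)


def source_prediction_loss(source_preds, source_labels):
    """Average loss of the source domain predictor against the source business labels."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(source_preds, source_labels)]
    return sum(losses) / len(losses)


# Toy discrimination and prediction results (made-up numbers).
domain_loss = domain_discrimination_loss(source_probs=[0.9, 0.8], target_probs=[0.2, 0.4])
pred_loss = source_prediction_loss(source_preds=[0.7, 0.1], source_labels=[1.0, 0.0])
```

Under these assumptions, a discriminator that separates the two domains well yields a small `domain_loss`, while one that cannot tell them apart yields a loss near ln 2.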
In one embodiment, each target domain sample further includes a feature value corresponding to a number of target domain private features.
In one embodiment, inputting the source domain feature representation into the domain discriminator and the source domain business predictor respectively, to obtain the source domain discrimination result and the source domain business prediction result, includes: inputting the source domain feature representation into the source domain business predictor to obtain the source domain business prediction result; and inputting the source domain business prediction result together with the source domain feature representation into the domain discriminator to obtain the source domain discrimination result. The transfer learning system further comprises a target domain business predictor. After the feature characterizer characterizes the current feature part to obtain the target domain feature representation, the method further includes: inputting the target domain feature representation into the target domain business predictor to obtain a target domain business prediction result. Inputting the target domain feature representation into the domain discriminator to obtain the target domain discrimination result then includes: inputting the target domain business prediction result together with the target domain feature representation into the domain discriminator to obtain the target domain discrimination result.
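One simple way to picture the joint input in this embodiment is to concatenate the feature representation with the predictor's output before passing it to the domain discriminator. The Python sketch below assumes concatenation; the text only says the two are input jointly, so other combinations are equally possible. The function name and the numbers are hypothetical.

```python
def joint_discriminator_input(feature_representation, business_prediction):
    """Concatenate a feature representation with the predictor's output scores."""
    return list(feature_representation) + list(business_prediction)


# Source domain sample: representation plus source domain predictor output.
source_repr = [0.3, -1.2, 0.8]
source_pred = [0.1, 0.9]
source_joint = joint_discriminator_input(source_repr, source_pred)

# Target domain sample: representation plus target domain predictor output.
target_repr = [0.5, 0.4, -0.7]
target_pred = [0.6, 0.4]
target_joint = joint_discriminator_input(target_repr, target_pred)
```

Both joint inputs have the same length, so a single domain discriminator can consume either of them.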
In one embodiment, the feature characterizer is implemented as a deep neural network DNN.
In one embodiment, training the business prediction model for target domain business objects using the plurality of target domain samples and the corresponding plurality of strong feature representations comprises: inputting all features of each target domain sample, together with the corresponding strong feature representation, into the business prediction model to obtain a prediction result; determining a target domain prediction loss according to the prediction result and the target domain business label in each target domain sample; and training the business prediction model based on the target domain prediction loss.
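The second-stage input described above can be sketched in Python as follows. The sketch assumes that the original features and the strong feature representation are simply concatenated, and it stands in for the business prediction model with a one-level decision stump chosen by training error, which is a deliberate simplification of a decision tree model. All names and numbers are hypothetical.

```python
def augment(all_features, strong_representation):
    """Join a target sample's original features with its strong feature representation."""
    return list(all_features) + list(strong_representation)


def fit_stump(rows, labels):
    """Pick the (feature index, threshold) split with the fewest training errors.

    The stump predicts 1 when row[index] > threshold, else 0.
    """
    best = None
    for index in range(len(rows[0])):
        for threshold in sorted({row[index] for row in rows}):
            errors = sum(
                int(row[index] > threshold) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, index, threshold)
    _, index, threshold = best
    return index, threshold


def predict(stump, row):
    index, threshold = stump
    return int(row[index] > threshold)


# Toy target domain data: two original features and a one-dimensional strong feature.
original = [[5.0, 1.0], [3.0, 1.0], [4.0, 0.0], [6.0, 0.0]]
strong = [[0.9], [0.1], [0.8], [0.2]]
labels = [1, 0, 1, 0]

rows = [augment(o, s) for o, s in zip(original, strong)]
stump = fit_stump(rows, labels)
predictions = [predict(stump, row) for row in rows]
```

In this toy data the strong feature (index 2) separates the labels while the original features do not, so the stump splits on it.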
In a specific embodiment, the business prediction model is implemented as a decision tree model.
In one embodiment, the target domain business object includes at least one of: users, merchants, goods, and events; the business prediction model is used for predicting the classification or regression value of the target domain business object.
In one embodiment, the number of samples of the plurality of source domain samples is greater than the number of samples of the plurality of target domain samples.
According to a second aspect, a training device for a business prediction system is provided, comprising the following units. An acquisition unit is configured to acquire a training sample set, which comprises a plurality of source domain samples and a plurality of target domain samples; each sample comprises feature values corresponding to a plurality of common features, which form a common feature part. A processing unit is configured to input the common feature part of each sample, as a current feature part, into a transfer learning system comprising a feature characterizer, a domain discriminator and a source domain business predictor. When the current feature part belongs to a source domain sample, the feature characterizer characterizes the current feature part to obtain a source domain feature representation, and the source domain feature representation is input into the domain discriminator and the source domain business predictor respectively, yielding a source domain discrimination result and a source domain business prediction result. When the current feature part belongs to a target domain sample, the feature characterizer characterizes the current feature part to obtain a target domain feature representation, and the target domain feature representation is input into the domain discriminator to obtain a target domain discrimination result.
A loss determining unit is configured to determine a domain discrimination loss according to the source domain discrimination result and a source domain identifier, and the target domain discrimination result and a target domain identifier, and to determine a source domain prediction loss according to the source domain business prediction result and the source domain business label in the corresponding source domain sample. A system training unit is configured to train the transfer learning system based on the domain discrimination loss and the source domain prediction loss. A characterization unit is configured to process, using the feature characterizer in the trained transfer learning system, the plurality of common feature parts corresponding to the plurality of target domain samples to obtain a plurality of strong feature representations. A model training unit is configured to train a business prediction model for target domain business objects using the plurality of target domain samples and the corresponding plurality of strong feature representations; the trained business prediction model and the feature characterizer in the trained transfer learning system form the business prediction system.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
According to the method and device provided by the embodiments of this specification, in the first stage only the feature part common to the two domains is used for training, which makes the feature space alignment interpretable, and the feature characterizer shared by the source domain and the target domain is trained using the rich label data of the source domain. In the second stage, the business prediction model is trained in the target domain using the strong feature representations extracted by the feature characterizer together with the original features and sample labels of the target domain samples. During this training, the business prediction model first attends to the strong features and, once they yield a good classification effect, is fine-tuned with the original features to improve the effect further. The target domain labels are thus used effectively, overfitting is prevented, and information about differences in how features are defined in the source domain and the target domain is not lost, so the trained business prediction model performs stably on the target domain and its predictions are highly accurate. In this way, a high-performance business prediction system usable in the target domain is obtained.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a method of training a business prediction system, according to one embodiment;
FIG. 3 is a schematic diagram illustrating an implementation scenario of another embodiment disclosed in the present specification;
FIG. 4 illustrates a diagram of a training apparatus of a business prediction system, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As mentioned above, in fields where labeled data is scarce, it is difficult to train a model with good performance. Moreover, in a field with a small data volume, the labeled data may also suffer from severe class imbalance. For example, in the risk control scenarios of some fields, the proportion of black samples (risky samples) is extremely low compared with white samples (normal samples), so modeling directly on that field's data leads to overfitting and unstable performance.
To address this problem, the embodiments of this specification use transfer learning: a model is trained with labeled data from a similar field whose data is more abundant and whose categories are more balanced, so that the trained model can be used in a field with less data and/or unbalanced categories. In general, the field with more abundant data is called the source domain, and the field that is to be analyzed and learned but has less data is called the target domain.
For example, in one scenario, merchants that have joined an applet platform need to be analyzed. Suppose a payment platform has been in operation for a long time and has accumulated a large amount of merchant data by providing payment collection services to online merchants, whereas the applet platform has little data. Because the applet merchants to be analyzed are also online merchants and are therefore similar to the payment platform's merchants, the payment platform can serve as the source domain and the applet platform as the target domain. In another scenario, user interaction events on a customer service platform need to be analyzed. Suppose a hotline customer service platform has been in operation for a long time and has accumulated a large amount of data, whereas the online customer service platform to be analyzed went live only recently and has little data, and the data of the two platforms are similar to some degree; the hotline customer service platform can then serve as the source domain and the online customer service platform as the target domain. In yet another scenario, the operation events of users in different regions of a service platform need to be analyzed. Suppose the service has been available in East China for a long time and much data has accumulated there, whereas it was launched in North China only recently and data is scarce; East China can then serve as the source domain and North China as the target domain.
Unlike conventional transfer learning, the embodiments disclosed in this specification adopt two-stage training. In the first stage, feature alignment is performed on the common feature part of the source domain and the target domain, so that the alignment operation is semantically interpretable; this stage includes training a feature extractor that extracts strong features from the common feature part. In the second stage, a business prediction model is trained using the strong features extracted in the first stage and the labeled data of the target domain. The trained feature extractor and the business prediction model together form the business prediction system that is actually put into use.
FIG. 1 is a schematic diagram of an implementation scenario according to an embodiment. As shown in FIG. 1, historical data from a source domain and a target domain are collected as a training sample set, and two-stage training is performed. In the first stage, a shared feature characterizer (also called a feature extractor or strong feature extractor) is trained on the common features of the source domain and the target domain to extract strong stable features on which the two domains are aligned. In the second stage, the business prediction model is trained on the target domain in a supervised manner, based on the strong stable features extracted in the first stage (also called strong features or strong feature representations) and the original features. The trained feature extractor and the business prediction model together form the business prediction system that is actually put into use and yields stable, highly accurate predictions on the target domain.
The following describes the training process of the above business prediction system in detail.
FIG. 2 illustrates a flow diagram of a method of training a business prediction system, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform or cluster of devices having computing and processing capabilities. As shown in FIG. 2, the training process includes the following steps:
step S201, obtaining a training sample set, wherein the training sample set comprises a plurality of source domain samples and a plurality of target domain samples, and each sample comprises feature values corresponding to a plurality of common features, which form a common feature part;
step S202, taking the common feature part of each sample as a current feature part and inputting it into a transfer learning system, wherein the transfer learning system comprises a feature characterizer, a domain discriminator and a source domain business predictor;
step S203, determining whether the current feature part belongs to a source domain sample or a target domain sample;
step S204, when the current feature part belongs to a source domain sample, characterizing the current feature part with the feature characterizer to obtain a source domain feature representation, and inputting the source domain feature representation into the domain discriminator and the source domain business predictor respectively to obtain a source domain discrimination result and a source domain business prediction result;
step S205, when the current feature part belongs to a target domain sample, characterizing the current feature part with the feature characterizer to obtain a target domain feature representation, and inputting the target domain feature representation into the domain discriminator to obtain a target domain discrimination result;
step S206, determining a domain discrimination loss according to the source domain discrimination result and a source domain identifier, and the target domain discrimination result and a target domain identifier; determining a source domain prediction loss according to the source domain business prediction result and the source domain business label in the corresponding source domain sample; and training the transfer learning system based on the domain discrimination loss and the source domain prediction loss;
step S207, processing the plurality of common feature parts corresponding to the plurality of target domain samples with the feature characterizer in the trained transfer learning system to obtain a plurality of strong feature representations;
step S208, training a business prediction model for target domain business objects using the plurality of target domain samples and the plurality of strong feature representations, wherein the trained business prediction model and the feature characterizer in the trained transfer learning system form a business prediction system.
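The per-sample routing of steps S203 to S205 can be sketched in Python as follows. This is only an illustration under stated assumptions: the three components are passed in as plain callables, the toy stand-ins below are untrained and hypothetical, and the dictionary keys `domain`, `common`, `discrimination` and `prediction` are invented for the example.

```python
import math


def forward(sample, characterizer, discriminator, source_predictor):
    """Route the common feature part of one sample through the system."""
    representation = characterizer(sample["common"])
    outputs = {"discrimination": discriminator(representation)}
    if sample["domain"] == "source":
        # Only source domain samples go through the source domain predictor.
        outputs["prediction"] = source_predictor(representation)
    return outputs


# Toy stand-ins for the three components (hypothetical, untrained).
characterizer = lambda common: [2.0 * value for value in common]
discriminator = lambda rep: 1.0 / (1.0 + math.exp(-sum(rep)))
source_predictor = lambda rep: 1.0 / (1.0 + math.exp(-rep[0]))

source_out = forward({"domain": "source", "common": [0.5, -0.2]},
                     characterizer, discriminator, source_predictor)
target_out = forward({"domain": "target", "common": [0.1, 0.3]},
                     characterizer, discriminator, source_predictor)
```

A source domain sample produces both a discrimination result and a business prediction, while a target domain sample produces only a discrimination result.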
Each of the above steps is described in detail below:
step S201, a training sample set is obtained, where the training sample set includes a plurality of source domain samples and a plurality of target domain samples, and each sample includes a feature value corresponding to a plurality of common features, so as to form a common feature portion.
It will be appreciated that the source domain and the target domain may depend on the business scenario to be analyzed. Generally, the source domain is a domain with rich data, and the target domain is a domain to be analyzed but with sparse data. For example, in one example, the source domain is a payment platform and the target domain is an applet platform; or, in another example, the source domain is a hotline service platform and the target domain is an online service platform; alternatively, in yet another example, the source domain is east China data and the target domain is north China data.
Because the data source of the source domain is richer, the number of source domain samples is generally much larger than the number of target domain samples in the training sample set formed by sample collection.
In the embodiments disclosed in this specification, the samples of the two domains, namely the source domain and the target domain, are directed to the same or similar sample objects (or business objects), and the samples of the two domains share a common feature part. In one embodiment, the business objects of the two domains are the same: both are merchants. In another embodiment, the sample objects of the two domains are similar: the source domain business object is a login event and the target domain business object is an access event. As for features, in one implementation scenario the samples of the two domains have the same feature space, that is, their feature items are identical. In another implementation scenario the feature spaces differ: the target domain has some private features in addition to the features it shares with the source domain, or in addition to all the features of the source domain.
According to a specific embodiment, the business objects targeted by the two domain samples are merchants, and the merchant common features in the two domain samples include: hot commodity sales, user rating, sales volume, complained times, operation duration and the like. Further, in a more specific embodiment, the target domain also includes private features for the merchant: the pre-sale service level of the merchant, the credit level of the merchant and the back-end customer proportion.
According to another specific embodiment, the business objects targeted by the two domain samples are users, and the common features of the user samples in the two domains include: gender, age, hobbies, platform activity level, consumption preferences (such as consumption amount, number of purchases, categories of goods purchased, etc.), frequent locations, etc. Further, in a more specific embodiment, the target domain also includes private features for the user: the user's influence on the social platform (such as number of followers), and the situations in which the user prefers to browse web pages (such as while riding public transport or before sleep).
According to another specific embodiment, the source domain business object is a login event, the target domain business object is an access event, and the common characteristics of the login event sample and the access event sample comprise: frequency of operation, time period of operation, address of operation (e.g., IP address), etc., and the sample of access events in the target domain also includes private characteristics: access depth (e.g., several page jumps occurred, or access to several levels of menus), etc.
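To make the distinction between the common feature part and the private features concrete, the following Python sketch splits the features of a login event and an access event by a list of common feature names. The field names and values are hypothetical and chosen only to mirror the example above.

```python
def split_features(features, common_names):
    """Separate a sample's features into the common part and the private part."""
    common = {name: features[name] for name in common_names}
    private = {name: value for name, value in features.items()
               if name not in common_names}
    return common, private


COMMON = ["operation_frequency", "operation_hour", "ip_address"]

# Source domain sample (login event) and target domain sample (access event).
login_event = {"operation_frequency": 12, "operation_hour": 23,
               "ip_address": "203.0.113.7"}
access_event = {"operation_frequency": 3, "operation_hour": 9,
                "ip_address": "198.51.100.4", "access_depth": 5}

login_common, login_private = split_features(login_event, COMMON)
access_common, access_private = split_features(access_event, COMMON)
```

Only the common parts would be fed to the transfer learning system in the first stage; the private part of the access event stays available for the second stage.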
In the above, the service objects targeted by the source domain sample and the target domain sample are exemplarily described, including a merchant, a user, a login or access event, and the like, and may also be other service objects such as a transaction event, a commodity, and a text content, which are not exhaustive here.
In addition, each sample in the training sample set includes a business label. The source domain business label corresponds to the source domain prediction task, and the target domain business label corresponds to the target domain prediction task. The prediction tasks of the two domains are generally the same or similar, so the source domain business label set is the same as or similar to the target domain business label set. In an implementation scenario involving risk control, in one example the business label sets of both domains include high risk, medium risk and low risk; in another example both include risky and risk-free. In an implementation scenario involving recommendation, in one example the business label sets of both domains include military information, social information, scientific information and the like; in another example they include clothing, furniture, electronic products, food and the like.
In the above, the obtained training sample set is introduced.
Next, in step S202, the common feature part of each sample in the training sample set is input to the transfer learning system as the current feature part. The transfer learning system may include a feature characterizer, a domain discriminator, and a source domain business predictor, as can be seen in fig. 1.
For the current feature part input to the migration learning system, as shown in step S203 in fig. 2, it is necessary to distinguish whether the current feature part belongs to the source domain sample or the target domain sample.
If the current feature part belongs to a source domain sample, then in step S204, the feature characterizer is used to characterize the current feature part to obtain a source domain feature representation. The mathematical form of the feature representation output by the feature characterizer may be a vector, array, or matrix. In one embodiment, the feature characterizer may be implemented using a neural network. In a particular embodiment, the feature characterizer may be implemented as a Deep Neural Network (DNN). In another specific embodiment, the feature characterizer may be implemented as a Convolutional Neural Network (CNN).
Further, the source domain feature representation is input into the domain discriminator to obtain a source domain discrimination result, and the source domain feature representation is input into the source domain business predictor to obtain a source domain business prediction result. It should be understood that the domain discriminator is used to discriminate whether an input sample belongs to the source domain or the target domain, and the obtained source domain discrimination result indicates the probability that the corresponding input sample belongs to the source domain. The domain discriminator can be implemented based on a binary classification algorithm, for example as a Support Vector Machine (SVM), a logistic regression model, or the like. In addition, the source domain business predictor is used to execute the source domain prediction task, that is, to perform business prediction on a source domain business object to obtain the source domain business prediction result.
If the current feature part belongs to a target domain sample, then in step S205, the feature characterizer shared by the two domains is used to characterize the current feature part to obtain a target domain feature representation. Further, the target domain feature representation is input into the domain discriminator shared by the two domains to obtain a target domain discrimination result, which indicates the probability that the corresponding input sample belongs to the target domain.
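The data flow of steps S204 and S205 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiments: the layer sizes, the random initialization, the tanh and sigmoid activations, and all function names are hypothetical, and the single discriminator output is read here as the probability of the target domain (so the source domain probability would be one minus that value).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(weights, bias, x):
    """Compute W.x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def make_layer(n_in, n_out, rng):
    """Hypothetical helper: a randomly initialized dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
char_w, char_b = make_layer(4, 3, rng)  # feature characterizer shared by both domains
disc_w, disc_b = make_layer(3, 1, rng)  # domain discriminator shared by both domains
pred_w, pred_b = make_layer(3, 1, rng)  # source domain business predictor

def characterize(common_part):
    """Map a common feature part to a feature representation (a vector)."""
    return [math.tanh(h) for h in linear(char_w, char_b, common_part)]

def discriminate(representation):
    """Assumed convention: probability that the input is from the target domain."""
    return sigmoid(linear(disc_w, disc_b, representation)[0])

def predict_source(representation):
    """Binary business prediction for a source domain object."""
    return sigmoid(linear(pred_w, pred_b, representation)[0])

def forward(common_part, is_source):
    """Both domains use the characterizer and discriminator; only source uses the predictor."""
    representation = characterize(common_part)
    outputs = {"representation": representation, "domain_prob": discriminate(representation)}
    if is_source:
        outputs["business_pred"] = predict_source(representation)
    return outputs
```

Because the characterizer and the discriminator are shared, the same common feature part yields the same representation regardless of which domain it came from; only the set of downstream heads differs.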
On the other hand, to improve the accuracy of the domain discrimination results, and thereby the training effect of the transfer learning and the prediction performance of the trained feature characterizer, the service prediction result of each domain may also be used as an input of the domain discriminator. Accordingly, inputting the source domain feature representation into the domain discriminator to obtain the source domain discrimination result may include: inputting the source domain service prediction result output by the source domain service predictor, together with the source domain feature representation, into the domain discriminator to obtain the source domain discrimination result. Correspondingly, a target domain service predictor (see fig. 3) is set in the transfer learning system; the target domain feature representation is input into the target domain service predictor to obtain a target domain service prediction result, and the target domain service prediction result and the target domain feature representation are input into the domain discriminator together to obtain the target domain discrimination result.
In the above, the source domain discrimination result, the target domain discrimination result and the source domain service prediction result can be obtained. Based on this, in step S206, the domain discrimination loss is determined based on the source domain discrimination result and the target domain discrimination result obtained as described above. Specifically, a source domain discrimination loss is first determined based on the source domain discrimination result and a source domain identifier (indicating that the corresponding input is from the source domain, e.g., 0), and a target domain discrimination loss is determined based on the target domain discrimination result and a target domain identifier (indicating that the corresponding input is from the target domain, e.g., 1); then, a comprehensive domain discrimination loss is determined based on the source domain discrimination loss and the target domain discrimination loss, the domain discrimination loss being positively correlated with both the source domain discrimination loss and the target domain discrimination loss. In addition, the source domain prediction loss is determined based on the source domain service prediction result and the source domain service label. Further, the transfer learning system is trained in the direction that reduces the overall loss corresponding to the domain discrimination loss and the source domain prediction loss.
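One simple way to feed a prediction result and a feature representation to the domain discriminator "together" is to concatenate them into a single input vector. The text does not state how the two are combined, so concatenation, the function name, and the example values below are assumptions made purely for illustration.

```python
def discriminator_input(feature_representation, prediction_result):
    """Assumed combination: concatenate the representation with the predictor output."""
    return list(feature_representation) + list(prediction_result)

# Hypothetical values: a 3-dimensional source domain feature representation and a
# source domain prediction over the labels (high risk, medium risk, low risk).
source_representation = [0.12, -0.40, 0.88]
source_prediction = [0.7, 0.2, 0.1]
joint_input = discriminator_input(source_representation, source_prediction)
```

With this choice the input dimension of the domain discriminator grows by the size of the label set, and the same construction applies to the target domain using the output of the target domain service predictor.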
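The losses of step S206 can be illustrated with binary cross-entropy. The sketch below makes several assumptions that the text does not fix: cross-entropy as the loss form, the plain sum as the combination of the two domain terms (which is positively correlated with both), a binary source domain label, and a hypothetical `domain_weight` for the overall loss. Each discriminator output is read as the probability of identifier 1 (target domain); a source domain probability q would enter as 1 - q. How the domain term is applied to the feature characterizer during optimization (for example adversarially) is not specified in the text and is not shown.

```python
import math

SOURCE_ID, TARGET_ID = 0, 1  # example domain identifiers from the description

def binary_cross_entropy(prob_of_one, label, eps=1e-12):
    """Cross-entropy between a predicted probability of class 1 and a 0/1 label."""
    p = min(max(prob_of_one, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def domain_discrimination_loss(source_probs, target_probs):
    """Sum of the mean source domain and mean target domain discrimination losses."""
    source_loss = sum(binary_cross_entropy(p, SOURCE_ID) for p in source_probs) / len(source_probs)
    target_loss = sum(binary_cross_entropy(p, TARGET_ID) for p in target_probs) / len(target_probs)
    return source_loss + target_loss

def source_prediction_loss(pred_probs, labels):
    """Mean loss of the source domain predictor against binary source domain labels."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(pred_probs, labels)) / len(labels)

def overall_loss(source_probs, target_probs, pred_probs, labels, domain_weight=1.0):
    """Assumed overall loss: prediction loss plus a weighted domain discrimination loss."""
    return (source_prediction_loss(pred_probs, labels)
            + domain_weight * domain_discrimination_loss(source_probs, target_probs))
```

A discriminator that cannot tell the domains apart outputs 0.5 everywhere, which gives a domain discrimination loss of 2·ln 2 under these assumptions.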
Thus, in the first stage, only the feature parts common to the two domains are used for training, and the private features of each domain are not. This avoids the uninterpretable semantics that result from directly projecting the full feature sets of the two domains, and achieves interpretable feature space alignment. Adding the domain discrimination loss to the training process makes the features extracted by the feature characterizer domain-invariant, which improves the accuracy of subsequent service prediction results. Because the target domain has little data, it often suffers from class imbalance, and using the target domain label information directly easily causes overfitting; since the target domain labels are not used in the first stage, overfitting can be avoided. In addition, the labeling standards of the two domains may differ. For example, for a risk label, the labeling standard of the source domain may mark an operation event as risky if the user performs it on an uncommon device, whereas the labeling standard of the target domain marks an operation event as risky if the user complains about it. With a traditional transfer learning algorithm this can produce negative transfer (a result worse than learning without transfer), so not using the target domain labels in the first stage also avoids negative transfer. Although the target domain label information is not used in the first stage, and the labeling standards of the source domain and the target domain differ to some extent, the feature representations output by the feature characterizer for the source domain and the target domain are completely aligned, so the features extracted from the target domain by the feature characterizer are still strong features, namely important features influencing the service prediction result.
In this way, the first stage is carried out, the training of the transfer learning system is completed, and the trained feature characterizer, domain discriminator, source domain business predictor and the like are obtained. It is to be appreciated that the motivation for training the transfer learning system is to use the rich label information of the source domain to train a feature characterizer that is shared by the source domain and the target domain, and then to use it for the target domain.
In step S207, a plurality of common feature parts corresponding to a plurality of target domain samples in the training sample set are processed by using the feature characterizer in the trained transfer learning system, so as to obtain a plurality of strong feature representations. Specifically, the plurality of common feature parts corresponding to the plurality of target domain samples can be respectively input into the trained feature characterizer to obtain the corresponding plurality of strong feature representations.
After obtaining the plurality of strong feature representations, in step S208, a business prediction model for the target domain business object is trained using the plurality of target domain samples and the corresponding plurality of strong feature representations. Specifically, the service prediction model is trained by using all original features and service labels included in each target domain sample and a strong feature representation obtained based on a common feature part in each target domain sample.
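The second-stage training data described here can be assembled roughly as follows. The sample layout (keys `common`, `private`, `label`), the function name, and the stand-in characterizer passed as a callable are all hypothetical; the sketch only shows which pieces of each target domain sample end up in a training row.

```python
def build_stage_two_rows(target_samples, characterizer):
    """Pair every target domain sample with its strong feature representation."""
    rows = []
    for sample in target_samples:
        # Strong features come from the common feature part only.
        strong = characterizer(sample["common"])
        # All original features: common features plus any target domain private features.
        all_features = list(sample["common"]) + list(sample.get("private", []))
        rows.append({"features": all_features, "strong": strong, "label": sample["label"]})
    return rows
```

Each row keeps the original features and the strong representation separate, so that the way they are fused before entering the service prediction model can be chosen later.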
More specifically, all the features in each target domain sample and the corresponding strong feature representation may be jointly input into the service prediction model to obtain a prediction result, and the service prediction model is then trained based on the prediction result and the target domain service label in the target domain sample. In one embodiment, all the features and the strong feature representation may be concatenated and then input into the service prediction model. In another embodiment, all the features and the strong feature representation may be subjected to another fusion process, such as addition or averaging, and then input into the service prediction model. In one embodiment, the service prediction model described above may be implemented as a neural network model. In another embodiment, it may also be implemented as a decision tree model such as XGBoost.
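The fusion options mentioned above (concatenation, addition, averaging) can be sketched as a small helper. The function name and the `mode` parameter are hypothetical. Note that element-wise addition and averaging only make sense when the two vectors have the same length, which the text does not discuss; the sketch simply rejects mismatched lengths.

```python
def fuse(all_features, strong_representation, mode="concat"):
    """Combine the original features with the strong feature representation."""
    if mode == "concat":
        return list(all_features) + list(strong_representation)
    if len(all_features) != len(strong_representation):
        raise ValueError("element-wise fusion needs vectors of equal length")
    if mode == "add":
        return [a + b for a, b in zip(all_features, strong_representation)]
    if mode == "average":
        return [(a + b) / 2 for a, b in zip(all_features, strong_representation)]
    raise ValueError(f"unknown fusion mode: {mode}")
```

Concatenation preserves both inputs unchanged and works for any pair of lengths, which is presumably why it is listed first; the element-wise variants keep the model input dimension smaller.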
In one embodiment, the target domain prediction loss is determined based on the prediction result and the target domain business label, and the business prediction model is trained towards the direction of reducing the target domain prediction loss.
In the second stage, when the service prediction model is trained, the features extracted in the first stage are strong features, so the service prediction model can first focus on these features during training; after a good classification effect has been obtained with them, the original features are used for fine-tuning to achieve a better classification effect. In this way the target domain labels are used effectively and overfitting is prevented. In addition, if only the strong feature representation were used in the second stage, the information about differences in feature definition contained in the common original features of the target domain and the source domain would be lost. For example, the common feature parts of the target domain and the source domain both include the user activity level, but the statistical criterion in the target domain is that a user is at a high activity level if the user's number of APP uses in a single day exceeds 5, whereas the criterion in the source domain is that a user is at a high activity level if that number exceeds 4. If this difference in feature definition is ignored, the number of effective features captured by the service prediction model is limited. Meanwhile, when the target domain samples also include private features, adding the private features to the training of the service prediction model can effectively improve the prediction performance of the trained model. Therefore, the trained service prediction model performs stably in the target domain, and its prediction results have high accuracy.
Thus, the trained service prediction model for the target domain is obtained in the second stage, and it is combined with the trained feature characterizer obtained in the first stage to form a service prediction system used for the target domain. In the use stage of the service prediction system, for a target domain sample to be processed, the common feature part is first input into the trained feature characterizer to obtain a corresponding strong feature representation; the strong feature representation and all the features in that target domain sample are then jointly input into the trained service prediction model to obtain a corresponding service prediction result, and further service processing is carried out according to the service prediction result. For example, if the service prediction result indicates the risk level of an event, an event identified as high risk is blocked, and an event identified as low risk is released.
In summary, with the training method and apparatus of the business prediction system disclosed in the embodiments of the present specification, in the first stage, only the feature parts common to the two domains are used for training, so as to achieve interpretable feature space alignment, and the rich label data of the source domain is used to train a feature characterizer shared by the source domain and the target domain. In the second stage, in the target domain, the service prediction model is trained by using the strong feature representations extracted by the feature characterizer together with the original features and the sample labels in the target domain samples. In the training process, the service prediction model first focuses on the strong features, and after a good classification effect has been obtained with the strong features, the original features are used for fine-tuning to achieve a better classification effect. In this way the target domain labels are used effectively, overfitting is prevented, and the loss of information about feature definition differences between the source domain and the target domain is avoided, so that the trained service prediction model performs stably on the target domain and its prediction results have high accuracy. Thus, a high-performance business prediction system that can be used in the target domain is obtained.
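The difference in feature definition in the user activity example can be made concrete with a few lines. The thresholds 5 and 4 are taken from the example above; the function and dictionary names are hypothetical.

```python
# Daily APP usage counts above which a user counts as "high activity" in each domain.
HIGH_ACTIVITY_THRESHOLD = {"target": 5, "source": 4}

def is_high_activity(daily_app_uses, domain):
    """Apply the domain-specific statistical criterion for the activity level."""
    return daily_app_uses > HIGH_ACTIVITY_THRESHOLD[domain]
```

A user with exactly 5 uses per day is at a high activity level under the source domain criterion but not under the target domain criterion, so the same nominal feature value carries different meaning in the two domains. Keeping the original features in the second stage lets the service prediction model see this target-specific definition.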
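The use stage described above can be sketched as a short pipeline. The characterizer and the prediction model are passed in as plain callables standing in for the trained components, concatenation is assumed as the way of jointly inputting the features, and the label strings and action names are hypothetical. The text only specifies the handling of high-risk and low-risk events, so other levels return None here.

```python
def handle_event(common_part, all_features, characterizer, prediction_model):
    """Predict the risk level of a target domain event and choose the processing action."""
    strong = characterizer(common_part)                   # strong feature representation
    model_input = list(all_features) + list(strong)       # assumed: joint input by concatenation
    risk_level = prediction_model(model_input)
    if risk_level == "high":
        return "block"
    if risk_level == "low":
        return "release"
    return None  # handling of other risk levels is not specified in the description
```

Both trained components are needed at prediction time, which is why the feature characterizer and the service prediction model together form the service prediction system.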
According to another aspect of the embodiments, the present specification further discloses a training apparatus of a business prediction system, which may be deployed in any device, platform or device cluster having computing and processing capabilities. Fig. 4 is a block diagram of a training apparatus of a business prediction system according to an embodiment. As shown in fig. 4, the apparatus 400 includes:
an obtaining unit 410, configured to obtain a training sample set, which includes a plurality of source domain samples and a plurality of target domain samples, wherein each sample includes feature values corresponding to a plurality of common features, forming a common feature part;
a processing unit 420, configured to input the common feature part in each sample as a current feature part into a transfer learning system, where the transfer learning system includes a feature characterizer, a domain discriminator and a source domain business predictor, wherein: when the current feature part belongs to a source domain sample, the feature characterizer is used to characterize the current feature part to obtain a source domain feature representation, and the source domain feature representation is input into the domain discriminator and the source domain business predictor respectively, correspondingly obtaining a source domain discrimination result and a source domain business prediction result; when the current feature part belongs to a target domain sample, the feature characterizer is used to characterize the current feature part to obtain a target domain feature representation, and the target domain feature representation is input into the domain discriminator to obtain a target domain discrimination result;
a loss determining unit 430, configured to determine a domain discrimination loss according to the source domain discrimination result and the source domain identifier, and the target domain discrimination result and the target domain identifier; and to determine the source domain prediction loss according to the source domain business prediction result and the source domain business label in the source domain sample corresponding to the source domain business prediction result;
a system training unit 440, configured to train the transfer learning system based on the domain discrimination loss and the source domain prediction loss;
a characterization unit 450, configured to process, by using the feature characterizer in the trained transfer learning system, a plurality of common feature parts corresponding to the plurality of target domain samples to obtain a plurality of strong feature representations;
a model training unit 460, configured to train a business prediction model for a target domain business object using the plurality of target domain samples and the corresponding plurality of strong feature representations; the trained business prediction model and the feature characterizer in the trained transfer learning system form a business prediction system.
In one embodiment, each target domain sample further includes a feature value corresponding to a number of target domain private features.
In an embodiment, for the processing unit 420, inputting the source domain feature representation into the domain discriminator and the source domain business predictor respectively, to correspondingly obtain the source domain discrimination result and the source domain business prediction result, includes: inputting the source domain feature representation into the source domain business predictor to obtain the source domain business prediction result; and inputting the source domain business prediction result and the source domain feature representation together into the domain discriminator to obtain the source domain discrimination result. The transfer learning system further comprises a target domain business predictor, and after the feature characterizer is used to characterize the current feature part to obtain the target domain feature representation, the processing further comprises: inputting the target domain feature representation into the target domain business predictor to obtain a target domain business prediction result. Inputting the target domain feature representation into the domain discriminator to obtain the target domain discrimination result then comprises: inputting the target domain business prediction result and the target domain feature representation together into the domain discriminator to obtain the target domain discrimination result.
In one embodiment, the feature characterizer is implemented as a deep neural network DNN.
In one embodiment, the model training unit 460 is specifically configured to: jointly input all the features in each target domain sample and the corresponding strong feature representation into the service prediction model to obtain a prediction result; determine the target domain prediction loss according to the prediction result and the target domain service label in each target domain sample; and train the service prediction model based on the target domain prediction loss.
In a specific embodiment, the service prediction model is implemented as a decision tree model.
In one embodiment, the target domain business object includes at least one of: users, merchants, goods, and events; the business prediction model is used for predicting the classification or regression value of the target domain business object.
In one embodiment, the number of samples of the plurality of source domain samples is greater than the number of samples of the plurality of target domain samples.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functionality described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium.
The above embodiments further describe in detail the objects, technical solutions and advantages of the present invention. It should be understood that the above are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (11)

1. A training method of a business prediction system comprises the following steps:
acquiring a training sample set aiming at a merchant, wherein the training sample set comprises a plurality of source domain samples and a plurality of target domain samples, each sample comprises a characteristic value corresponding to a plurality of common characteristics to form a common characteristic part; the plurality of common features includes at least one of: hot commodity sales, user rating, sales volume, complained times and operation duration; the source domain is a payment platform, and the target domain is an applet platform;
inputting the common characteristic part in each sample as a current characteristic part into a transfer learning system, wherein the transfer learning system comprises a characteristic characterizer, a domain discriminator and a source domain business predictor,
when the current characteristic part belongs to a source domain sample, performing characteristic characterization on the current characteristic part by using the characteristic characterizer to obtain source domain characteristic representation; inputting the source domain feature representation into the domain discriminator and the source domain business predictor respectively to correspondingly obtain a source domain discrimination result and a source domain merchant classification result;
when the current characteristic part belongs to a target domain sample, performing characteristic characterization on the current characteristic part by using the characteristic characterizer to obtain target domain characteristic representation; inputting the target domain feature representation into the domain discriminator to obtain a target domain discrimination result;
determining domain discrimination loss according to the source domain discrimination result and the source domain identifier, and the target domain discrimination result and the target domain identifier; determining source domain prediction loss according to the source domain merchant classification result and a source domain service label in a source domain sample corresponding to the source domain merchant classification result; training the transfer learning system based on the domain discrimination loss and the source domain prediction loss;
processing a plurality of public characteristic parts corresponding to the plurality of target domain samples by using a characteristic characterizer in the transfer learning system after training to obtain a plurality of strong characteristic representations;
training a business prediction model for a target domain merchant by using the target domain samples and the corresponding strong feature representations; and the trained service prediction model and the feature characterizer in the trained transfer learning system form a service prediction system.
2. The method of claim 1, wherein each target domain sample further includes a feature value corresponding to a number of target domain private features.
3. The method of claim 1, wherein inputting the source domain feature representation into the domain discriminator and the source domain business predictor respectively, and correspondingly obtaining a source domain discrimination result and a source domain merchant classification result, comprises:
inputting the source domain feature representation into the source domain business predictor to obtain a source domain merchant classification result; inputting the source domain merchant classification result and the source domain feature representation into a domain discriminator together to obtain a source domain discrimination result;
the transfer learning system further comprises a target domain business predictor, and after the feature characterizer is used to characterize the current feature part to obtain the target domain feature representation, the method further comprises:
inputting the target domain feature representation into the target domain business predictor to obtain a target domain service prediction result;
wherein inputting the target domain feature representation into the domain discriminator to obtain a target domain discrimination result comprises:
inputting the target domain service prediction result and the target domain feature representation together into the domain discriminator to obtain the target domain discrimination result.
4. The method of claim 1, wherein the feature characterizer is implemented as a Deep Neural Network (DNN).
5. The method of claim 1, wherein training a business prediction model for a target domain merchant using the plurality of target domain samples and the corresponding plurality of strong feature representations comprises:
all the characteristics and corresponding strong characteristic representations in each target domain sample are jointly input into the service prediction model to obtain a prediction result;
determining target domain prediction loss according to the prediction result and the target domain service label in each target domain sample;
and training the business prediction model based on the target domain prediction loss.
6. The method of claim 5, wherein the business prediction model is implemented as a decision tree model.
7. The method of any of claims 1-6, wherein the business prediction model is used to predict a classification or regression value for the target domain merchant.
8. The method of any of claims 1-6, wherein a number of samples of the plurality of source domain samples is greater than a number of samples of the plurality of target domain samples.
9. A training apparatus of a business prediction system, comprising:
the acquisition unit is configured to acquire a training sample set for a merchant, wherein the training sample set comprises a plurality of source domain samples and a plurality of target domain samples, each sample comprises a feature value corresponding to a plurality of common features, and a common feature part is formed; the plurality of common features includes at least one of: hot-sold commodities, user scores, sales volume, complained times and operation duration; the source domain is a payment platform, and the target domain is an applet platform;
a processing unit configured to input the common feature part in each sample as a current feature part into a transfer learning system, the transfer learning system comprising a feature characterizer, a domain discriminator and a source domain business predictor, wherein,
when the current characteristic part belongs to a source domain sample, performing characteristic characterization on the current characteristic part by using the characteristic characterizer to obtain source domain characteristic representation; inputting the source domain feature representation into the domain discriminator and the source domain business predictor respectively, and correspondingly obtaining a source domain discrimination result and a source domain merchant classification result;
when the current characteristic part belongs to a target domain sample, performing characteristic characterization on the current characteristic part by using the characteristic characterizer to obtain target domain characteristic representation; inputting the target domain feature representation into the domain discriminator to obtain a target domain discrimination result;
a loss determining unit configured to determine a domain discrimination loss according to the source domain discrimination result and the source domain identifier, and the target domain discrimination result and the target domain identifier; determining source domain prediction loss according to the source domain merchant classification result and a source domain service label in a source domain sample corresponding to the source domain merchant classification result;
a system training unit configured to train the transfer learning system based on the domain discrimination loss and a source domain prediction loss;
the characterization unit is configured to process a plurality of public characteristic parts corresponding to the plurality of target domain samples by using a characteristic characterizer in the transfer learning system after training to obtain a plurality of strong characteristic representations;
the model training unit is configured to train a business prediction model for a target domain merchant by using the target domain samples and the corresponding strong feature representations; and the trained service prediction model and the feature characterizer in the trained transfer learning system form a service prediction system.
10. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-8.
11. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-8.
CN202110322500.7A 2021-03-25 2021-03-25 Training method and device of business prediction system Active CN112949752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110322500.7A CN112949752B (en) 2021-03-25 2021-03-25 Training method and device of business prediction system

Publications (2)

Publication Number Publication Date
CN112949752A CN112949752A (en) 2021-06-11
CN112949752B true CN112949752B (en) 2022-09-06

Family

ID=76226714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110322500.7A Active CN112949752B (en) 2021-03-25 2021-03-25 Training method and device of business prediction system

Country Status (1)

Country Link
CN (1) CN112949752B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214421A (en) * 2018-07-27 2019-01-15 阿里巴巴集团控股有限公司 A kind of model training method, device and computer equipment
CN112015562A (en) * 2020-10-27 2020-12-01 北京淇瑀信息科技有限公司 Resource allocation method and device based on transfer learning and electronic equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952105A (en) * 2017-04-26 2017-07-14 浙江大学 A kind of retail shop based on transfer learning optimizes site selecting method
CN109685644A (en) * 2018-12-17 2019-04-26 深圳市数丰科技有限公司 A kind of customers' credit methods of marking and device based on transfer learning
CN109688597B (en) * 2018-12-18 2020-09-01 北京邮电大学 Fog wireless access network networking method and device based on artificial intelligence
CN110659744B (en) * 2019-09-26 2021-06-04 支付宝(杭州)信息技术有限公司 Training event prediction model, and method and device for evaluating operation event
CN110738314B (en) * 2019-10-17 2023-05-02 中山大学 Click rate prediction method and device based on deep migration network
CN111401454A (en) * 2020-03-19 2020-07-10 创新奇智(重庆)科技有限公司 Few-sample target identification method based on transfer learning
CN111382846B (en) * 2020-05-28 2020-09-01 支付宝(杭州)信息技术有限公司 Method and device for training neural network model based on transfer learning
CN111814977B (en) * 2020-08-28 2020-12-18 支付宝(杭州)信息技术有限公司 Method and device for training event prediction model
CN112418443A (en) * 2020-12-02 2021-02-26 深圳前海微众银行股份有限公司 Data processing method, device and equipment based on transfer learning and storage medium

Also Published As

Publication number Publication date
CN112949752A (en) 2021-06-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant