CN111275071B - Prediction model training method, prediction device and electronic equipment - Google Patents

Publication number
CN111275071B
Authority
CN
China
Prior art keywords
account
data
sample data
target
subordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010010075.3A
Other languages
Chinese (zh)
Other versions
CN111275071A (en)
Inventor
陈知己
赵鹏
金大治
刘芷诺
刘润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010010075.3A
Publication of CN111275071A
Application granted
Publication of CN111275071B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a prediction model training method, a prediction device and electronic equipment. The prediction model training method comprises: acquiring batch sample data generated in a preset time period before and/or after a first time, wherein one piece of sample data carries feature data of a plurality of subordinate objects under one master object, one subordinate object under the master object is the subordinate object to be tested, and the remaining subordinate objects are reference subordinate objects; determining a management index value of each reference subordinate object in the sample data based on the feature data of that reference subordinate object and a preset model; integrating the management index values of at least one reference subordinate object in the sample data, and taking the integration result as newly added feature data of the subordinate object to be tested in the sample data; and training a management index prediction model using, as input, the label, the original feature data and the newly added feature data of the subordinate object to be tested in the sample data.

Description

Prediction model training method, prediction device and electronic equipment
Technical Field
The application relates to the technical field of computers, and in particular to a prediction model training method and device, a prediction method and device, and an electronic device.
Background
In the related art, one master object often has a plurality of subordinate objects. For example, operators in many industries (third-party payment platforms, e-commerce companies, offline retailers, etc.) establish their own membership systems and, in order to grow their user base and improve the customer experience, allow the same person (the master object) to register multiple accounts (subordinate objects) using different registration credentials (e-mail address, mobile phone number, identity document number, etc.).
Allowing one master object to have a plurality of subordinate objects makes it harder for the operator to manage those subordinate objects. For example, when a third-party payment platform allows the same person to register multiple accounts, users may conduct illegal activities through secondary accounts, buy and sell accounts, misappropriate accounts, and so on, which makes account risk management on the platform more difficult.
Therefore, how to accurately characterize the management index of a subordinate object, so that the operator can better manage it, is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the specification provides a prediction model training method, a prediction device and electronic equipment, and aims to solve the problem of accurately describing the management index of a dependent object.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
in a first aspect, a management index prediction model training method is provided, including:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added characteristic data of the dependent object to be detected in the sample data;
and taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input, and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
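As a non-limiting illustration, the four steps of the first aspect can be sketched as follows. This is a minimal sketch under stated assumptions, not the method as claimed: the dictionary keys, the use of `max` as the integration operation, the stand-in preset model, and the small logistic regression used as the management index prediction model are all hypothetical choices made only for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Features = List[float]
Row = Tuple[Features, int]


def integrate(values: Sequence[float]) -> float:
    """Integrate reference management index values (max is an assumed choice)."""
    return max(values) if values else 0.0


def build_row(sample: Dict, preset_model: Callable[[Features], float]) -> Row:
    """Score each reference object, integrate, and append the result as a new feature."""
    ref_values = [preset_model(f) for f in sample["reference_features"]]
    new_feature = integrate(ref_values)
    return sample["target_features"] + [new_feature], sample["label"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: Features) -> float:
    """Predicted management index value; the last weight is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logistic(rows: List[Row], epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Fit a logistic regression by plain batch gradient descent."""
    w = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Hypothetical stand-in for the preset model: it returns the first feature.
preset_model = lambda f: f[0]

# Made-up sample data: one object to be tested plus reference objects per sample.
samples = [
    {"target_features": [0.2], "reference_features": [[0.9], [0.3]], "label": 1},
    {"target_features": [0.2], "reference_features": [[0.1]], "label": 0},
    {"target_features": [0.4], "reference_features": [[0.8], [0.5]], "label": 1},
    {"target_features": [0.4], "reference_features": [[0.2], [0.1]], "label": 0},
]

rows = [build_row(s, preset_model) for s in samples]
w = train_logistic(rows)
```

In this toy data the original feature alone cannot separate the labels, while the newly added feature can, which is the situation the integration step is meant to address.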
In a second aspect, a management index prediction method is provided, including:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the characteristic data training of batch slave objects;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained based on the method of the first aspect.
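As a non-limiting illustration, the prediction flow of the second aspect can be sketched as a single function. The function name, the mapping from subordinate object identifiers to feature lists, and the two model callables are hypothetical; the feature data is assumed to have already been restricted to the preset time period before the second moment, and `max` is again only an assumed integration operation.

```python
from typing import Callable, Dict, List, Sequence

Features = List[float]


def predict_management_index(
    target_id: str,
    subordinates: Dict[str, Features],
    preset_model: Callable[[Features], float],
    prediction_model: Callable[[Features], float],
    integrate: Callable[[Sequence[float]], float] = max,
) -> float:
    """Predict the management index value of one target subordinate object."""
    if target_id not in subordinates:
        raise KeyError(f"unknown target subordinate object: {target_id}")
    # Every other subordinate object under the same master object is a reference.
    reference_features = [f for sid, f in subordinates.items() if sid != target_id]
    if not reference_features:
        raise ValueError("at least one reference subordinate object is required")
    # Score the references with the preset model, then integrate the scores.
    reference_values = [preset_model(f) for f in reference_features]
    new_feature = integrate(reference_values)
    # Original feature data plus the newly added feature go into the prediction model.
    return prediction_model(subordinates[target_id] + [new_feature])


# Made-up feature data and stand-in models, for illustration only.
subordinates = {"A1": [0.9], "A2": [0.2], "A4": [0.1]}
value = predict_management_index(
    "A4",
    subordinates,
    preset_model=lambda f: f[0],
    prediction_model=lambda x: sum(x) / len(x),
)
```

In practice `prediction_model` would be the model trained by the method of the first aspect rather than the averaging stand-in used here.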
In a third aspect, a method for training an account risk prediction model is provided, including:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first moment, one sample data carries characteristic data of a plurality of accounts under one identity, one account under one identity is an account to be detected, the other accounts are reference accounts, and the account to be detected carries a label;
determining a risk evaluation value of a reference account in the sample data based on the characteristic data of the reference account in the sample data and a preset model, wherein the preset model is obtained based on characteristic data training of batch accounts;
integrating the risk assessment value of at least one reference account in the sample data, and taking an integration result as newly-added characteristic data of an account to be detected in the sample data;
and taking the label of the account to be tested, the original characteristic data and the newly added characteristic data in the sample data as input, and training a risk prediction model, wherein the risk prediction model is used for predicting a risk assessment value of the account to be tested in a preset time period after the first time.
In a fourth aspect, a method for predicting account risk is provided, including:
determining a target account to be predicted;
acquiring feature data of a plurality of accounts under a target identity, wherein the plurality of accounts under the target identity comprise the target account and at least one reference account, and the feature data of the accounts under the target identity are generated in a preset time period before a second moment;
determining a risk assessment value of the reference account under the target identity based on the feature data of the reference account under the target identity and a preset model, wherein the preset model is obtained based on feature data training of batch accounts;
integrating the risk assessment value of at least one reference account under the target identity, and taking an integration result as newly-added feature data of the target account;
inputting the original characteristic data and the newly added characteristic data of the target account into a risk prediction model, and predicting a risk assessment value of the target account in a preset time period after the second moment, wherein the risk prediction model is obtained by training based on the method of the third aspect.
In a fifth aspect, a management index prediction model training device is provided, including:
a first data acquisition module, configured to acquire batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
the first index determining module is used for determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
the first index integration module is used for integrating the management index value of at least one reference dependent object in the sample data and taking the integration result as the newly added feature data of the dependent object to be detected in the sample data;
and the prediction model training module is used for taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
In a sixth aspect, a management index prediction apparatus is provided, including:
the target determining module is used for determining a target dependent object to be predicted;
the second data acquisition module is used for acquiring the characteristic data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the characteristic data of the subordinate objects under the target main object is generated in a preset time period before a second moment;
the second index determining module is used for determining a management index value of a reference slave object under the target master object based on the feature data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the feature data training of batch slave objects;
the second index integration module is used for integrating the management index values of at least one reference subordinate object under the target master object, and taking the integration result as the newly added feature data of the target subordinate object;
and the management index prediction module is used for inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model and predicting the management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained by training based on the method of the first aspect.
A seventh aspect provides an account risk prediction model training device, including:
a first account data acquisition module, configured to acquire batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of accounts under one identity, one account under one identity is an account to be tested, the other accounts are reference accounts, and the account to be tested carries a label;
the first risk value determination module is used for determining a risk assessment value of a reference account in the sample data based on the characteristic data of the reference account in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch accounts;
the first risk value integration module is used for integrating the risk evaluation value of at least one reference account in the sample data and taking the integration result as the newly added feature data of the account to be detected in the sample data;
and the risk prediction model training module is used for taking the label of the account to be tested, the original characteristic data and the newly added characteristic data in the sample data as input and training a risk prediction model, wherein the risk prediction model is used for predicting the risk assessment value of the account to be tested in a preset time period after the first time.
In an eighth aspect, an account risk prediction apparatus is provided, including:
the target account determining module is used for determining a target account to be predicted;
the second account data acquisition module is used for acquiring the characteristic data of a plurality of accounts under the target identity, wherein the plurality of accounts under the target identity comprise the target account and at least one reference account, and the characteristic data of the accounts under the target identity are generated in a preset time period before the second moment;
the second risk value determination module is used for determining a risk assessment value of the reference account under the target identity based on the feature data of the reference account under the target identity and a preset model, wherein the preset model is obtained based on the feature data of batch accounts through training;
the second risk value integration module is used for integrating the risk evaluation value of at least one reference account under the target identity and taking the integration result as the newly added feature data of the target account;
and a risk prediction module, configured to input the original feature data and the newly added feature data of the target account into a risk prediction model, and predict a risk assessment value of the target account in a preset time period after the second time, where the risk prediction model is obtained by training based on the method of the third aspect.
In a ninth aspect, an electronic device is provided, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
determining a management index value of a reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added characteristic data of the dependent object to be detected in the sample data;
and taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input, and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
In a tenth aspect, a computer-readable storage medium is presented, storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added characteristic data of the dependent object to be detected in the sample data;
and taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input, and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
In an eleventh aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the characteristic data training of batch slave objects;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained based on the method of the first aspect.
In a twelfth aspect, a computer-readable storage medium is provided, storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the characteristic data training of batch slave objects;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained based on the method of the first aspect.
As can be seen from the above, the technical solutions provided in the embodiments of this specification have at least one of the following technical effects. On the one hand, the management index of the subordinate object to be tested is characterized at the level of the subordinate object rather than at the level of the master object. On the other hand, when the management index of the subordinate object to be tested is characterized, not only is its original feature data considered, but the integration result of the management index values of at least one reference subordinate object under the same master object is also used as its newly added feature data. The management index of the subordinate object to be tested can therefore be characterized accurately, and the operator can manage the subordinate object better on that basis.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a schematic flowchart of a management index prediction model training method provided in an embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a management index prediction method according to an embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of an account risk prediction model training method provided in an embodiment of the present specification.
Fig. 4 is a schematic diagram illustrating a method for training an account risk prediction model according to an embodiment of the present disclosure.
Fig. 5 is a schematic flowchart of an account risk prediction method provided in an embodiment of the present specification.
Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Fig. 7 is a schematic structural diagram of another electronic device provided in an embodiment of this specification.
Fig. 8 is a schematic structural diagram of a management index prediction model training device according to an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a management index prediction apparatus according to an embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of an account risk prediction model training device according to an embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of an account risk prediction apparatus according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to accurately describe the management indexes of the dependent objects, the embodiment of the specification provides a management index prediction model training method and device, and a management index prediction method and device. On the basis, an account risk prediction model training method and device, and an account risk prediction method and device are provided in an exemplary combination with an actual application scenario.
The method and the apparatus provided by the embodiments of the present disclosure may be executed by an electronic device, such as a terminal device or a server device. In other words, the method may be performed by software or hardware installed in the terminal device or the server device. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The terminal devices include but are not limited to: any one of smart terminal devices such as a smart phone, a Personal Computer (PC), a notebook computer, a tablet computer, an electronic reader, a web tv, and a wearable device.
It should be noted that, in the embodiments of this specification, the management index prediction model training method may be regarded as the training process of the management index prediction model, and the management index prediction method may be regarded as the application process of that model. Similarly, the account risk prediction model training method may be regarded as the training process of the account risk prediction model, and the account risk prediction method may be regarded as the application process of that model. Each is described separately below.
First, a management index prediction model training method provided in the embodiments of the present specification is explained.
Fig. 1 is a schematic flowchart of an implementation of a management index prediction model training method according to an embodiment of the present disclosure. As shown in Fig. 1, the method may include the following steps.
Step 102: acquire batch sample data.
The batch sample data is generated within a preset time period before and/or after the first time, for example within the 30 days before and/or after the first time. The preset time period is a period of fixed length, such as 30 days, 60 days, or 120 days.
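As a non-limiting illustration, the preset time period can be represented as a closed interval around the first time. The function names and the `before`/`after` flags, which model the "before and/or after" wording, are hypothetical and chosen only for this sketch.

```python
from datetime import datetime, timedelta
from typing import Tuple

Window = Tuple[datetime, datetime]


def sample_window(first_time: datetime, days: int = 30,
                  before: bool = True, after: bool = True) -> Window:
    """Return (start, end) of the fixed-length period around first_time."""
    if not (before or after):
        raise ValueError("the period must lie before and/or after the first time")
    span = timedelta(days=days)
    start = first_time - span if before else first_time
    end = first_time + span if after else first_time
    return start, end


def in_window(ts: datetime, window: Window) -> bool:
    """True if timestamp ts falls inside the closed interval [start, end]."""
    start, end = window
    return start <= ts <= end


# Example: the 30 days before a made-up first time.
first_time = datetime(2020, 1, 1)
window = sample_window(first_time, days=30, before=True, after=False)
```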
In the batch sample data, one piece of sample data carries the feature data of a plurality of subordinate objects under one master object; one subordinate object under the master object is the subordinate object to be tested, the remaining subordinate objects are reference subordinate objects, and the subordinate object to be tested carries a label. A master object can be understood as the aggregating entity of its subordinate objects: subordinate objects that share the same aggregating entity belong to the same master object.
There are many application scenarios in which multiple subordinate objects exist under one master object; some examples follow. Example one: a user with a single identity registers multiple accounts (e.g., personal accounts, public accounts, etc.) on a platform (e.g., a third-party payment platform, an e-commerce platform, a financing platform, etc.). Example two: multiple merchants contract with the same facilitator. Example three: a user with a single identity opens multiple accounts at a bank. Example four: multiple companies are registered by the same legal person.
Take accounts A1, A2, A3, and A4, registered on a third-party payment platform by a user with identity A, as an example: account A4 may be the subordinate object to be tested, and accounts A1, A2, and A3 may be three reference subordinate objects.
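As a non-limiting illustration, one piece of sample data for the four-account example above could be held in a structure like the following. The class name, field names, feature values, and label value are hypothetical and serve only to show how the object to be tested is distinguished from the reference objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    master_id: str                    # the identity (master object)
    target_id: str                    # the subordinate object to be tested
    features: Dict[str, List[float]]  # feature data per subordinate object
    label: int                        # label of the object to be tested (0 or 1)

    def reference_ids(self) -> List[str]:
        """All subordinate objects under the master except the one to be tested."""
        return [sid for sid in self.features if sid != self.target_id]


# Made-up feature vectors: [maximum single transaction amount, transaction count].
sample = Sample(
    master_id="A",
    target_id="A4",
    features={
        "A1": [1200.0, 35.0],
        "A2": [80.0, 4.0],
        "A3": [560.0, 12.0],
        "A4": [300.0, 9.0],
    },
    label=0,
)
```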
The characteristic data of the dependent object may include, but is not limited to, basic properties, behavior, and the like of the dependent object. Still taking the example that the plurality of subordinate objects under the master object are a plurality of accounts registered in the third-party payment platform with the same identity, the basic attributes of the account may include, but are not limited to, the age, sex, and the like of the user who registers the account, and the performance of the account may include, but is not limited to, the single maximum transaction amount, the transaction frequency, the transaction location, the transaction time, the total transaction amount within a preset time period, and the like.
The label of the dependent object to be tested under a master object may be determined based on whether that object exhibits a preset performance within a preset time period before and/or after the first time; the label value is 0 or 1. If the preset performance occurs within the preset time period before and/or after the first time, the dependent object to be tested is a black sample (black seed) and its label value may be 1; if the preset performance does not occur within that period, the dependent object to be tested is a white sample (white seed) and its label value may be 0.
The preset performance may be a preset bad performance. Still taking the example in which the subordinate objects under the master object are accounts registered on a third-party payment platform under the same identity, the preset performance may be bad behavior such as fraud, cashing out, or money laundering. In practice, whether the preset performance has occurred for the account to be tested may be determined by whether events such as complaints or reports against that account have been received.
In the training process of the management index prediction model, the first time can be regarded as the prediction time. Accordingly, the training process can be viewed as training, at the first time and based on the characteristic data generated before the first time, a prediction model capable of predicting the management index of the dependent object to be tested in a preset time period after the first time.
Step 104, determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model.
The preset model is obtained by training based on the characteristic data of the batch of the subordinate objects.
Optionally, as an example, before step 104, the method shown in fig. 1 may further include: training the preset model based on the characteristic data and the labels of the reference dependent objects in the batch of sample data. Of course, the preset model may also be trained based on the feature data and labels of other dependent objects, and this specification does not specifically limit this.
For example, when the master object is a user identity and the slave objects are accounts registered under that identity on the third-party payment platform, the preset model can be obtained by training on the feature data of 100,000 accounts under 25,000 identities.
In a specific implementation, the preset model may be trained on the feature data and labels of the reference dependent objects in the batch of sample data using one of a logistic regression algorithm, a decision tree algorithm, a Gradient Boosting Decision Tree (GBDT) algorithm, an eXtreme Gradient Boosting (xgboost) algorithm, and the like.
The training process of the preset model can be regarded as a process in which the algorithm automatically fits rules to the features. Taking a decision tree as an example, features are used one after another to split the data: first age, dividing the data into two categories, > 70 years old (high probability of label 1) and < 70 years old (low probability of label 1), and then gender, dividing each category into male (high probability) and female (low probability). The data is thus divided into 4 groups, with, say, a probability of 90% for men > 70 years old and a probability of 20% for women < 70 years old. The rules generated in this way constitute a specific preset model.
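The two-level decision tree described above can be written out as an explicit rule. This is a hand-written sketch rather than a trained model: only the 90% and 20% leaf values come from the text, the other two leaf values are placeholders chosen for illustration, and treating an age of exactly 70 as belonging to the lower group is an assumption, since the text does not specify that boundary.

```python
def preset_rule(age: int, gender: str) -> float:
    """Return the probability of label 1 for the leaf this input falls into."""
    if age > 70:  # first split: age
        if gender == "male":  # second split: gender
            return 0.90  # stated in the text
        return 0.60  # placeholder, not from the source
    if gender == "male":
        return 0.40  # placeholder, not from the source
    return 0.20  # stated in the text


older_man = preset_rule(75, "male")
younger_woman = preset_rule(30, "female")
```

A trained decision tree produces rules of the same shape, but with split features, thresholds and leaf probabilities learned from the labeled feature data.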
After the preset model is trained, the feature data of the reference dependent object in the sample data can be imported into the preset model, and the preset model outputs a value that is usually in the range [0, 1]. The value output by the preset model is the management index value of the reference dependent object in the sample data. Taking as an example the case in which the subordinate objects under the master object are accounts registered on a third-party payment platform under the same identity, the management index value of a reference subordinate object may be a risk assessment value of the reference account. When the risk assessment value of a reference account is close to 1, the reference account is likely to be a risk account and needs risk management; when the risk assessment value of a reference account is close to 0, the reference account is unlikely to be a risk account.
Optionally, before training the preset model, the label of the reference dependent object in the sample data may be determined based on the first time. Specifically, the label may be determined based on whether the reference dependent object in the sample data exhibits a preset performance within a preset time period before and/or after the first time. As mentioned above, the first time may be regarded as the prediction time.
More specifically, when the sample data is generated in a preset time period before the first time, the label of the reference dependent object in the sample data is determined based on whether the reference dependent object exhibits the preset performance in the preset time period before the first time; when the sample data is generated in a preset time period after the first time, the label is determined based on whether the reference dependent object exhibits the preset performance in the preset time period after the first time; and when the sample data is generated in the preset time periods before and after the first time, the label is determined based on whether the reference dependent object exhibits the preset performance in the preset time periods before and after the first time.
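The three labeling cases above can be sketched as a single function. The function name, its parameters, the 30-day window, the dates and the inclusive window boundaries are all assumptions made for illustration; the specification only requires that the label reflect whether a preset performance falls within the chosen window around the first time.

```python
from datetime import datetime, timedelta
from typing import Iterable


def window_label(bad_event_times: Iterable[datetime], first_time: datetime,
                 window: timedelta, use_before: bool = True,
                 use_after: bool = True) -> int:
    """Return 1 if any bad event falls in the chosen window, else 0."""
    start = first_time - window if use_before else first_time
    end = first_time + window if use_after else first_time
    return int(any(start <= t <= end for t in bad_event_times))


first_time = datetime(2020, 1, 1)  # hypothetical prediction time
window = timedelta(days=30)        # hypothetical preset time period

# A bad event 75 days before the first time lies outside the window.
old_event = [first_time - timedelta(days=75)]
# A bad event 10 days after the first time lies inside the window.
recent_event = [first_time + timedelta(days=10)]

label_old = window_label(old_event, first_time, window)
label_recent = window_label(recent_event, first_time, window)
```

Setting `use_before` or `use_after` to `False` restricts the window to one side of the first time, matching the "before", "after", and "before and after" cases.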
As described above, the preset performance may be a preset bad performance. Taking the example in which the subordinate objects under the master object are accounts registered on a third-party payment platform under the same identity, the preset performance may be bad behavior such as fraud, cashing out, or money laundering. In practice, whether the preset performance has occurred for an account may be determined by whether events such as complaints or reports against that account have been received.
In this optional embodiment, the label of the reference dependent object in the sample data is determined based on the first time so that the label of the reference dependent object and the label of the dependent object to be tested share the same time window. This ensures that the management index prediction model is trained on feature data adjacent to the prediction time (the first time), which improves its prediction accuracy.
For example, again taking the case in which the subordinate objects under the master object are accounts registered on a third-party payment platform under the same identity, the label of reference account A1 may be determined based on whether A1 performs badly within 30 days before and after the first time. A1 may have performed badly 90 to 60 days before the first time but normally in the most recent 30 days. If A1 were judged to be a black seed based on its performance 90 to 60 days before the first time, the normal performance of A1 in the 30 days before the first time would be mapped to black, which reduces prediction accuracy. Therefore, the labels of the reference dependent objects in the sample data need to be defined with the same time window (a preset time period before and/or after the first time) to improve the prediction accuracy of the trained management index prediction model.
Table 1 lists some examples of feature data and labels for training the management index prediction model, taking as an example that multiple subordinate objects under the master object are multiple accounts registered in the third party payment platform with the same identity. Referring to table 1, an account corresponds to a set of features and a label.
TABLE 1
[Table 1 is provided as an image in the original publication: Figure BDA0002356822810000151]
Step 106, integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added feature data of the dependent object to be tested in the sample data.
Specifically, the management index value of at least one reference dependent object in the sample data may be integrated based on a preset rule to obtain an integration result.
As an example, different weights may be set for different types of reference dependent objects in advance, and then the management index value of at least one reference dependent object in the sample data is weighted to obtain an integration result. Optionally, the management index values of all the reference dependent objects in the sample data are weighted to obtain an integration result, so that the management indexes of all the reference objects in the sample data are brought into the characteristic range of the dependent object to be tested, and the prediction accuracy of the trained management index prediction model is improved.
As another example, a maximum value of the management index values of at least one reference dependent object in the sample data may be used as the integration result.
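The two integration rules above can be sketched as follows. The text only says the values are "weighted"; normalizing by the total weight (a weighted average) is an assumption made here so that the result stays in [0, 1]. The object types, weights and index values are hypothetical.

```python
from typing import Dict, List, Tuple

# Each reference object is given as (object type, management index value).
RefValue = Tuple[str, float]


def integrate_weighted(refs: List[RefValue], weights: Dict[str, float]) -> float:
    """Weighted average of index values, with one weight per object type."""
    total_weight = sum(weights[kind] for kind, _ in refs)
    if total_weight == 0:
        raise ValueError("weights of the reference objects sum to zero")
    return sum(weights[kind] * value for kind, value in refs) / total_weight


def integrate_max(refs: List[RefValue]) -> float:
    """Maximum index value among the reference objects."""
    return max(value for _, value in refs)


# Hypothetical values for three reference accounts of two types.
refs = [("personal", 0.2), ("public", 0.8), ("personal", 0.5)]
weights = {"personal": 1.0, "public": 2.0}

added_feature_weighted = integrate_weighted(refs, weights)
added_feature_max = integrate_max(refs)
```

The maximum rule makes the added feature sensitive to a single high-risk sibling, while the weighted rule reflects all reference objects at once; which suits a given platform is a design choice the text leaves open.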
Step 108, taking the label of the dependent object to be tested, the original characteristic data and the newly added characteristic data in the sample data as input, and training a management index prediction model.
The management index prediction model is used for predicting a management index value of the to-be-tested dependent object in a preset time period after the first time. The range of the management index value may be between 0 and 1, and represents a probability that the to-be-tested dependent object has poor performance in a preset time period after the first time, or represents a probability that the to-be-tested dependent object becomes a poor dependent object in a preset time period after the first time.
In the method for training the management index prediction model provided in the embodiments of the present specification, on the one hand, the management index of the dependent object to be tested is characterized at the level of the dependent object rather than at the level of the master object; on the other hand, when the management index of the dependent object to be tested is characterized, not only is the original feature data of the dependent object to be tested considered, but the integrated result of the management index values of at least one reference dependent object under the same master object is also used as newly added feature data (which can be understood as an enhanced feature) of the dependent object to be tested. The management index of the dependent object to be tested can therefore be characterized more accurately, and an operator can manage the dependent object better accordingly.
Optionally, on the basis of the embodiment shown in fig. 1, as shown in fig. 2, the present specification further provides a management index prediction method, where the method includes (or the method shown in fig. 1 may further include):
Step 202, determining a target dependent object to be predicted.
For example, the target subordinate object may be account B4 with identity B registered on the third party payment platform.
Step 204, obtaining feature data of a plurality of subordinate objects under a target master object, wherein the plurality of subordinate objects under the target master object comprise the target subordinate object and at least one reference subordinate object.
And the characteristic data of the subordinate object under the target master object is generated in a preset time period before the second moment. The second time is later than the first time in the previous embodiment. Optionally, the preset time period in this step is the same as the preset time period in the previous embodiment, that is, the preset time period during the prediction may be consistent with the preset time period during the training, so as to ensure the accuracy of the prediction.
Following the example in the previous step, the target master object is identity B, and account B1, account B2 and account B3 registered by identity B on the third party payment platform may be three reference slave objects.
Step 206, determining a management index value of the reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and the preset model.
Consistent with the embodiment shown in fig. 1, the preset model is obtained by training based on the feature data of a batch of dependent objects; for details, refer to the description of the embodiment shown in fig. 1, which is not repeated here.
It can be understood that, if the preset model is obtained by training based on one of the logistic regression algorithm, the decision tree algorithm, the GBDT, the xgboost and the like, after the feature data of the reference slave object under the target master object is imported into the preset model, the output of the preset model is the management index value of the reference slave object under the target master object.
Step 208, integrating the management index value of at least one reference slave object under the target master object, and taking the integrated result as the newly added feature data of the target slave object.
As with the embodiment shown in fig. 1, the management index values of at least one reference dependent object under the target master object may be integrated based on the preset rule to obtain an integrated result.
As an example, different weights may be set for different types of reference dependent objects in advance, and then the management index value of at least one reference dependent object under the target master object is weighted to obtain the integration result. Optionally, the management index values of all reference dependent objects under the target master object are weighted to obtain an integration result, so as to further improve the prediction accuracy for the target dependent object.
As another example, the maximum value among the management index values of the at least one reference dependent object under the target master object may be taken as the integration result.
Step 210, inputting the original feature data and the newly added feature data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment.
The management index prediction model is obtained by training based on the method shown in fig. 1.
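Steps 206 to 210 can be sketched as one function. Both models are passed in as plain callables; the two toy scoring functions below are stand-ins invented for this sketch, not the trained models of the specification, and the feature names `tx_per_day` and `ref_index` are likewise hypothetical.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def predict_target(target_features: Features,
                   reference_features: List[Features],
                   preset_model: Callable[[Features], float],
                   index_model: Callable[[Features], float],
                   integrate: Callable[[List[float]], float] = max) -> float:
    """Predict the management index value of the target dependent object."""
    # Step 206: score every reference object with the preset model.
    ref_values = [preset_model(f) for f in reference_features]
    # Step 208: integrate the scores into one newly added feature.
    enriched = dict(target_features)
    enriched["ref_index"] = integrate(ref_values)
    # Step 210: feed original plus added features to the prediction model.
    return index_model(enriched)


# Toy stand-ins for the two trained models (assumptions for illustration).
def toy_preset_model(f: Features) -> float:
    return min(1.0, f["tx_per_day"] / 10.0)


def toy_index_model(f: Features) -> float:
    return 0.5 * f["ref_index"] + 0.5 * min(1.0, f["tx_per_day"] / 10.0)


b4 = {"tx_per_day": 2.0}
b1_to_b3 = [{"tx_per_day": 1.0}, {"tx_per_day": 4.0}, {"tx_per_day": 8.0}]
predicted = predict_target(b4, b1_to_b3, toy_preset_model, toy_index_model)
```

The default `integrate=max` corresponds to the maximum-value rule; a weighted rule can be supplied instead as any function from a list of values to a single value.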
Optionally, the method may further determine whether a preset performance of the target dependent object occurs within a preset period after the second time based on the predicted management index value. Wherein the predetermined performance is a predetermined poor performance. In general, the range of the management index value may be between 0 and 1, which represents the probability that the target dependent object has poor performance in a preset period after the second time (or represents the probability that the target dependent object becomes a poor dependent object in a preset period after the second time), and the closer the value is to 1, the higher the probability that the poor performance occurs is, and the lower the probability is otherwise.
In the management index prediction method provided in the embodiments of the present specification, on the one hand, the management index of the target dependent object is characterized at the level of the dependent object rather than at the level of the target master object; on the other hand, when the management index of the target dependent object is characterized, not only is the original feature data of the target dependent object considered, but the integrated result of the management index values of at least one reference dependent object under the target master object is also used as newly added feature data (which can be understood as an enhanced feature) of the target dependent object. The management index of the target dependent object can therefore be characterized more accurately, and the operator can manage the target dependent object better accordingly.
An account risk prediction model training method and an account risk prediction method proposed in combination with an actual application scenario are respectively explained below. It should be noted that the two methods can be applied to any platform that establishes a membership hierarchy.
As shown in fig. 3, an account risk prediction model training method provided by an embodiment of the present disclosure may include the following steps:
Step 302, acquiring batch sample data.
The batch of sample data is generated in a preset time period before and/or after the first time. In the batch, each sample carries characteristic data of a plurality of accounts under one identity; one of the accounts under that identity is the account to be tested, the other accounts are reference accounts, and the account to be tested carries a label.
As shown in fig. 4, one sample may carry feature data of account A1, account A2, account A3 and account A4 under identity A, where account A4 is the account to be tested and account A1, account A2 and account A3 are three reference accounts.
In practical applications, the accounts under the same identity may be determined based on authentication information provided when the user registers the account (e.g., personal identification number, mobile phone number, bank card number, etc. provided during real-name authentication).
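Grouping accounts by identity can be sketched as follows. Keying on a single identification number is an assumption made for brevity; a real platform might reconcile several pieces of authentication information. The account names and identifier strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_identity(
    registrations: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each identity key to the accounts registered under it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for account_id, id_number in registrations:
        groups[id_number].append(account_id)
    return dict(groups)


# (account, identification number supplied during real-name authentication)
registrations = [("A1", "ID-A"), ("B1", "ID-B"), ("A2", "ID-A"), ("A3", "ID-A")]
accounts_by_identity = group_by_identity(registrations)
```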
Step 304, determining a risk assessment value of the reference account in the sample data based on the characteristic data of the reference account in the sample data and a preset model.
The preset model is obtained by training based on the characteristic data of a batch of accounts. After the preset model is trained, the feature data of the reference account in the sample data can be imported into the preset model, and the preset model outputs a value that is usually in the range [0, 1]. The value output by the preset model is the risk assessment value of the reference account in the sample data.
Optionally, before step 304, the method shown in fig. 3 may further include: training a preset model based on the feature data and the labels of the reference accounts in the batch sample data in the step 302.
Optionally, before training the preset model, the method shown in fig. 3 may further include: determining a label of a reference account in the sample data based on the first time. Specifically, the label of the reference account in the sample data may be determined based on whether the reference account exhibits risk performance within a preset time period before and/or after the first time.
More specifically, when the sample data is generated in a preset time period before the first time, the label of the reference account in the sample data is determined based on whether the reference account exhibits the preset performance in the preset time period before the first time; when the sample data is generated in a preset time period after the first time, the label is determined based on whether the reference account exhibits the preset performance in the preset time period after the first time; and when the sample data is generated in the preset time periods before and after the first time, the label is determined based on whether the reference account exhibits the preset performance in the preset time periods before and after the first time.
The preset performance may be a preset bad performance, such as fraud, cashing out, or money laundering. In practice, whether the preset performance has occurred for an account may be determined by whether events such as complaints or reports against that account have been received.
In this optional embodiment, the label of the reference account in the sample data is determined based on the first time so that the label of the reference account and the label of the account to be tested share the same time window. This ensures that the risk prediction model is trained on feature data adjacent to the prediction time (the first time), which improves its prediction accuracy.
Step 306, integrating the risk assessment value of at least one reference account in the sample data, and taking the integrated result as the newly added feature data of the account to be tested in the sample data.
Specifically, the risk assessment value of at least one reference account in the sample data may be integrated based on a preset rule, so as to obtain an integration result.
As an example, different weights may be set for different types of reference accounts in advance, and then the risk assessment value of at least one reference account in the sample data is subjected to weighted calculation to obtain an integration result. Optionally, performing weighted calculation on the risk assessment values of all reference accounts in the sample data to obtain an integration result.
As another example, a maximum value of the risk assessment values of at least one reference account in the sample data may be used as the integration result.
Step 308, taking the label of the account to be tested, the original characteristic data and the newly added characteristic data in the sample data as input, and training a risk prediction model.
The risk prediction model is used for predicting a risk assessment value of the account to be tested in a preset time period after the first time. The risk assessment value may range between 0 and 1, and represents the probability that the account to be tested performs badly in a preset time period after the first time, or the probability that the account to be tested becomes a bad account in a preset time period after the first time.
As shown in fig. 4, assume that identity A includes account A1, account A2, account A3 and account A4, where account A4 is the account to be tested and account A1, account A2 and account A3 are three reference accounts. The feature data of account A1, account A2 and account A3 may first be imported into the preset model to obtain risk assessment value 1, risk assessment value 2 and risk assessment value 3; risk assessment value 1, risk assessment value 2 and risk assessment value 3 are then integrated to obtain an integration result, which is used as the newly added feature data of account A4; finally, the label of account A4, its original characteristic data and its newly added characteristic data are taken as input to train the risk prediction model.
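The final training step can be sketched with a very small logistic regression written from scratch. Using logistic regression for the risk prediction model is an assumption made here; the text names it only as one option for the preset model. Each training row holds one original feature of the account to be tested followed by the integrated risk value of its reference accounts, and all numbers are hypothetical.

```python
import math
from typing import List, Sequence


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Sigmoid of the linear score; the last entry of w is the bias."""
    z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit weights (bias last) by plain batch gradient descent."""
    n_features = len(rows[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            error = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += error * xi
            grad[-1] += error
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Columns: [original feature of the tested account, integrated reference risk].
rows = [[0.2, 0.1], [0.3, 0.9], [0.1, 0.2], [0.4, 0.8]]
labels = [0, 1, 0, 1]  # labels of the accounts to be tested

w = train_logistic(rows, labels)
risk_with_risky_siblings = predict(w, [0.2, 0.9])
risk_with_safe_siblings = predict(w, [0.2, 0.1])
```

In this toy data the added feature separates the two classes, so the fitted model assigns a higher risk to an account whose sibling accounts score high, which is the effect the enhanced feature is meant to capture.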
In the method for training the account risk prediction model provided by the embodiment of the specification, on one hand, the risk assessment value of the account to be tested is described in the dimension of the account, but not in the dimension of the identity; on the other hand, when the risk assessment value of the account to be tested is described, not only the original feature data of the account to be tested is considered, but also the integration result of the risk assessment value of at least one reference account in the same identity is used as the newly added feature data (which can be understood as an enhanced feature) of the account to be tested, so that the risk assessment value of the account to be tested can be described more accurately, and an operator can manage the account better accordingly.
Optionally, on the basis of the embodiment shown in fig. 3, as shown in fig. 5, the present specification further provides an account risk prediction method, which includes (that is, the method shown in fig. 3 may further include):
Step 502, determining a target account to be predicted.
For example, the target account may be account B4 with identity B registered on the third party payment platform.
Step 504, obtaining feature data of a plurality of accounts under the target identity, wherein the plurality of accounts under the target identity comprise the target account and at least one reference account.
The characteristic data of the account under the target identity is generated in a preset time period before a second time, wherein the second time is later than the first time. Optionally, the preset time period in this step is the same as the preset time period in the previous embodiment, that is, the preset time period during the prediction may be consistent with the preset time period during the training, so as to ensure the accuracy of the prediction.
Following the example in the previous step, the target identity is identity B, and account B1, account B2 and account B3 registered by identity B on the third-party payment platform are three reference accounts.
Step 506, determining a risk assessment value of the reference account under the target identity based on the feature data of the reference account under the target identity and a preset model.
Consistent with the embodiment shown in fig. 3, the preset model is trained based on the feature data of the batch accounts.
It can be understood that if the preset model is obtained by training based on one of the logistic regression algorithm, the decision tree algorithm, the GBDT, the xgboost and the like, the output of the preset model is the risk assessment value of the reference account in the target identity after the feature data of the reference account in the target identity is imported into the preset model.
Step 508, integrating the risk assessment values of at least one reference account under the target identity, and taking the integrated result as the newly added feature data of the target account.
Like the embodiment shown in fig. 3, the risk assessment values of at least one reference account under the target identity may be integrated based on a preset rule, so as to obtain an integrated result.
As an example, different weights may be set for different accounts in advance, and then the risk assessment value of at least one reference account under the target identity is weighted to obtain the integration result. Optionally, the risk assessment values of all reference accounts under the target identity are weighted to obtain an integration result, so as to further improve the prediction accuracy for the target account.
As another example, the maximum value of the risk assessment values of the at least one reference account under the target identity may be used as the integration result.
Step 510, inputting the original characteristic data and the newly added characteristic data of the target account into a risk prediction model, and predicting a risk assessment value of the target account in a preset time period after the second time.
Wherein, the risk prediction model is trained based on the method shown in fig. 3.
Optionally, the method may further determine whether the target account will exhibit a preset performance within a preset time period after the second time based on the predicted risk assessment value. Wherein the predetermined performance is a predetermined poor performance. In general, the value of the risk assessment value may range from 0 to 1, and represents the probability that the target account will perform badly within a preset time period after the second time, or represents the probability that the target account will become a bad account within a preset time period after the second time.
In the method for predicting the risk of the account provided by the embodiment of the specification, on one hand, the risk assessment value of the target account is described in the dimension of the account, but not in the dimension of the identity; on the other hand, when the risk assessment value of the target account is described, not only the original feature data of the target account is considered, but also the integrated result of the risk assessment values of at least one reference account in the same identity is used as the newly added feature data (which can be understood as an enhanced feature) of the target account, so that the risk assessment value of the target account can be described more accurately, and an operator can manage the target account better accordingly.
The above is a description of embodiments of the method provided in this specification, and the electronic device provided in this specification is described below.
Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification. Referring to fig. 6, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory is used for storing programs. Specifically, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile storage into the internal memory and then runs it, forming the prediction model training device at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate objects to be detected carry tags;
determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added characteristic data of the dependent object to be detected in the sample data;
and taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input, and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
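The four operations above can be illustrated with a minimal Python sketch. It is not the implementation of this specification: the dictionary layout of a sample, the helper names `train_logistic`, `integrate`, and `build_training_rows`, the use of a small hand-written logistic regression for both the preset model and the management index prediction model, and the unweighted mean used as the integration step are all assumptions made for illustration, and the toy data is invented.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 500) -> Model:
    """Batch gradient-descent logistic regression (a stand-in for any trainable model)."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def integrate(values: Sequence[float]) -> float:
    """Integrate reference index values; an unweighted mean is assumed here."""
    if not values:
        raise ValueError("at least one reference subordinate object is required")
    return sum(values) / len(values)


def build_training_rows(samples: List[Dict], preset_model: Model
                        ) -> Tuple[List[List[float]], List[int]]:
    """Append the integrated reference index value to each tested object's features."""
    X, y = [], []
    for s in samples:
        ref_values = [preset_model(f) for f in s["reference_features"]]
        X.append(list(s["target_features"]) + [integrate(ref_values)])
        y.append(s["label"])
    return X, y


# Toy data (invented): each sample is one main object with one tested
# subordinate object and one or more reference subordinate objects.
samples = [
    {"target_features": [0.1, 0.2], "label": 0, "reference_features": [[0.0, 0.1], [0.2, 0.1]]},
    {"target_features": [0.2, 0.1], "label": 0, "reference_features": [[0.1, 0.3]]},
    {"target_features": [0.9, 0.8], "label": 1, "reference_features": [[0.8, 0.9], [0.7, 0.9]]},
    {"target_features": [0.8, 0.9], "label": 1, "reference_features": [[0.9, 0.7]]},
]

# Preset model trained on batch subordinate-object features (labels are hypothetical).
ref_X = [[0.0, 0.1], [0.2, 0.1], [0.1, 0.3], [0.8, 0.9], [0.7, 0.9], [0.9, 0.7]]
ref_y = [0, 0, 0, 1, 1, 1]
preset_model = train_logistic(ref_X, ref_y)

X, y = build_training_rows(samples, preset_model)
index_model = train_logistic(X, y)  # the management index prediction model
```

Each training row ends up with the original feature data of the tested object followed by one newly added feature, so the second model can learn from the behaviour of the sibling objects under the same main object.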
The prediction model training method disclosed in the embodiment of fig. 1 or fig. 3 of the present specification may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps, and logic blocks disclosed in one or more embodiments of the present specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further perform the predictive model training method provided in fig. 1 or fig. 3, which is not described herein again.
Fig. 7 is a schematic structural diagram of another electronic device provided in an embodiment of the present specification. This electronic device differs from the one shown in fig. 6 in the program that the processor executes: the processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the prediction apparatus at the logical level. The processor executes the program stored in the memory, and is specifically configured to perform the following operations:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the characteristic data training of batch slave objects;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained by training based on a management index prediction model training method provided by the embodiment of the specification.
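The prediction operations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this specification: the function names `weighted_integrate` and `predict_index` are hypothetical, both models are represented as plain callables, and the integration step is assumed to be a normalised weighted average (the specification only says the values are integrated, for example by a weighted calculation).

```python
from typing import Callable, List, Optional, Sequence

Model = Callable[[Sequence[float]], float]


def weighted_integrate(values: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Normalised weighted average of the reference index values (assumed form)."""
    if not values:
        raise ValueError("at least one reference subordinate object is required")
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def predict_index(target_features: Sequence[float],
                  reference_features: List[Sequence[float]],
                  preset_model: Model,
                  index_model: Model,
                  weights: Optional[Sequence[float]] = None) -> float:
    """Score the references, integrate the scores, and predict for the target object."""
    # Management index value of each reference subordinate object
    ref_values = [preset_model(f) for f in reference_features]
    # The integration result becomes the newly added feature of the target object
    new_feature = weighted_integrate(ref_values, weights)
    # Original features plus the newly added feature go into the prediction model
    return index_model(list(target_features) + [new_feature])


# Stub models for demonstration only; real models would be trained beforehand.
stub_preset: Model = lambda f: sum(f)
stub_index: Model = lambda x: x[-1]

result = predict_index([1.0], [[1.0, 1.0], [3.0, 1.0]],
                       stub_preset, stub_index, weights=[1.0, 3.0])
```

With the stub models, the two reference objects score 2.0 and 4.0, and weights of 1 and 3 give an integrated value of 3.5, which the stub prediction model simply returns.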
Of course, in addition to a software implementation, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the processing flow is not limited to logic units, and may also be hardware or logic devices.
Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the following:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate objects to be detected carry tags;
determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added characteristic data of the dependent object to be detected in the sample data;
and taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input, and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
An embodiment of this specification also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 2, and in particular to perform the following operations:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the characteristic data training of batch slave objects;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained by training based on a management index prediction model training method provided by the embodiment of the specification.
The following is a description of the apparatus provided in this specification.
As shown in fig. 8, an embodiment of the present specification provides a management index prediction model training apparatus 800, and in one software implementation, the apparatus 800 may include: a first data acquisition module 801, a first index determination module 802, a first index integration module 803, and a predictive model training module 804.
The first data obtaining module 801 is configured to obtain batch sample data, where the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of multiple dependent objects under a master object, one dependent object under one master object is a to-be-detected dependent object, the remaining dependent objects are reference dependent objects, and the to-be-detected dependent object carries a tag.
A first index determining module 802, configured to determine a management index value of a reference dependent object in the sample data based on feature data of the reference dependent object in the sample data and a preset model, where the preset model is obtained by training based on feature data of batch dependent objects.
A first index integration module 803, configured to integrate the management index value of at least one reference dependent object in the sample data, and use the integration result as the newly added feature data of the dependent object to be detected in the sample data.
A prediction model training module 804, configured to train a management index prediction model by using, as inputs, the label of the to-be-detected dependent object, the original feature data, and the newly-added feature data in the sample data, where the management index prediction model is used to predict a management index value of the to-be-detected dependent object in a preset time period after the first time.
It should be noted that the management index prediction model training apparatus 800 can implement the method of fig. 1 and achieve the same technical effects, and the detailed content can refer to the method shown in fig. 1 and will not be described again.
As shown in fig. 9, an embodiment of the present specification provides a management index prediction apparatus 900, and in one software implementation, the apparatus 900 may include: a target determination module 901, a second data acquisition module 902, a second index determination module 903, a second index integration module 904, and a management index prediction module 905.
A target determining module 901, configured to determine a target dependent object to be predicted.
A second data obtaining module 902, configured to obtain feature data of multiple subordinate objects under a target master object, where the multiple subordinate objects under the target master object include the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate object under the target master object is generated in a preset time period before a second time.
A second index determining module 903, configured to determine a management index value of the reference dependent object under the target master object based on the feature data of the reference dependent object under the target master object and a preset model, where the preset model is obtained by training based on the feature data of batch dependent objects.
A second index integration module 904, configured to integrate the management index values of at least one reference dependent object under the target master object, and use the integration result as the new feature data of the target dependent object.
A management index prediction module 905, configured to input the original feature data and the newly added feature data of the target dependent object into a management index prediction model, and predict a management index value of the target dependent object in a preset time period after the second time.
The management index prediction model is obtained by training based on the method shown in fig. 1.
It should be noted that the management indicator prediction apparatus 900 can implement the method of fig. 2 and achieve the same technical effects, and the detailed contents refer to the method shown in fig. 2 and are not repeated.
As shown in fig. 10, an embodiment of the present specification provides an account risk prediction model training apparatus 1000, and in one software implementation, the apparatus 1000 may include: a first account data acquisition module 1001, a first risk value determination module 1002, a first risk value integration module 1003 and a risk prediction model training module 1004.
A first account data obtaining module 1001, configured to obtain batch sample data, where the sample data is generated in a preset time period before and/or after a first time, one sample data carries feature data of multiple accounts under one identity, one account under one identity is an account to be tested, the other accounts are reference accounts, and the account to be tested carries a tag.
A first risk value determining module 1002, configured to determine a risk assessment value of a reference account in the sample data based on feature data of the reference account in the sample data and a preset model, where the preset model is obtained by training based on feature data of batch accounts.
A first risk value integration module 1003, configured to integrate the risk assessment value of at least one reference account in the sample data, and use an integration result as new feature data of an account to be tested in the sample data.
A risk prediction model training module 1004, configured to train a risk prediction model using the label of the account to be tested, the original feature data, and the newly added feature data in the sample data as inputs, where the risk prediction model is used to predict a risk assessment value of the account to be tested in a preset time period after the first time.
It should be noted that the account risk prediction model training apparatus 1000 can implement the method shown in fig. 3 and obtain the same technical effect, and the detailed content may refer to the method shown in fig. 3 and is not described again.
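The sample structure handled by the first account data obtaining module 1001 can be illustrated with a short Python sketch. The class names `AccountRecord` and `Sample`, the function `build_samples`, the field names, and the toy records are hypothetical. The sketch also assumes that every account under an identity takes a turn as the account to be tested, and that identities with only one account are skipped because they have no reference account; the specification does not prescribe either choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AccountRecord:
    identity_id: str       # the identity the account is registered under
    account_id: str
    features: List[float]  # feature data generated in the preset time period
    label: int             # assumed encoding: 1 = risky, 0 = not risky


@dataclass
class Sample:
    identity_id: str
    tested_account: AccountRecord
    reference_accounts: List[AccountRecord]


def build_samples(records: List[AccountRecord]) -> List[Sample]:
    """Group accounts by identity and emit one sample per tested account."""
    by_identity: Dict[str, List[AccountRecord]] = defaultdict(list)
    for record in records:
        by_identity[record.identity_id].append(record)

    samples: List[Sample] = []
    for identity_id, accounts in by_identity.items():
        if len(accounts) < 2:
            continue  # no reference account available under this identity
        for i, tested in enumerate(accounts):
            references = accounts[:i] + accounts[i + 1:]
            samples.append(Sample(identity_id, tested, references))
    return samples


# Toy records (invented): identity "A" holds three accounts, identity "B" one.
records = [
    AccountRecord("A", "a1", [0.1, 0.4], 0),
    AccountRecord("A", "a2", [0.9, 0.7], 1),
    AccountRecord("A", "a3", [0.2, 0.3], 0),
    AccountRecord("B", "b1", [0.5, 0.5], 0),
]
samples = build_samples(records)
```

Here identity "A" yields three samples, each with two reference accounts, and identity "B" yields none.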
As shown in fig. 11, one embodiment of the present specification provides an account risk prediction apparatus 1100, and in one software implementation, the apparatus 1100 may include: a target account determination module 1101, a second account data acquisition module 1102, a second risk value determination module 1103, a second risk value integration module 1104, and a risk prediction module 1105.
A target account determining module 1101, configured to determine a target account to be predicted.
A second account data obtaining module 1102, configured to obtain feature data of multiple accounts under a target identity, where the multiple accounts under the target identity include the target account and at least one reference account, and the feature data of the accounts under the target identity is generated in a preset time period before a second time.
A second risk value determining module 1103, configured to determine a risk assessment value of the reference account under the target identity based on the feature data of the reference account under the target identity and a preset model, where the preset model is trained based on the feature data of the batch accounts.
A second risk value integration module 1104, configured to integrate the risk assessment value of at least one reference account under the target identity, and use the integration result as the newly added feature data of the target account.
A risk prediction module 1105, configured to input the original feature data and the newly added feature data of the target account into a risk prediction model, and predict a risk assessment value of the target account in a preset time period after the second time.
Wherein, the risk prediction model is trained based on the method shown in fig. 3.
It should be noted that the account risk prediction apparatus 1100 can implement the method shown in fig. 5 and achieve the same technical effects, and the detailed contents refer to the method shown in fig. 5 and are not repeated.
While certain embodiments of the present disclosure have been described above, other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (19)

1. A management index prediction model training method, comprising:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate objects to be detected carry tags;
determining a management index value of a reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
integrating the management index value of at least one reference dependent object in the sample data, and taking the integrated result as the newly added characteristic data of the dependent object to be detected in the sample data;
and taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input, and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
2. The method according to claim 1, further comprising, before said determining a management index value for a reference dependent object in said sample data based on feature data of the reference dependent object in said sample data and a preset model:
and training the preset model based on the characteristic data and the label of the reference dependent object in the batch of sample data.
3. The method of claim 2, prior to training the preset model, further comprising:
determining a tag of a reference dependent object in the sample data based on the first time instant.
4. The method of claim 3, wherein said determining a tag of a reference dependent object in said sample data based on said first time instance comprises:
and determining the label of the reference dependent object in the sample data based on whether the preset performance of the reference dependent object in the sample data occurs in a preset time period before and/or after the first time.
5. The method according to any of claims 1-4, wherein said integrating the management index value of at least one reference dependent object in said sample data comprises:
and carrying out weighted calculation on the management index value of at least one reference dependent object in the sample data.
6. The method of claim 1, further comprising:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference subordinate object under the target main object based on the characteristic data of the reference subordinate object under the target main object and the preset model;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
and inputting the original characteristic data and the newly added characteristic data of the target dependent object into the management index prediction model, and predicting the management index value of the target dependent object in a preset time period after the second moment.
7. The method of claim 6, wherein said integrating the management index value of the at least one reference slave object under the target master object comprises:
and carrying out weighted calculation on the management index value of at least one reference slave object under the target master object.
8. The method of claim 6 or 7, further comprising:
and determining whether the target subordinate object has preset performance within a preset time period after the second moment or not based on the predicted management index value.
9. A management index prediction method, comprising:
determining a target dependent object to be predicted;
acquiring feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object are generated in a preset time period before a second moment;
determining a management index value of a reference slave object under the target master object based on the characteristic data of the reference slave object under the target master object and a preset model, wherein the preset model is obtained based on the characteristic data training of batch slave objects;
integrating the management index value of at least one reference subordinate object under the target main object, and taking the integrated result as the newly added characteristic data of the target subordinate object;
inputting the original characteristic data and the newly added characteristic data of the target dependent object into a management index prediction model, and predicting a management index value of the target dependent object in a preset time period after the second moment, wherein the management index prediction model is obtained by training based on the method of any one of claims 1 to 5.
10. An account risk prediction model training method, comprising:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first moment, one sample data carries characteristic data of a plurality of accounts under one identity, one account under one identity is an account to be detected, the other accounts are reference accounts, and the account to be detected carries a label;
determining a risk evaluation value of a reference account in the sample data based on the characteristic data of the reference account in the sample data and a preset model, wherein the preset model is obtained based on characteristic data training of batch accounts;
integrating the risk assessment value of at least one reference account in the sample data, and taking an integration result as newly-added characteristic data of an account to be detected in the sample data;
and taking the label of the account to be tested, the original characteristic data and the newly added characteristic data in the sample data as input, and training a risk prediction model, wherein the risk prediction model is used for predicting a risk assessment value of the account to be tested in a preset time period after the first time.
11. An account risk prediction method, comprising:
determining a target account to be predicted;
acquiring feature data of a plurality of accounts under a target identity, wherein the plurality of accounts under the target identity comprise the target account and at least one reference account, and the feature data of the accounts under the target identity are generated in a preset time period before a second moment;
determining a risk assessment value of the reference account under the target identity based on the feature data of the reference account under the target identity and a preset model, wherein the preset model is obtained based on feature data training of batch accounts;
integrating the risk assessment value of at least one reference account under the target identity, and taking an integration result as newly-added feature data of the target account;
inputting the original characteristic data and the newly added characteristic data of the target account into a risk prediction model, and predicting a risk assessment value of the target account in a preset time period after the second moment, wherein the risk prediction model is obtained by training based on the method of claim 10.
12. A management index prediction model training apparatus, comprising:
a first data acquisition module, configured to acquire batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries characteristic data of a plurality of subordinate objects under a main object, one subordinate object under one main object is a subordinate object to be detected, the other subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
the first index determining module is used for determining a management index value of the reference dependent object in the sample data based on the characteristic data of the reference dependent object in the sample data and a preset model, wherein the preset model is obtained based on the characteristic data training of batch dependent objects;
the first index integration module is used for integrating the management index value of at least one reference dependent object in the sample data and taking the integration result as the newly added feature data of the dependent object to be detected in the sample data;
and the prediction model training module is used for taking the label of the to-be-detected subordinate object, the original characteristic data and the newly-added characteristic data in the sample data as input and training a management index prediction model, wherein the management index prediction model is used for predicting a management index value of the to-be-detected subordinate object in a preset time period after the first time.
13. A management index prediction apparatus comprising:
the target determining module is used for determining a target subordinate object to be predicted;
the second data acquisition module is used for acquiring the feature data of a plurality of subordinate objects under a target main object, wherein the plurality of subordinate objects under the target main object comprise the target subordinate object and at least one reference subordinate object, and the feature data of the subordinate objects under the target main object is generated in a preset time period before a second time;
the second index determining module is used for determining a management index value of a reference subordinate object under the target main object based on the feature data of the reference subordinate object under the target main object and a preset model, wherein the preset model is obtained by training on the feature data of a batch of subordinate objects;
the second index integration module is used for integrating the management index value of at least one reference subordinate object under the target main object, and taking the integration result as the newly added feature data of the target subordinate object;
a management index prediction module, configured to input the original feature data and the newly added feature data of the target subordinate object into a management index prediction model, and predict a management index value of the target subordinate object in a preset time period after the second time, where the management index prediction model is obtained by training based on the method of any one of claims 1 to 5.
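The requirement that feature data be "generated in a preset time period before a second time" amounts to a time-window filter. The sketch below is one possible reading: the record layout, the function name `records_in_window`, the 30-day window, and the half-open interval (start inclusive, second time exclusive) are assumptions, not details stated in the claim.

```python
from datetime import datetime, timedelta


def records_in_window(records, second_time, window):
    """Keep records with a timestamp in [second_time - window, second_time)."""
    start = second_time - window
    return [r for r in records if start <= r["timestamp"] < second_time]


second_time = datetime(2020, 1, 6)
records = [
    {"object_id": "a", "timestamp": datetime(2019, 12, 1), "value": 1.0},   # too old
    {"object_id": "a", "timestamp": datetime(2019, 12, 30), "value": 2.0},  # in window
    {"object_id": "b", "timestamp": datetime(2020, 1, 7), "value": 3.0},    # after second time
]
recent = records_in_window(records, second_time, timedelta(days=30))
```

Only records inside the window would then be turned into feature data for the target subordinate object and its reference subordinate objects.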
14. An account risk prediction model training device, comprising:
the first account data acquisition module is used for acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries feature data of a plurality of accounts under one identity, one account under the identity is an account to be tested, the remaining accounts are reference accounts, and the account to be tested carries a label;
the first risk value determination module is used for determining a risk assessment value of a reference account in the sample data based on the feature data of the reference account in the sample data and a preset model, wherein the preset model is obtained by training on the feature data of a batch of accounts;
the first risk value integration module is used for integrating the risk assessment value of at least one reference account in the sample data and taking the integration result as the newly added feature data of the account to be tested in the sample data;
and the risk prediction model training module is used for training a risk prediction model by taking the label, the original feature data and the newly added feature data of the account to be tested in the sample data as input, wherein the risk prediction model is used for predicting the risk assessment value of the account to be tested in a preset time period after the first time.
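The integration step of the first risk value integration module can be realised in several ways. The sketch below shows three candidate summaries of the reference-account risk assessment values; the function name, the output keys, and the 0.5 threshold are hypothetical, and the claim itself does not say which integration is used.

```python
from statistics import mean


def integrate_risk_values(values, threshold=0.5):
    """Summarise reference-account risk values as candidate new features."""
    if not values:
        # The claim requires at least one reference account.
        raise ValueError("at least one reference account is required")
    return {
        "ref_risk_max": max(values),    # riskiest sibling account
        "ref_risk_mean": mean(values),  # average sibling risk
        # Number of sibling accounts at or above the (assumed) threshold.
        "ref_risk_count_high": sum(1 for v in values if v >= threshold),
    }


features = integrate_risk_values([0.1, 0.9, 0.5])
```

Any one of these values, or several together, could serve as the newly added feature data of the account to be tested.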
15. An account risk prediction apparatus comprising:
the target account determining module is used for determining a target account to be predicted;
the second account data acquisition module is used for acquiring the feature data of a plurality of accounts under the target identity, wherein the plurality of accounts under the target identity comprise the target account and at least one reference account, and the feature data of the accounts under the target identity is generated in a preset time period before a second time;
the second risk value determination module is used for determining a risk assessment value of the reference account under the target identity based on the feature data of the reference account under the target identity and a preset model, wherein the preset model is obtained by training on the feature data of a batch of accounts;
the second risk value integration module is used for integrating the risk assessment value of at least one reference account under the target identity and taking the integration result as the newly added feature data of the target account;
a risk prediction module, configured to input the original feature data and the newly added feature data of the target account into a risk prediction model, and predict a risk assessment value of the target account in a preset time period after the second time, where the risk prediction model is obtained by training based on the method of claim 10.
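The prediction flow of this apparatus can be sketched end to end: group accounts by identity, treat the sibling accounts of the target as reference accounts, integrate their preset-model scores, and pass the augmented features to the risk prediction model. Every name here (`identity_id`, `account_id`, `predict_account_risk`, the toy models, and `max` as the integration) is an assumption made for illustration.

```python
from collections import defaultdict


def group_by_identity(accounts):
    """Map each identity to the list of accounts registered under it."""
    groups = defaultdict(list)
    for account in accounts:
        groups[account["identity_id"]].append(account)
    return groups


def predict_account_risk(target_id, accounts, preset_model, risk_model, integrate=max):
    """Score the target account using its sibling accounts as references."""
    by_id = {a["account_id"]: a for a in accounts}
    target = by_id[target_id]
    siblings = group_by_identity(accounts)[target["identity_id"]]
    references = [a for a in siblings if a["account_id"] != target_id]
    if not references:
        raise ValueError("target identity has no reference account")
    # Integrate the preset-model scores of the reference accounts.
    new_feature = integrate(preset_model(a["features"]) for a in references)
    # Feed original features plus the newly added feature to the risk model.
    return risk_model(list(target["features"]) + [new_feature])


accounts = [
    {"account_id": "acc1", "identity_id": "u1", "features": [0.2, 0.0]},
    {"account_id": "acc2", "identity_id": "u1", "features": [0.8, 0.1]},
    {"account_id": "acc3", "identity_id": "u1", "features": [0.4, 0.3]},
    {"account_id": "acc4", "identity_id": "u2", "features": [1.0, 1.0]},
]
# Toy stand-ins for the preset model and the trained risk prediction model.
score = predict_account_risk(
    "acc1", accounts,
    preset_model=lambda f: f[0],
    risk_model=lambda x: sum(x) / len(x),
)
```

Account `acc4` belongs to a different identity, so it does not contribute to the score of `acc1`.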
16. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries feature data of a plurality of subordinate objects under one main object, one subordinate object under the main object is a subordinate object to be detected, the remaining subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
determining a management index value of the reference subordinate object in the sample data based on the feature data of the reference subordinate object in the sample data and a preset model, wherein the preset model is obtained by training on the feature data of a batch of subordinate objects;
integrating the management index value of at least one reference subordinate object in the sample data, and taking the integration result as the newly added feature data of the subordinate object to be detected in the sample data;
and training a management index prediction model by taking the label, the original feature data and the newly added feature data of the subordinate object to be detected in the sample data as input, wherein the management index prediction model is used for predicting a management index value of the subordinate object to be detected in a preset time period after the first time.
17. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries feature data of a plurality of subordinate objects under one main object, one subordinate object under the main object is a subordinate object to be detected, the remaining subordinate objects are reference subordinate objects, and the subordinate object to be detected carries a label;
determining a management index value of the reference subordinate object in the sample data based on the feature data of the reference subordinate object in the sample data and a preset model, wherein the preset model is obtained by training on the feature data of a batch of subordinate objects;
integrating the management index value of at least one reference subordinate object in the sample data, and taking the integration result as the newly added feature data of the subordinate object to be detected in the sample data;
and training a management index prediction model by taking the label, the original feature data and the newly added feature data of the subordinate object to be detected in the sample data as input, wherein the management index prediction model is used for predicting a management index value of the subordinate object to be detected in a preset time period after the first time.
18. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries feature data of a plurality of accounts under one identity, one account under the identity is an account to be tested, the remaining accounts are reference accounts, and the account to be tested carries a label;
determining a risk assessment value of a reference account in the sample data based on the feature data of the reference account in the sample data and a preset model, wherein the preset model is obtained by training on the feature data of a batch of accounts;
integrating the risk assessment value of at least one reference account in the sample data, and taking the integration result as the newly added feature data of the account to be tested in the sample data;
and training a risk prediction model by taking the label, the original feature data and the newly added feature data of the account to be tested in the sample data as input, wherein the risk prediction model is used for predicting a risk assessment value of the account to be tested in a preset time period after the first time.
19. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring batch sample data, wherein the sample data is generated in a preset time period before and/or after a first time, one sample data carries feature data of a plurality of accounts under one identity, one account under the identity is an account to be tested, the remaining accounts are reference accounts, and the account to be tested carries a label;
determining a risk assessment value of a reference account in the sample data based on the feature data of the reference account in the sample data and a preset model, wherein the preset model is obtained by training on the feature data of a batch of accounts;
integrating the risk assessment value of at least one reference account in the sample data, and taking the integration result as the newly added feature data of the account to be tested in the sample data;
and training a risk prediction model by taking the label, the original feature data and the newly added feature data of the account to be tested in the sample data as input, wherein the risk prediction model is used for predicting a risk assessment value of the account to be tested in a preset time period after the first time.
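The final training step takes the label, the original feature data, and the newly added feature data as input. The claims do not name a model family, so the sketch below substitutes a tiny logistic regression trained by stochastic gradient descent purely as a stand-in; the toy rows, the learning rate, and the epoch count are likewise assumptions. The last column of each row plays the role of the integrated reference-account risk value.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability that the account is risky."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by per-sample gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y  # gradient of log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Columns: two original features, then the newly added (integrated) feature.
rows = [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.1, 0.1, 0.1], [0.3, 0.2, 0.2]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(rows, labels)
high = predict(weights, bias, [0.1, 0.2, 0.9])
low = predict(weights, bias, [0.1, 0.1, 0.1])
```

In this toy data the label depends almost entirely on the last column, so the fitted model separates the two groups mainly through the newly added feature.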
CN202010010075.3A 2020-01-06 2020-01-06 Prediction model training method, prediction device and electronic equipment Active CN111275071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010010075.3A CN111275071B (en) 2020-01-06 2020-01-06 Prediction model training method, prediction device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010010075.3A CN111275071B (en) 2020-01-06 2020-01-06 Prediction model training method, prediction device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111275071A (en) 2020-06-12
CN111275071B (en) 2022-06-10

Family

ID=71111849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010010075.3A Active CN111275071B (en) 2020-01-06 2020-01-06 Prediction model training method, prediction device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111275071B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348520A (en) * 2020-10-21 2021-02-09 上海淇玥信息技术有限公司 XGBoost-based risk assessment method and device and electronic equipment
CN113052579B (en) * 2021-04-23 2021-12-07 深圳市亚飞电子商务有限公司 Payment method and system of mobile payment platform

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US9973522B2 (en) * 2016-07-08 2018-05-15 Accenture Global Solutions Limited Identifying network security risks
CN107256465A (en) * 2017-06-28 2017-10-17 阿里巴巴集团控股有限公司 Method and device for identifying risk accounts
CN109934697A (en) * 2017-12-15 2019-06-25 阿里巴巴集团控股有限公司 Credit risk control method, device and equipment based on graph structure model
CN108985929B (en) * 2018-06-11 2022-04-08 创新先进技术有限公司 Training method, business data classification processing method and device, and electronic equipment
CN109657696B (en) * 2018-11-05 2023-06-30 创新先进技术有限公司 Multi-task supervised learning model training and predicting method and device
CN110009174B (en) * 2018-12-13 2020-11-06 创新先进技术有限公司 Risk recognition model training method and device and server
CN110009359A (en) * 2019-01-22 2019-07-12 阿里巴巴集团控股有限公司 Training method, update method and device for unsupervised risk prevention and control model
CN110147823B (en) * 2019-04-16 2023-04-07 创新先进技术有限公司 Risk control model training method, device and equipment

Also Published As

Publication number Publication date
CN111275071A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN109063985B (en) Business risk decision method and device
CN109347787B (en) Identity information identification method and device
CN108763952B (en) Data classification method and device and electronic equipment
CN109034583B (en) Abnormal transaction identification method and device and electronic equipment
TWI743773B (en) Method and device for identifying abnormal collection behavior based on privacy data protection
CN110874491B (en) Privacy data processing method and device based on machine learning and electronic equipment
CN109598414B (en) Risk assessment model training, risk assessment method and device and electronic equipment
CN108550046B (en) Resource and marketing recommendation method and device and electronic equipment
CN111275071B (en) Prediction model training method, prediction device and electronic equipment
CN109064217B (en) User level-based identity verification strategy determination method and device and electronic equipment
CN111582872A (en) Abnormal account detection model training method, abnormal account detection device and abnormal account detection equipment
CN113407854A (en) Application recommendation method, device and equipment and computer readable storage medium
CN110008986B (en) Batch risk case identification method and device and electronic equipment
CN110058992B (en) Text template effect feedback method and device and electronic equipment
CN109903166B (en) Data risk prediction method, device and equipment
CN108492112B (en) Method and device for judging false resource transfer and false transaction and electronic equipment
CN112184143B (en) Model training method, device and equipment in compliance audit rule
CN110334936B (en) Method, device and equipment for constructing credit qualification scoring model
CN107038377B (en) Website authentication method and device and website credit granting method and device
CN113297462A (en) Data processing method, device, equipment and storage medium
CN111461892A (en) Method and device for selecting derived variables of risk identification model
CN113362137B (en) Insurance product recommendation method and device, terminal equipment and storage medium
CN113723522B (en) Abnormal user identification method and device, electronic equipment and storage medium
CN113283978B (en) Financial risk assessment method based on biological basis, behavioral characteristics and business characteristics
CN112214387B (en) Knowledge graph-based user operation behavior prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant