CN112766558A - Modeling sample generation method, device, equipment and computer readable storage medium - Google Patents

Publication number
CN112766558A
Authority
CN
China
Prior art keywords
modeling
sample
initial
modeling sample
time point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110045708.9A
Other languages
Chinese (zh)
Inventor
张鹏
陈婷
吴三平
庄伟亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202110045708.9A
Publication of CN112766558A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of financial technology (Fintech) and discloses a modeling sample generation method, device, equipment and medium. On the basis of an initial modeling sample set, the method collects the latest sample variable set of the modeling sample clients at the latest time point and weights the initial modeling sample set using the variable distribution information of that latest sample variable set. The resulting weighted modeling sample set combines the expression data of the old time point corresponding to the initial modeling sample set with the variable distribution information of the latest time point, so that the data distribution of the latest time point is simulated at the observation point while modeling still uses the expression period. The variable weights in a model trained on the weighted modeling sample set are therefore close to the data distribution of the latest time point, which gives the model higher client risk identification capability and indirectly reflects the effectiveness of the modeling samples.

Description

Modeling sample generation method, device, equipment and computer readable storage medium
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a modeling sample generation method, a modeling sample generation device, modeling sample generation equipment and a computer-readable storage medium.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on these technologies.
In the risk modeling field, a risk modeling sample needs a long expression period after its observation point, so the sample is often far removed from the current time, and the variables used for modeling describe the client's past behavior. As a result, when the trained risk identification model is applied at the latest time point, changes in client behavior can invalidate the model. To address this, existing solutions typically shorten the expression period and try to model with client behavior information from the latest time point as the sample. However, a short expression period leaves the client's risk insufficiently expressed, so a risk identification model built on it still cannot accurately identify risky clients. This reflects the technical problem that risk modeling samples obtained by existing methods are not very effective.
Disclosure of Invention
The invention mainly aims to provide a modeling sample generation method, a modeling sample generation device, modeling sample generation equipment and a computer readable storage medium, and aims to solve the technical problem that a risk modeling sample obtained by the existing method is not high in effectiveness.
In order to achieve the above object, the present invention provides a modeling sample generation method, including:
obtaining an initial modeling sample set corresponding to a modeling sample client;
determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point and obtain distribution information of the latest sample variable set;
and weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set so as to model based on the weighted modeling sample set.
Optionally, the step of obtaining an initial modeling sample set corresponding to a modeling sample client includes:
determining modeling sample clients, modeling observation points and modeling presentation periods;
and acquiring initial sample variables of the modeling sample client according to the modeling observation points, and acquiring initial risk expressions of the modeling sample client according to the modeling expression periods so as to obtain the initial modeling sample set based on the initial sample variables and the initial risk expressions.
Optionally, the step of determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point, and obtaining distribution information of the latest sample variable set includes:
determining the target time point according to the modeling observation point and the modeling expression period, and acquiring a plurality of current sample variables of the modeling sample client at the target time point to serve as the latest sample variable set;
and training the latest sample variable set by using a Gaussian mixture model to obtain probability distribution information of the latest sample variable set as the distribution information.
Optionally, the step of weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set includes:
determining the distribution probability corresponding to each initial modeling sample in the initial modeling sample set according to the probability distribution information;
and weighting each initial modeling sample by taking the distribution probability as a sample weight so as to take the weighted initial modeling sample set as the weighted modeling sample set.
Optionally, before the step of obtaining an initial modeling sample set corresponding to a modeling sample client, the method further includes:
determining a target modeling customer group, and sampling the target modeling customer group to determine the modeling sample client.
Optionally, the step of determining a target modeling customer group includes:
obtaining customer information and business phase information corresponding to the customer information, and dividing the customer information into customer groups according to the business phase information to determine the target modeling customer group.
Optionally, the step of modeling based on the weighted modeling sample set includes:
and modeling the weighted modeling sample set by using a risk model modeling algorithm based on machine learning to obtain a risk prediction model for risk prediction.
Further, to achieve the above object, the present invention also provides a modeling sample generation apparatus including:
the initial sample acquisition module is used for acquiring an initial modeling sample set corresponding to a modeling sample client;
the distribution information acquisition module is used for determining a target time point, acquiring the latest sample variable set of the modeling sample client at the target time point and acquiring the distribution information of the latest sample variable set;
and the weighted sample generating module is used for weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set so as to perform modeling based on the weighted modeling sample set.
Optionally, the initial sample acquiring module comprises:
the modeling information determining unit is used for determining modeling sample clients, modeling observation points and modeling presentation periods;
and the initial sample acquisition unit is used for acquiring initial sample variables of the modeling sample client according to the modeling observation points and acquiring initial risk expressions of the modeling sample client according to the modeling expression periods so as to obtain the initial modeling sample set based on the initial sample variables and the initial risk expressions.
Optionally, the distribution information obtaining module includes:
the latest sample acquisition unit is used for determining the target time point according to the modeling observation point and the modeling expression period, and acquiring a plurality of current sample variables of the modeling sample client at the target time point to serve as the latest sample variable set;
and the probability distribution acquisition unit is used for training the latest sample variable set by using a Gaussian mixture model to obtain probability distribution information of the latest sample variable set as the distribution information.
Optionally, the weighted sample generation module comprises:
a distribution probability determining unit, configured to determine, according to the probability distribution information, a distribution probability corresponding to each initial modeling sample in the initial modeling sample set;
and the weighted sample acquisition unit is used for weighting each initial modeling sample by taking the distribution probability as a sample weight so as to take the weighted initial modeling sample set as the weighted modeling sample set.
Optionally, the modeling sample generation apparatus further includes:
a target customer group sampling module, used for determining a target modeling customer group and sampling the target modeling customer group to determine the modeling sample client.
Optionally, the target customer group sampling module includes:
a target customer group dividing unit, used for acquiring customer information and business phase information corresponding to the customer information, and dividing the customer information into customer groups according to the business phase information to determine the target modeling customer group.
Optionally, the weighted sample generation module further comprises:
and the weighted sample modeling unit is used for modeling the weighted modeling sample set by utilizing a risk model modeling algorithm based on machine learning to obtain a risk prediction model for risk prediction.
Further, to achieve the above object, the present invention also provides modeling sample generation equipment including: a memory, a processor and a modeling sample generation program stored on the memory and executable on the processor, the modeling sample generation program, when executed by the processor, implementing the steps of the modeling sample generation method as described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a modeling sample generation program that, when executed by a processor, implements the steps of the modeling sample generation method as described above.
The invention provides a modeling sample generation method, a modeling sample generation device, modeling sample generation equipment and a computer readable storage medium. On the basis of the initial modeling sample set, the invention collects the latest sample variable set of the modeling sample client at the latest time point and weights the initial modeling sample set using the variable distribution information of that latest sample variable set. The final weighted modeling sample set therefore combines the expression data of the old time point corresponding to the initial modeling sample set with the variable distribution information of the latest time point, so that the data distribution of the latest time point is simulated at the observation point while modeling still uses the expression period. As a result, the variable weights in a model trained on the weighted modeling sample set are close to the data distribution of the latest time point, and the model has higher client risk identification capability. This indirectly reflects the effectiveness of the modeling samples and solves the technical problem that risk modeling samples obtained by existing methods are not very effective.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of a first embodiment of a modeling sample generation method of the present invention;
FIG. 3 is a schematic flow chart diagram of a second embodiment of a modeling sample generation method of the present invention;
FIG. 4 is a functional block diagram of the modeling sample generation apparatus of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the modeling sample generation equipment may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the apparatus, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a modeling sample generation program.
In the device shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (programmer's end) and performing data communication with the client; and the processor 1001 may be configured to call the modeling sample generation program stored in the memory 1005 and perform the operations in the following modeling sample generation method:
obtaining an initial modeling sample set corresponding to a modeling sample client;
determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point and obtain distribution information of the latest sample variable set;
and weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set so as to model based on the weighted modeling sample set.
Further, the step of obtaining an initial modeling sample set corresponding to the modeling sample client includes:
determining modeling sample clients, modeling observation points and modeling presentation periods;
and acquiring initial sample variables of the modeling sample client according to the modeling observation points, and acquiring initial risk expressions of the modeling sample client according to the modeling expression periods so as to obtain the initial modeling sample set based on the initial sample variables and the initial risk expressions.
Further, the step of determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point, and obtaining distribution information of the latest sample variable set includes:
determining the target time point according to the modeling observation point and the modeling expression period, and acquiring a plurality of current sample variables of the modeling sample client at the target time point to serve as the latest sample variable set;
and training the latest sample variable set by using a Gaussian mixture model to obtain probability distribution information of the latest sample variable set as the distribution information.
Further, the step of weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set includes:
determining the distribution probability corresponding to each initial modeling sample in the initial modeling sample set according to the probability distribution information;
and weighting each initial modeling sample by taking the distribution probability as a sample weight so as to take the weighted initial modeling sample set as the weighted modeling sample set.
Further, before the step of obtaining the initial modeling sample set corresponding to the modeling sample client, the processor 1001 may be configured to call the modeling sample generation program stored in the memory 1005, and perform the following operations in the modeling sample generation method:
determining a target modeling customer group, and sampling the target modeling customer group to determine the modeling sample client.
Further, the step of determining a target modeling customer group includes:
obtaining customer information and business phase information corresponding to the customer information, and dividing the customer information into customer groups according to the business phase information to determine the target modeling customer group.
Further, the step of modeling based on the set of weighted modeling samples comprises:
and modeling the weighted modeling sample set by using a risk model modeling algorithm based on machine learning to obtain a risk prediction model for risk prediction.
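The optional pre-step described above (dividing customers into groups by business phase, then sampling the target group to pick the modeling sample clients) could be sketched as follows. This is only an illustration: the record layout, the phase names `pre_loan` and `in_loan`, and both helper functions are hypothetical and do not come from the original text.

```python
import random
from collections import defaultdict


def divide_customer_groups(customers):
    """Group customer records by their business phase."""
    groups = defaultdict(list)
    for customer in customers:
        groups[customer["business_phase"]].append(customer)
    return dict(groups)


def sample_modeling_clients(target_group, n, seed=0):
    """Draw up to n modeling sample clients without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(target_group, min(n, len(target_group)))


# Hypothetical customer information; every third customer is "in_loan".
customers = [
    {"customer_id": i, "business_phase": "in_loan" if i % 3 == 0 else "pre_loan"}
    for i in range(30)
]

groups = divide_customer_groups(customers)
target_group = groups["in_loan"]  # the target modeling customer group
sample_clients = sample_modeling_clients(target_group, n=5)
```

Simple random sampling is used here only because the text does not specify a sampling scheme; stratified or time-based sampling would fit the same interface.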
Based on the above hardware structure, embodiments of the modeling sample generation method of the present invention are provided.
In the risk modeling field, a risk modeling sample needs a long expression period after its observation point, so the sample is often far removed from the current time, and the variables used for modeling describe the client's past behavior. As a result, when the trained risk identification model is applied at the latest time point, changes in client behavior can invalidate the model. To address this, existing solutions typically shorten the expression period and try to model with client behavior information from the latest time point as the sample. However, a short expression period leaves the client's risk insufficiently expressed, so a risk identification model built on it still cannot accurately identify risky clients. This reflects the technical problem that risk modeling samples obtained by existing methods are not very effective.
To solve the above problem, the present invention provides a modeling sample generation method. On the basis of the initial modeling sample set, the method collects the latest sample variable set of the modeling sample client at the latest time point and weights the initial modeling sample set using the variable distribution information of that latest sample variable set. The final weighted modeling sample set therefore combines the expression data of the old time point corresponding to the initial modeling sample set with the variable distribution information of the latest time point, so that the data distribution of the latest time point is simulated at the observation point while modeling still uses the expression period. As a result, the variable weights in a model trained on the weighted modeling sample set are close to the data distribution of the latest time point, and the model has higher client risk identification capability. This indirectly reflects the effectiveness of the modeling samples and solves the technical problem that risk modeling samples obtained by existing methods are not very effective.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the modeling sample generation method of the present invention. The modeling sample generation method comprises the following steps:
Step S10, obtaining an initial modeling sample set corresponding to a modeling sample client;
In this embodiment, the method is applied to a terminal device, and in particular can be applied to the field of credit risk modeling. For personal credit business, credit risk modeling identifies the risk of personal default by combining the various factors that cause personal default with a mathematical model, and the identified risk is used throughout the risk control process. The modeling sample client refers to the client from whom sample data is collected this time. The initial modeling sample set contains a plurality of initial modeling samples, where an initial modeling sample is sample data of a modeling sample client before the target time point and generally comprises the client's risk performance, behavior information and the like before the target time point.
Specifically, if the terminal receives a modeling sample generation instruction, a modeling sample client for which sample data needs to be acquired currently is determined according to the modeling sample generation instruction, risk performance, behavior information and the like of the modeling sample client within a certain time period are acquired from each specified relevant platform, and the data are collected into the initial modeling sample set.
Step S20, determining a target time point, so as to obtain the latest sample variable set of the modeling sample client at the target time point and obtain the distribution information of the latest sample variable set;
In this embodiment, the target time point refers to the latest time point before modeling is performed. The latest sample variable set comprises a plurality of latest sample variables, which are the sample variables of the modeling sample clients at the latest time point and are usually expressed as behavior information. Distribution information describes how each latest sample variable of the modeling sample clients is distributed over the whole sample population, and is usually expressed as a probability distribution.
The terminal can take the current time point at which modeling starts as the target time point, and then obtain a plurality of latest sample variables of the modeling sample client at the current time point from the relevant platform as the latest sample variable set. The distribution information of each latest sample variable is then analyzed through related technical means.
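This embodiment leaves the "related technical means" open. As one simple possibility, the distribution of each latest sample variable could be estimated with a binned empirical distribution; the sketch below uses that stand-in technique, which is an assumption rather than the method the text prescribes. The variable names, values and bin widths are all hypothetical.

```python
from collections import Counter


def empirical_distribution(values, bin_width):
    """Return {bin_index: probability} for one sample variable."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {bin_index: count / total for bin_index, count in counts.items()}


# Hypothetical latest sample variable set for eight modeling sample clients.
latest_sample_variables = {
    "monthly_logins": [2, 3, 3, 4, 11, 12, 12, 13],
    "credit_utilization": [0.10, 0.15, 0.55, 0.60, 0.62, 0.65, 0.90, 0.95],
}
bin_widths = {"monthly_logins": 5, "credit_utilization": 0.25}

# Distribution information: one probability table per latest sample variable.
distribution_info = {
    name: empirical_distribution(values, bin_widths[name])
    for name, values in latest_sample_variables.items()
}
```

A histogram treats each variable independently and depends on the chosen bin width, which is one reason a fitted parametric model can be preferable.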
Step S30, weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set, and modeling based on the weighted modeling sample set.
In this embodiment, the weighted modeling sample set includes a plurality of weighted modeling samples, and each weighted modeling sample includes an initial modeling sample and distribution information corresponding to the initial modeling sample in the latest sample variable set.
The terminal weights each initial modeling sample in the initial modeling sample set according to the currently obtained distribution information of each sample variable at the latest time point, and collects the weighted initial modeling samples into the weighted modeling sample set. After obtaining the weighted modeling sample set, the terminal can train the risk prediction model on this sample data. The variable weights in the trained risk prediction model are closer to the data distribution of the latest time point, so the model can follow changes in customer behavior and remain effective.
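To make the weighting step concrete, the sketch below attaches a distribution probability to each initial modeling sample and uses it as the sample weight when training a model. The text only requires a machine-learning-based risk modeling algorithm; a one-feature logistic regression trained by gradient descent is chosen here purely for illustration. All numbers, including the distribution probabilities, are made up, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logistic(xs, ys, sample_weights, lr=0.5, epochs=2000):
    """One-feature logistic regression minimizing the weighted log loss."""
    w, b = 0.0, 0.0
    total = sum(sample_weights)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y, s in zip(xs, ys, sample_weights):
            err = sigmoid(w * x + b) - y
            grad_w += s * err * x  # each sample contributes in proportion to its weight
            grad_b += s * err
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b


# Hypothetical initial modeling samples: one variable taken at the observation
# point and the 0/1 risk performance observed afterwards.
initial_variable = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
risk_performance = [0, 0, 0, 1, 0, 1, 1, 1]

# Hypothetical distribution probability of each sample under the distribution
# of the latest time point; used directly as the sample weight.
distribution_probability = [0.2, 0.4, 0.9, 1.1, 1.2, 0.8, 0.3, 0.1]

weighted_modeling_sample_set = list(
    zip(initial_variable, risk_performance, distribution_probability)
)
w, b = fit_weighted_logistic(
    initial_variable, risk_performance, distribution_probability
)
risk_low = sigmoid(w * 0.1 + b)
risk_high = sigmoid(w * 0.9 + b)
```

Samples that look typical of the latest time point (weights near 1) dominate the fit, while samples that have become rare (weights near 0.1) contribute little.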
The invention provides a modeling sample generation method. The method comprises: obtaining an initial modeling sample set corresponding to a modeling sample client; determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point and obtain distribution information of the latest sample variable set; and weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set so as to model based on the weighted modeling sample set. On the basis of the initial modeling sample set, the invention collects the latest sample variable set of the modeling sample client at the latest time point and weights the initial modeling sample set using the variable distribution information of that latest sample variable set. The final weighted modeling sample set therefore combines the expression data of the old time point corresponding to the initial modeling sample set with the variable distribution information of the latest time point, so that the data distribution of the latest time point is simulated at the observation point while modeling still uses the expression period. As a result, the variable weights in a model trained on the weighted modeling sample set are close to the data distribution of the latest time point, and the model has higher client risk identification capability. This indirectly reflects the effectiveness of the modeling samples and solves the technical problem that risk modeling samples obtained by existing methods are not very effective.
Further, based on the first embodiment shown in fig. 2, a second embodiment of the modeling sample generation method of the present invention is proposed. As shown in fig. 3, in the present embodiment, step S10 includes:
step S11, determining modeling sample clients, modeling observation points and modeling presentation periods;
step S12, obtaining initial sample variables of the modeling sample client according to the modeling observation points, and obtaining initial risk expression of the modeling sample client according to the modeling expression period, so as to obtain the initial modeling sample set based on the initial sample variables and the initial risk expression.
In this embodiment, the modeling observation point refers to a point at which the modeling sample client starts risk performance. Initial risk performance refers to the relevant behavioral information of the modeled sample client before the observation point, which can be generalized to some numerical description. The modeling presentation period refers to a period of time after the observation point by the modeling sample client. The initial risk performance refers to risk performance information of a modeling sample client in a modeling performance period, specifically, whether the client becomes a bad client in the performance period or not, if the client becomes the bad client, the client can be marked as 1, and if the client does not become the bad client, the client can be marked as 0, namely, the value of the initial risk performance can be identified by 0 and 1.
The terminal can determine the modeling sample client, the modeling observation point and the length of the modeling performance period according to a specific modeling instruction. Through the relevant channels it then collects the numerically expressed behavior information of the modeling sample client before the modeling observation point and the risk performance information of the client within the modeling performance period, and combines the two to obtain the initial modeling sample set.
It should be noted that the modeling performance period does not need to be shortened as in the prior art; its duration remains the conventional duration.
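Steps S11 and S12 can be sketched as follows. This is a minimal illustration only: the `Event` record, its fields, and the two summary variables (event count and total amount) are hypothetical choices made for the example and are not specified in the original text.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    """Hypothetical behavior record; the fields are illustrative assumptions."""
    customer_id: str
    day: date
    amount: float   # e.g. a repayment amount
    overdue: bool   # True if this event marks the customer as a bad client


def build_initial_sample_set(events, customers, observation_point, performance_days):
    """Return {customer_id: (x, y)} for each modeling sample client."""
    period_end = observation_point + timedelta(days=performance_days)
    samples = {}
    for cid in customers:
        own = [e for e in events if e.customer_id == cid]
        before = [e for e in own if e.day <= observation_point]
        during = [e for e in own if observation_point < e.day <= period_end]
        # Initial sample variables: numeric summary of behavior before the observation point.
        x = [float(len(before)), sum(e.amount for e in before)]
        # Initial risk performance: 1 if the client turns bad in the performance period, else 0.
        y = 1 if any(e.overdue for e in during) else 0
        samples[cid] = (x, y)
    return samples


events = [
    Event("c1", date(2020, 1, 10), 100.0, False),
    Event("c1", date(2020, 5, 2), 0.0, True),
    Event("c2", date(2020, 2, 1), 250.0, False),
    Event("c2", date(2020, 6, 15), 250.0, False),
]
initial_set = build_initial_sample_set(events, ["c1", "c2"], date(2020, 3, 1), 180)
```

Here client `c1` becomes overdue inside the 180-day performance period and is labeled 1, while `c2` is labeled 0.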
Further, step S20 includes:
step S21, determining the target time point according to the modeling observation point and the modeling performance period, and acquiring a plurality of current sample variables of the modeling sample client at the target time point to be used as the latest sample variable set;
step S22, fitting a Gaussian mixture model to the latest sample variable set to obtain probability distribution information of the latest sample variable set as the distribution information.
In this embodiment, the Gaussian mixture model is a model for fitting the distribution of an arbitrary variable x. It can be regarded as a combination of K single Gaussian models, where the identity of the sub-model that generated each observation is the hidden variable of the mixture model. In general, a mixture model can be built from any probability distribution; the Gaussian mixture model is used here because of its good mathematical properties and good computational performance.
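For reference, the density of a Gaussian mixture with K components is usually written as below. The symbols for the mixing weight, mean and covariance of component k are standard textbook notation and do not appear in the original text.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```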
According to the specific time of the modeling observation point and the duration of the modeling performance period, the terminal can select a time after the end of the modeling performance period as the target time point, and then, once the current time reaches the target time point, obtain the sample variables of the modeling sample client at that time. The terminal collects the latest sample variable set of the modeling sample client at the latest time point and fits a Gaussian mixture model to it, thereby obtaining the probability distribution of the sample variables at the latest time point, that is, the distribution information.
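Steps S21 and S22 might look like the sketch below. It assumes a single scalar sample variable and fits the mixture with a plain expectation-maximization loop written from scratch. The function names, the `lag_days` parameter, the quantile-based initialization and the synthetic data are all assumptions made for illustration; the original text does not say how the mixture is fitted.

```python
import math
import random
from datetime import date, timedelta


def target_time_point(observation_point, performance_days, lag_days=0):
    """A time at or after the end of the performance period (lag_days is hypothetical)."""
    return observation_point + timedelta(days=performance_days + lag_days)


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Initialize means at evenly spaced quantiles, with one shared variance.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    mean_all = sum(data) / n
    var_all = max(sum((x - mean_all) ** 2 for x in data) / n, min_var)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var
            )
    return weights, means, variances


# Synthetic "latest sample variables": two well-separated customer segments.
random.seed(0)
latest_x = [random.gauss(0.0, 1.0) for _ in range(300)] + \
           [random.gauss(10.0, 1.0) for _ in range(300)]
mix_weights, mix_means, mix_vars = fit_gmm_1d(latest_x, k=2)
```

The returned triple plays the role of the distribution information. With several sample variables, a multivariate mixture would be needed instead of this scalar version.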
Further, the step of weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set includes:
step S31, determining the distribution probability corresponding to each initial modeling sample in the initial modeling sample set according to the probability distribution information;
and step S32, weighting each initial modeling sample by taking the distribution probability as a sample weight, and taking the weighted initial modeling sample set as the weighted modeling sample set.
In this embodiment, the terminal weights the sample data as follows. Using the probability distribution of the sample variables at the latest time point, obtained from the Gaussian mixture model fitted to the latest sample variable set, the terminal determines a distribution probability for each initial modeling sample in the initial modeling sample set, where each initial modeling sample contains the initial sample variables and the initial risk performance. The distribution probability can be used directly as the sample weight, and the weighted initial modeling sample set is recorded as the weighted modeling sample set.
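A minimal sketch of steps S31 and S32, again assuming a scalar sample variable. The mixture parameters below are hypothetical stand-ins for a fitted model, and the function names are illustrative.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Evaluate p(x) under a 1-D Gaussian mixture."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def weight_samples(initial_samples, weights, means, variances):
    """Attach p(x) under the latest-time-point mixture to each (x, y) sample."""
    return [(x, y, mixture_density(x, weights, means, variances)) for x, y in initial_samples]


# Hypothetical mixture parameters standing in for the fitted distribution information.
gmm = ([0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
initial_samples = [(0.0, 0), (5.0, 1), (10.0, 0)]
weighted_samples = weight_samples(initial_samples, *gmm)
```

Samples whose variables are typical of the latest time point (x near 0 or 10 here) receive a large weight, while the sample at x = 5, which is rare under the latest distribution, receives a weight close to zero.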
As a specific example, the terminal implements the scheme in the following steps:
the first step: determine the modeling customer group, the observation point and the performance period, collect the variables x and the risk performance y of the modeling samples, and merge them into sample A;
the second step: collect the variables x' of the modeling sample clients at the latest time point, fit a Gaussian mixture model to obtain the distribution p(x') of x' at the latest time point, and record it as model A;
the third step: evaluate model A on each sample x in sample A to obtain its weight p(x), and record the samples together with their weights as sample B;
the fourth step: model sample B with a risk model modeling algorithm to obtain model B;
the fifth step: use model B for risk prediction.
Further, in this embodiment, the modeling samples are weighted using the variable distribution information of the latest time point, so that the distribution of the latest time point is simulated at the observation point and modeling is performed in combination with the performance period information.
Further, based on the first embodiment shown in fig. 2 described above, a third embodiment of the modeling sample generation method of the present invention is proposed. In this embodiment, before step S10, the method further includes:
determining a target modeling customer group, and sampling the target modeling customer group to determine the modeling sample client.
In the present embodiment, the target modeling customer group refers to a customer group having several common characteristics screened by some screening conditions. The terminal determines the target modeling client group first, and then selects part of clients from the target modeling client group as modeling sample clients through sample sampling, wherein the specific sampling mode can be random sampling or other sampling modes, and the embodiment is not particularly limited.
Further, the step of determining a target modeling customer group includes:
and obtaining customer information and business phase information corresponding to the customer information, and carrying out customer group division on the customer information according to the business phase information to determine the target modeling customer group.
In this embodiment, the business phase refers to the different phases of a client's credit business, such as account opening, first loan, existing (stock) customer and overdue, and clients in different phases can be divided into different client groups. Taking the business phases as the classification basis, the terminal classifies the entire existing client base and then determines the category requiring risk prediction as the target modeling customer group; one or more target modeling customer groups can be determined at the same time.
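The customer group division and the subsequent sampling can be sketched as below. The phase labels, function names and the use of simple random sampling are illustrative assumptions; the original text allows other sampling methods.

```python
import random
from collections import defaultdict


def divide_by_business_phase(customers):
    """Group (customer_id, business_phase) pairs into {phase: [customer_id, ...]}."""
    groups = defaultdict(list)
    for cid, phase in customers:
        groups[phase].append(cid)
    return dict(groups)


def sample_customers(group, sample_size, seed=None):
    """Draw modeling sample clients from a target group by simple random sampling."""
    rng = random.Random(seed)
    return rng.sample(group, min(sample_size, len(group)))


# Hypothetical client records with illustrative phase labels.
customers = [
    ("c1", "first_loan"), ("c2", "overdue"), ("c3", "first_loan"),
    ("c4", "account_opening"), ("c5", "first_loan"),
]
groups = divide_by_business_phase(customers)
target_group = groups["first_loan"]          # the category needing risk prediction
modeling_sample_customers = sample_customers(target_group, 2, seed=42)
```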
Further, the step of modeling based on the set of weighted modeling samples comprises:
and modeling the weighted modeling sample set by using a risk model modeling algorithm based on machine learning to obtain a risk prediction model for risk prediction.
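The original text does not name a particular risk modeling algorithm. As one possible illustration, the sketch below trains a logistic regression by gradient descent on a weighted log-loss, so that each sample contributes in proportion to its weight. The choice of logistic regression, the learning rate, the epoch count and the toy data are all assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(samples, lr=0.1, epochs=2000):
    """samples: list of (x, y, w) with scalar x, label y in {0, 1}, weight w >= 0."""
    total_w = sum(w for _, _, w in samples)
    a, b = 0.0, 0.0  # slope and intercept
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y, w in samples:
            err = sigmoid(a * x + b) - y
            # Each sample's gradient contribution is scaled by its weight.
            grad_a += w * err * x
            grad_b += w * err
        a -= lr * grad_a / total_w
        b -= lr * grad_b / total_w
    return a, b


def predict_risk(model, x):
    """Predicted probability that a client with variable x becomes a bad client."""
    a, b = model
    return sigmoid(a * x + b)


# Toy weighted modeling sample set: (variable, risk performance, sample weight).
weighted_set = [(0.0, 0, 1.0), (1.0, 0, 1.0), (2.0, 1, 1.0), (3.0, 1, 1.0), (1.5, 1, 0.01)]
risk_model = fit_weighted_logistic(weighted_set)
```

The sample at x = 1.5 has a very small weight, so it has little influence on where the fitted model places its decision boundary.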
As shown in fig. 4, the present invention also provides a modeling sample generation apparatus including:
an initial sample obtaining module 10, configured to obtain an initial modeling sample set corresponding to a modeling sample client;
a distribution information obtaining module 20, configured to determine a target time point, so as to obtain a latest sample variable set of the modeling sample client at the target time point, and obtain distribution information of the latest sample variable set;
and the weighted sample generating module 30 is configured to weight the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set, so as to perform modeling based on the weighted modeling sample set.
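One way to picture how the three modules fit together is the sketch below. The class name, the callable-based interfaces and the stand-in density function are hypothetical and are used only to show the data flow between modules 10, 20 and 30.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]                  # (variable x, risk performance y)
WeightedSample = Tuple[float, int, float]   # (x, y, weight)
Density = Callable[[float], float]


class ModelingSampleGenerator:
    """Hypothetical wrapper wiring together modules 10, 20 and 30."""

    def __init__(
        self,
        initial_sample_module: Callable[[], List[Sample]],
        distribution_module: Callable[[], Density],
    ) -> None:
        self.initial_sample_module = initial_sample_module   # module 10
        self.distribution_module = distribution_module       # module 20

    def generate(self) -> List[WeightedSample]:
        initial = self.initial_sample_module()
        density = self.distribution_module()
        # Module 30: weight each initial sample by its density at the target time point.
        return [(x, y, density(x)) for x, y in initial]


generator = ModelingSampleGenerator(
    initial_sample_module=lambda: [(0.0, 0), (2.0, 1)],
    # Stand-in function, not a fitted Gaussian mixture.
    distribution_module=lambda: (lambda x: 1.0 / (1.0 + x * x)),
)
weighted = generator.generate()
```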
The method executed by each program module can refer to each embodiment of the modeling sample generation method of the present invention, and is not described herein again.
The invention also provides modeling sample generation equipment.
The modeling sample generation equipment comprises a processor, a memory and a modeling sample generation program stored on the memory and operable on the processor, wherein the modeling sample generation program, when executed by the processor, implements the steps of the modeling sample generation method as described above.
The method implemented when the modeling sample generation program is executed may refer to each embodiment of the modeling sample generation method of the present invention, and details are not repeated here.
The invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention has stored thereon a modeling sample generation program that, when executed by a processor, implements the steps of the modeling sample generation method as described above.
The method implemented when the modeling sample generation program is executed may refer to each embodiment of the modeling sample generation method of the present invention, and details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A modeling sample generation method, characterized by comprising:
obtaining an initial modeling sample set corresponding to a modeling sample client;
determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point and obtain distribution information of the latest sample variable set;
and weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set so as to model based on the weighted modeling sample set.
2. The modeling sample generation method of claim 1, wherein the step of obtaining an initial set of modeling samples corresponding to a modeling sample customer comprises:
determining modeling sample clients, modeling observation points and modeling performance periods;
and acquiring initial sample variables of the modeling sample client according to the modeling observation points, and acquiring the initial risk performance of the modeling sample client according to the modeling performance periods, so as to obtain the initial modeling sample set based on the initial sample variables and the initial risk performance.
3. The modeling sample generating method according to claim 2, wherein the step of determining a target time point to obtain a latest sample variable set of the modeling sample client at the target time point and obtaining distribution information of the latest sample variable set includes:
determining the target time point according to the modeling observation point and the modeling performance period, and acquiring a plurality of current sample variables of the modeling sample client at the target time point to serve as the latest sample variable set;
and fitting a Gaussian mixture model to the latest sample variable set to obtain probability distribution information of the latest sample variable set as the distribution information.
4. The method of generating modeling samples according to claim 3, wherein the step of weighting the initial set of modeling samples according to the distribution information to obtain a set of weighted modeling samples comprises:
determining the distribution probability corresponding to each initial modeling sample in the initial modeling sample set according to the probability distribution information;
and weighting each initial modeling sample by taking the distribution probability as a sample weight so as to take the weighted initial modeling sample set as the weighted modeling sample set.
5. The method for generating modeling samples according to claim 1, wherein the step of obtaining an initial set of modeling samples corresponding to a customer of modeling samples is preceded by the step of:
and determining a target modeling customer base, and sampling the target modeling customer base to determine the modeling sample customer.
6. The modeling sample generation method of claim 5, wherein the step of determining a target modeling customer base comprises:
and obtaining customer information and business phase information corresponding to the customer information, and carrying out customer group division on the customer information according to the business phase information to determine the target modeling customer group.
7. The method of generating modeled samples of any of claims 1-6 wherein the step of modeling based on the set of weighted modeled samples comprises:
and modeling the weighted modeling sample set by using a risk model modeling algorithm based on machine learning to obtain a risk prediction model for risk prediction.
8. A modeling sample generation apparatus, characterized by comprising:
the initial sample acquisition module is used for acquiring an initial modeling sample set corresponding to a modeling sample client;
the distribution information acquisition module is used for determining a target time point, acquiring the latest sample variable set of the modeling sample client at the target time point and acquiring the distribution information of the latest sample variable set;
and the weighted sample generating module is used for weighting the initial modeling sample set according to the distribution information to obtain a weighted modeling sample set so as to perform modeling based on the weighted modeling sample set.
9. Modeling sample generation equipment, characterized by comprising: a memory, a processor and a modeling sample generation program stored on the memory and executable on the processor, the modeling sample generation program, when executed by the processor, implementing the steps of the modeling sample generation method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a modeling sample generation program is stored thereon, which when executed by a processor implements the steps of the modeling sample generation method according to any one of claims 1 to 7.
CN202110045708.9A 2021-01-13 2021-01-13 Modeling sample generation method, device, equipment and computer readable storage medium Pending CN112766558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110045708.9A CN112766558A (en) 2021-01-13 2021-01-13 Modeling sample generation method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112766558A true CN112766558A (en) 2021-05-07

Family

ID=75700446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110045708.9A Pending CN112766558A (en) 2021-01-13 2021-01-13 Modeling sample generation method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112766558A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678322A (en) * 2012-09-03 2014-03-26 阿里巴巴集团控股有限公司 System and method for sample data integration
CN107644279A (en) * 2016-07-21 2018-01-30 阿里巴巴集团控股有限公司 The modeling method and device of evaluation model
CN110264274A (en) * 2019-06-21 2019-09-20 深圳前海微众银行股份有限公司 Objective group's division methods, model generating method, device, equipment and storage medium
CN111311402A (en) * 2020-03-30 2020-06-19 百维金科(上海)信息科技有限公司 XGboost-based internet financial wind control model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination