WO2020253038A1 - Model construction method and device - Google Patents

Model construction method and device

Info

Publication number
WO2020253038A1
Authority
WO
WIPO (PCT)
Prior art keywords
merchant
target
parameters
time period
transaction
Prior art date
Application number
PCT/CN2019/117071
Other languages
English (en)
French (fr)
Inventor
苏宇
石英伦
朱凡
蒋旭昂
Original Assignee
平安普惠企业管理有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安普惠企业管理有限公司
Publication of WO2020253038A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Definitions

  • This application relates to the field of computer technology, in particular to a model construction method and device.
  • the existing prediction models mainly include linear regression models, Kalman filter prediction models, input-output prediction models, and artificial neural network prediction models.
  • the existing prediction model cannot be applied to predict the number of transactions in the transaction prediction scenario.
  • the embodiments of the present application provide a model construction method and device, which can construct a transaction prediction model in a specific scenario, so as to predict the number of successful transactions in different regions of the city in the future based on the transaction prediction model.
  • an embodiment of the present application provides a model construction method, which includes:
  • each merchant gathering area corresponds to P groups of raw data;
  • each group of raw data includes at least one merchant parameter and at least one transaction parameter;
  • the transaction parameters include at least the number of successful transactions;
  • the total number of merchant parameters and transaction parameters included in each group of raw data is N;
  • the target data corresponding to K target parameters are selected from the original data corresponding to each merchant gathering area to obtain M*P groups of target data.
  • Each group of target data includes the same K target parameters, and K is less than or equal to N-1;
  • the training sample set includes M*P training samples, and each training sample includes the discretized features of the data corresponding to the K target parameters.
  • the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
  • a transaction prediction model is constructed based on the M*P training samples in the training sample set and the M*P successful-transaction counts in the M*P groups of original data.
  • the transaction prediction model is used to predict, based on a set of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
  • an embodiment of the present application provides a model construction device, which includes:
  • the first acquisition module is used to acquire P groups of original data in P different time periods for each of the M merchant aggregation areas to obtain M*P groups of original data, where one time period corresponds to one group of original data.
  • Each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
  • the screening module is used to select the target data corresponding to K target parameters from the original data corresponding to each merchant gathering area to obtain M*P groups of target data.
  • Each group of target data includes the same K target parameters, and K is less than or equal to N-1;
  • the discrete processing module is used to discretize the M*P data corresponding to each target parameter included in the M*P group of target data to obtain a training sample set.
  • the training sample set includes M*P training samples.
  • Each training sample includes the discretized characteristics of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
  • the building module is used to construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P successful-transaction counts in the M*P groups of original data; the transaction prediction model is used to predict, based on a set of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
  • an embodiment of the present application provides a terminal, including a processor and a memory, the processor and the memory are connected to each other, wherein the memory is used to store a computer program that supports the terminal to execute the above method, and the computer program includes program instructions
  • the processor is configured to call the program instructions to execute the model construction method of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to execute the model construction method of the first aspect.
  • the embodiment of the application constructs a transaction prediction model based on a specific training sample set, and can construct a transaction prediction model in a specific scenario, so as to predict the number of successful transactions in different regions of the city in the future based on the transaction prediction model.
  • FIG. 1 is a schematic flowchart of a model construction method provided by an embodiment of the present application
  • FIG. 2 is another schematic flowchart of a model construction method provided by an embodiment of the present application.
  • Figure 3a is a schematic diagram of a training process provided by an embodiment of the present application.
  • FIG. 3b is another schematic diagram of the training process provided by the embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a model construction device provided by an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a terminal provided by an embodiment of the present application.
  • Fig. 1 is a schematic flowchart of a model construction method provided by an embodiment of the present application. As shown in Figure 1, the model construction method may include steps:
  • S101 Acquire P groups of original data in P different time periods in each of the M merchant aggregation areas to obtain M*P groups of original data.
  • the terminal may obtain M merchant gathering areas in the target city from the area database.
  • the regional database can be used to store the merchant gathering areas divided in each city, and each merchant gathering area includes one or more merchants.
  • the target city can be a prefecture-level city or a municipality directly under the Central Government, such as Zhuhai, Shenzhen, Shanghai, etc.
  • the terminal can obtain P groups of original data of each of the M merchant aggregation areas in P different time periods to obtain M*P groups of original data. Among them, P can be 24, one time period can be one month, and P different time periods can be 24 consecutive months in history, such as from November 2016 to November 2018.
  • One month can correspond to a set of raw data, and a set of raw data can represent data that actually exists in a business gathering area within a period of time.
  • Each merchant aggregation area has 24 sets of original data in 24 consecutive months, so M merchant aggregation areas have a total of M*24 sets of original data within 24 consecutive months in history.
  • Each set of raw data may include at least one merchant parameter and at least one transaction parameter, and the at least one transaction parameter may include the number of transaction applications, the number of successful transactions, the number of failed transactions, the number of canceled transactions, the success rate of transactions, or the trend of transactions;
  • the at least one merchant parameter may include the number of merchants (including the number of merchants of each type, such as the number of electronic merchants, the number of clothing merchants, the number of beauty merchants, etc.), the proportion of merchants (including the proportion of each type of merchant), or the merchant density, etc.
  • the sum of the number of merchant parameters and transaction parameters included in each group of raw data is N, and the types of parameters included in each group of raw data can be the same, that is, the N types of parameters included in each group of raw data are the same.
  • M can be an integer greater than or equal to 1
  • N can be an integer greater than or equal to 2.
  • the transaction involved in the embodiment of this application may be a loan.
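  • As a rough illustration of the data layout described for S101, the sketch below builds M*P groups of raw data, one per (area, time period) pair. The names `RawGroup`, `month_range`, `collect_raw_groups`, and `placeholder_loader`, the parameter keys, and all values are hypothetical and are not taken from the application; the start month is chosen arbitrarily.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawGroup:
    """One group of raw data: one merchant gathering area in one time period."""
    area_id: str
    period: str   # "YYYY-MM"
    params: dict  # the N parameters, including the number of successful transactions


def month_range(start_year, start_month, count):
    """Return `count` consecutive months as 'YYYY-MM' strings."""
    months = []
    year, month = start_year, start_month
    for _ in range(count):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def collect_raw_groups(area_ids, periods, load_params):
    """Return M*P groups of raw data, one per (area, period) pair."""
    return [RawGroup(a, p, load_params(a, p)) for a in area_ids for p in periods]


def placeholder_loader(area_id, period):
    # Hypothetical stand-in for a database query; the values are made up.
    return {
        "merchant_count": 120,
        "transaction_applications": 40,
        "successful_transactions": 25,
    }


areas = ["area_1", "area_2", "area_3"]   # M = 3
periods = month_range(2016, 12, 24)      # P = 24 consecutive months
groups = collect_raw_groups(areas, periods, placeholder_loader)
print(len(groups))  # M * P = 72
```

  • Every group carries the same N parameter kinds, which is what lets the later steps treat each parameter as one column of M*P values.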
  • S102 Select target data corresponding to K target parameters from each group of original data corresponding to the gathering area of each merchant, to obtain the M*P group of target data.
  • the terminal can obtain each group of original data corresponding to each merchant gathering area, and can input the N kinds of parameters of each group of original data, together with the tags carried by each of the N kinds of parameters, into a decision tree for parameter screening.
  • the terminal can obtain the contribution, output by the decision tree based on the N parameters of each group of original data, of each of the N-1 parameters to the number of successful transactions (for example, if the decision tree uses information gain to express contribution, then the information gain value of each parameter is that parameter's contribution to the number of successful transactions).
  • the terminal can obtain the contribution threshold.
  • the terminal can select, from the N-1 parameters output by the decision tree, the K target parameters whose contribution to the number of successful transactions is greater than or equal to the contribution threshold, and can extract the target data corresponding to the K target parameters from each group of original data corresponding to each merchant gathering area to obtain M*P groups of target data.
  • the K target parameters included in each group of target data may be the same, and K may be less than or equal to N-1.
  • the contribution threshold may be a preset value, for example, the contribution threshold is 0.2.
  • the above-mentioned contribution threshold may specifically be obtained as follows: the terminal can arrange the contributions of the N-1 parameters output by the decision tree to the number of successful transactions in order of magnitude to obtain the contribution degree sequence.
  • the terminal arranges the contribution degrees of the various parameters in the N-1 parameters to the number of successful transactions in descending order to obtain the contribution degree sequence.
  • the terminal uses the 69th contribution degree (assumed to be 0.35) in the contribution degree sequence as the contribution degree threshold.
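  • The screening in S102 can be sketched as follows, assuming the per-parameter contributions (for example, information gain values) have already been produced by a decision tree. The function names, parameter names, and contribution values are hypothetical; the application does not give concrete numbers beyond the example thresholds.

```python
def select_target_parameters(contributions, threshold):
    """Keep parameters whose contribution is >= threshold, highest first."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


def threshold_from_rank(contributions, rank):
    """Use the rank-th largest contribution (1-based) as the threshold."""
    ordered = sorted(contributions.values(), reverse=True)
    return ordered[rank - 1]


def extract_target_data(raw_group, target_params):
    """Reduce one group of raw data to its K target parameters."""
    return {name: raw_group[name] for name in target_params}


# Hypothetical contributions of N-1 = 5 parameters to the number of
# successful transactions.
contributions = {
    "merchant_count": 0.41,
    "merchant_proportion": 0.27,
    "transaction_applications": 0.35,
    "cancelled_transactions": 0.22,
    "merchant_density": 0.08,
}

targets = select_target_parameters(contributions, threshold=0.2)
print(targets)

raw_group = {
    "merchant_count": 120,
    "merchant_proportion": 0.3,
    "transaction_applications": 40,
    "cancelled_transactions": 5,
    "merchant_density": 1.7,
    "successful_transactions": 25,
}
print(extract_target_data(raw_group, targets))
```

  • With a preset threshold of 0.2, four of the five candidate parameters survive, so K = 4 and K ≤ N-1 holds. `threshold_from_rank` shows the alternative in which the threshold is read off the sorted contribution sequence.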
  • S103 Discretize the M*P data corresponding to each target parameter included in the M*P group of target data to obtain a training sample set.
  • the following operations are performed for each of the above K types of target parameters:
  • the terminal can extract the M*P data corresponding to the target parameter m from the M*P groups of target data, and can discretize these M*P data based on a clustering algorithm such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to obtain M*P features.
  • The M*P features belong to the M merchant gathering areas, P features per area.
  • the terminal can obtain the M*P*K features corresponding to the K types of target parameters, where one type of target parameter corresponds to M*P features, and can generate the training sample set based on the M*P*K features and the merchant gathering area to which each feature belongs.
  • each training sample may include discretized features of data corresponding to K target parameters in a set of target data in a merchant gathering area in a period of time.
  • the K target parameters may include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend.
  • the terminal obtains the data corresponding to the target parameter "number of merchants" in each of the 100*24 groups of target data. Since one group of target data includes one value for this target parameter, the 100*24 groups of target data yield 100*24 values. Since these 100*24 values all belong to the parameter "number of merchants", the terminal discretizes them based on DBSCAN to obtain 100*24 features. Each of these 100*24 features represents a discretized, abstracted value of the number of merchants in one merchant gathering area in one time period.
  • the data corresponding to the target parameter of the number of merchants include the 10 values 500, 600, 70, 100, 82, 550, 120, 150, 65, and 167.
  • the terminal discretizes these 10 values based on DBSCAN: a number of merchants less than 100 is category I, 100 ≤ number of merchants ≤ 200 is category II, and a number of merchants ≥ 500 is category III.
  • category I is represented by binary 00, category II by binary 01, and category III by binary 10.
  • the discretized features of these 10 values are binary 10, 10, 00, 01, 00, 10, 01, 01, 00, 01.
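  • The mapping from the ten example values to their binary codes can be checked with the short sketch below. It hard-codes the three category ranges from the example rather than running DBSCAN, so it is a stand-in for the clustering step, not an implementation of it. The inclusive bounds of the middle range are an assumption, and values that fall in none of the three ranges are rejected because the example does not say how they would be handled.

```python
def discretize_merchant_count(count):
    """Map a merchant count to the 2-bit category code used in the example."""
    if count < 100:
        return "00"  # category I
    if 100 <= count <= 200:
        return "01"  # category II (inclusive bounds are an assumption)
    if count >= 500:
        return "10"  # category III
    # The example defines no category for counts strictly between 200 and 500.
    raise ValueError(f"no category defined for merchant count {count}")


merchant_counts = [500, 600, 70, 100, 82, 550, 120, 150, 65, 167]
features = [discretize_merchant_count(c) for c in merchant_counts]
print(features)
# ['10', '10', '00', '01', '00', '10', '01', '01', '00', '01']
```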
  • S104 Construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P number of successful transactions in the M*P group of original data.
  • the terminal can obtain the M*P number of successful transactions in the aforementioned M*P group of original data.
  • the terminal may obtain a preset basic model, and the basic model may include a regression model composed of multiple tree models.
  • the terminal can input the M*P training samples in the above-mentioned training sample set and the M*P successful-transaction counts into the basic model for training, so that the basic model learns the relationship between the various target parameters of the training samples and the number of successful transactions.
  • the terminal can determine the basic model at this time as the transaction prediction model.
  • the transaction prediction model can be used to predict the number of successful transactions of a merchant gathering area in the next time period of this historical time period based on a set of target data of a merchant gathering area in a historical period of time.
  • this set of target data includes the above-mentioned K target parameters
  • The original data include parameters such as the number of merchants, the proportion of merchants, the number of transaction applications, the number of successful transactions, the number of canceled transactions, and the transaction trend.
  • Building the transaction prediction model from training samples derived from these data yields a model for the specific scenario of transactions, so that the number of future successful transactions in different areas of the city can be predicted based on the transaction prediction model.
  • the terminal obtains P groups of original data in P different time periods for each of the M merchant gathering areas to obtain M*P groups of original data, then selects the target data corresponding to the K target parameters from each group of original data corresponding to each merchant gathering area to obtain M*P groups of target data.
  • After discretizing the M*P data corresponding to each target parameter included in the M*P groups of target data to obtain the training sample set, the terminal finally builds a transaction prediction model based on the M*P training samples in the training sample set and the M*P successful-transaction counts in the M*P groups of original data.
  • a transaction prediction model can be constructed in a specific scenario, and then the number of successful transactions in different regions of the city can be predicted based on the transaction prediction model.
  • FIG. 2 is another schematic flowchart of the model construction method provided by the embodiment of the present application.
  • the model construction method may include steps:
  • S201 Obtain P groups of original data in P different time periods in each of the M merchant aggregation areas, to obtain M*P groups of original data.
  • S202 Select target data corresponding to K target parameters from each group of original data corresponding to each merchant gathering area to obtain M*P group target data.
  • S203 Discretize the M*P data corresponding to each target parameter included in the M*P group of target data to obtain a training sample set.
  • for the implementation of steps S201 to S203 in the embodiment of the present application, reference may be made to the implementation of steps S101 to S103 in the embodiment shown in FIG. 1; details are not described herein again.
  • S204 Construct a first regression model based on the M*P training samples in the training sample set and the M*P number of successful transactions in the M merchant gathering areas in P different time periods.
  • the foregoing training sample set may include M*P training samples.
  • Each training sample may include discretized features of data corresponding to K target parameters in a set of target data in a merchant cluster area in a period of time.
  • the K target parameters may include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend.
  • the terminal can obtain the M*P successful-transaction counts of the above-mentioned M merchant gathering areas in the above-mentioned P different time periods.
  • the terminal may obtain a preset first basic model, the first basic model may include multiple tree models, and the multiple tree models may be connected in series to form a regression model.
  • the terminal can input the M*P training samples in the above-mentioned training sample set and the M*P successful-transaction counts into the first basic model for training, so that the first basic model learns the relationship between the various target parameters in the training samples and the number of successful transactions, thereby determining the weights of the tree models in the first basic model.
  • the terminal can then determine the first basic model at this time to be the first regression model.
  • the first regression model can be mainly used to predict the number of successful transactions in the merchant cluster area in a period of time in the future.
  • the first basic model includes n tree models of A1, A2, A3, A4, ..., An.
  • Each tree model selects (by manual setting or by the model's own choice) a different subset of the features of a training sample for training; that is, each tree model learns the relationship between a different subset of the K target parameters and the number of successful transactions.
  • For the training sample of any merchant gathering area in January 2018, the tree model A1 selects the discretized features of the three target parameters a1, a2, and a3 for training, and the tree model A2 selects the discretized features of the two target parameters b5 and b7 for training.
  • For the training sample of any merchant gathering area in February 2018, the tree model A1 still selects the discretized features of the three target parameters a1, a2, and a3 for training, and the tree model A2 still selects the discretized features of the two target parameters b5 and b7.
  • a training process of the first basic model is taken as an example.
  • the terminal uses the M*P successful-transaction counts of the above-mentioned M merchant gathering areas in the above P different time periods as a verification set.
  • the terminal takes the training sample of a certain merchant gathering area, say area_1 in January 2018, from the training sample set, extracts the corresponding number of successful transactions from the verification set (that is, the number of successful transactions of the same merchant gathering area, area_1, in January 2018), and inputs the extracted training sample and number of successful transactions into the first basic model for training to obtain the weight of each tree model in the first basic model, shown in Figure 3a as the weights W1, W2, W3, W4, ..., Wn.
  • the weight of each tree model in the first basic model is updated until the weights no longer change or change only within a fixed range; training then stops, and the first basic model at that point is used as the first regression model.
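  • The application does not state how the tree-model weights W1, ..., Wn are updated. As one possible reading, the sketch below treats the outputs of the individual sub-models as fixed and fits the weights by gradient descent on the mean squared error, stopping once the largest weight change falls below a tolerance. The function name `fit_weights`, the learning rate, the tolerance, and the toy data are all assumptions made for illustration.

```python
def fit_weights(sub_model_outputs, targets, lr=0.1, tol=1e-8, max_iter=10000):
    """Fit one weight per sub-model by gradient descent on mean squared error.

    sub_model_outputs[i][j] is the prediction of sub-model j for sample i.
    Training stops when no weight changes by more than `tol`.
    """
    n_samples = len(targets)
    n_models = len(sub_model_outputs[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(max_iter):
        grads = [0.0] * n_models
        for outputs, y in zip(sub_model_outputs, targets):
            error = sum(w * o for w, o in zip(weights, outputs)) - y
            for j, o in enumerate(outputs):
                grads[j] += 2.0 * error * o / n_samples
        new_weights = [w - lr * g for w, g in zip(weights, grads)]
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    return weights


# Toy data: two sub-models, three samples. The targets are generated with
# weights (2, 3), so the fit should recover values close to those.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_counts = [2.0, 3.0, 5.0]
weights = fit_weights(outputs, true_counts)
print([round(w, 3) for w in weights])
```

  • The stopping rule mirrors the condition in the text that training ends when the weights no longer change or change only within a fixed range.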
  • S205 Construct a second regression model based on the M*P training samples in the training sample set, the M*P number of successful transactions in the M merchant gathering areas in P different time periods, and the first regression model.
  • the terminal may obtain the M*P successful-transaction counts of the above-mentioned M merchant gathering areas in the above-mentioned P different time periods, and use these M*P successful-transaction counts as the verification set.
  • the terminal performs the following operations on each of the above-mentioned M*P training samples: the terminal can input the training sample h of any merchant gathering area i in any time period f of the above-mentioned P different time periods into the above-mentioned first regression model for prediction processing, and can obtain the predicted value (that is, the first number of successful transactions), output by the first regression model based on the training sample h, for the merchant gathering area i in the time period following the time period f.
  • the terminal may calculate the difference between the first number of successful transactions and the number of successful transactions, recorded in the verification set, of the merchant gathering area i in the time period following the time period f.
  • One predicted value (first number of successful transactions) is obtained for each training sample, and each predicted value differs from the corresponding true value (number of successful transactions) in the verification set by some amount, so the M*P training samples correspond to M*P differences. The terminal can therefore obtain the M*P first numbers of successful transactions corresponding to the M*P training samples, and then compute the difference between each first number of successful transactions and the corresponding number of successful transactions in the verification set to obtain M*P differences.
  • the terminal may obtain a preset second basic model, and the second basic model may be a regression model.
  • the terminal can input the M*P differences and the M*P training samples in the training sample set into the second basic model for training, so that the second basic model learns the relationship between the difference and the various target parameters in the training samples.
  • When the second basic model reaches convergence, that is, when the difference between the predicted value (second number of successful transactions) output by the second basic model and the corresponding true value (number of successful transactions) in the verification set fluctuates within a fixed range, the second basic model at this time is determined as the second regression model.
  • the terminal can connect the output of the first regression model to the input of the second regression model through a subtractor to synthesize the transaction prediction model.
  • the second regression model can be mainly used to adjust the predicted value (first number of successful transactions) output by the first regression model according to the various target parameters, so that the adjusted predicted value (second number of successful transactions) is closer to the corresponding true value (number of successful transactions) in the verification set.
  • FIG. 3b is another schematic diagram of the training process provided by the embodiment of the present application.
  • the terminal inputs the training sample of any merchant gathering area, area_1, in January 2018 into the first regression model for prediction processing, and obtains the predicted value (first number of successful transactions), output by the first regression model, for area_1 in February 2018.
  • the terminal calculates the difference between the first number of successful transactions of area_1 in February 2018 and the corresponding true value in the verification set (that is, the number of successful transactions of area_1 in February 2018).
  • the terminal obtains the preset second basic model, and inputs the difference between the predicted value and the true value of area_1 in February 2018, together with the training sample of area_1 in February 2018, into the second basic model for training, so that the second basic model learns the relationship between that difference and the various target parameters. In the next training, the model parameters of the second basic model are adjusted until the second basic model reaches convergence; training then stops, and the second basic model at that point is used as the second regression model. The terminal synthesizes the first regression model and the second regression model into a transaction prediction model.
  • each tree model in the first regression model learns the relationship between different target parameters and the number of successful transactions. Therefore, after the terminal constructs the above-mentioned first regression model, it can obtain the weights of various tree models in the first regression model, and can update the M*P training samples according to the weights of the various tree models. For example, the terminal extracts the target parameter corresponding to the tree model whose weight of the tree model in the first regression model is greater than the weight threshold. Only the features corresponding to the extracted target parameters are retained in each training sample, and other features are removed from each training sample to obtain a new training sample. After M*P training samples are updated, M*P new training samples are obtained. The terminal may construct a second regression model based on the M*P new training samples, the M*P number of successful transactions in the above P different time periods in the M merchant gathering areas, and the above first regression model.
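  • The two-stage structure of S204 and S205 can be sketched with deliberately simple stand-ins: each "regression model" below is just a per-category mean lookup, not a tree ensemble. The first model predicts the count, the second model is fitted on the differences between those predictions and the true counts, and the final prediction subtracts the predicted difference. The sign convention (difference = first prediction minus true value), the function names, and the toy data are assumptions; `true_counts[i]` is understood as the count for the period following the period of `samples[i]`.

```python
from collections import defaultdict


def fit_group_mean(samples, values, key):
    """Return a lookup table: mean of `values` for each category of `key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sample, value in zip(samples, values):
        sums[sample[key]] += value
        counts[sample[key]] += 1
    return {category: sums[category] / counts[category] for category in sums}


def build_prediction_model(samples, true_counts, first_key, second_key):
    """Fit a first model, then a second model on the first model's errors."""
    first = fit_group_mean(samples, true_counts, first_key)
    # Difference between the first prediction and the true value, per sample.
    diffs = [first[s[first_key]] - y for s, y in zip(samples, true_counts)]
    second = fit_group_mean(samples, diffs, second_key)

    def predict(sample):
        first_prediction = first[sample[first_key]]
        predicted_diff = second.get(sample[second_key], 0.0)
        # The "subtractor": remove the predicted error from the first output.
        return first_prediction - predicted_diff

    return predict


# Toy training samples with two discretized features each.
samples = [
    {"merchant_count": "00", "transaction_applications": "00"},
    {"merchant_count": "00", "transaction_applications": "01"},
    {"merchant_count": "01", "transaction_applications": "00"},
    {"merchant_count": "01", "transaction_applications": "01"},
]
true_counts = [10.0, 20.0, 30.0, 40.0]

predict = build_prediction_model(
    samples, true_counts, "merchant_count", "transaction_applications"
)
print([predict(s) for s in samples])
```

  • On this toy data the first model alone is off by 5 on every sample, and the correction learned from the second feature removes that error, which is the role the text assigns to the second regression model.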
  • S207 Acquire target data including K target parameters in the target merchant gathering area in the first time period.
  • S208 Input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing, and obtain the number of successful transactions, output by the transaction prediction model based on that target data, of the target merchant gathering area in the second time period after the first time period.
  • after the terminal constructs the transaction prediction model, it can select a merchant gathering area from the above M merchant gathering areas as the target merchant gathering area, and can obtain the target data of the target merchant gathering area in the first time period, which include the K target parameters (that is, the K target parameters required to construct the transaction prediction model).
  • the terminal may input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing, and may obtain the number of successful transactions, output by the transaction prediction model based on that target data, of the target merchant gathering area in the second time period after the first time period.
  • the target data may include one or more parameters among the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend.
  • the first time period may not belong to the above P different time periods. Assuming that the P different time periods are the 24 months between November 2016 and November 2018, the first time period can be a period after these P different time periods, such as December 2018. The first time period has the same length as the second time period. For example, if the first time period is December 2018, then the second time period is January 2019; both are one month long.
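  • The relationship between the first and second time periods can be sketched as below, assuming one-month periods written as 'YYYY-MM' strings. `stub_model` is a hypothetical placeholder for the trained transaction prediction model, and its return value is made up.

```python
def next_month(period):
    """Return the month after `period`, both as 'YYYY-MM' strings."""
    year, month = map(int, period.split("-"))
    return f"{year + month // 12:04d}-{month % 12 + 1:02d}"


def predict_next_period(model, target_data, first_period):
    """Pair the model output with the second time period it refers to."""
    return next_month(first_period), model(target_data)


def stub_model(target_data):
    # Hypothetical stand-in for the trained transaction prediction model.
    return 42


second_period, predicted = predict_next_period(
    stub_model,
    {"merchant_count": "01", "transaction_applications": "10"},
    "2018-12",
)
print(second_period)  # 2019-01
```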
  • The original data include parameters such as the number of merchants, the proportion of merchants, the number of transaction applications, the number of successful transactions, the number of canceled transactions, and the transaction trend.
  • A transaction prediction model is built from training samples derived from these data. After it is built, the transaction prediction model predicts the number of successful transactions in different areas of the city in the future, so as to guide salespeople to conduct business in a more targeted manner.
  • the terminal obtains P groups of original data in P different time periods for each of the M merchant gathering areas to obtain M*P groups of original data, then selects the target data corresponding to the K target parameters from each group of original data corresponding to each merchant gathering area to obtain M*P groups of target data. The M*P data corresponding to each target parameter included in the M*P groups of target data are then discretized to obtain the training sample set.
  • The terminal constructs the first regression model based on the M*P training samples in the training sample set and the M*P successful-transaction counts of the M merchant gathering areas in the P different time periods, constructs a second regression model based on the M*P training samples in the training sample set, the M*P successful-transaction counts of the M merchant gathering areas in the P different time periods, and the first regression model, and synthesizes the first regression model and the second regression model into a transaction prediction model.
  • Based on the transaction prediction model, the number of successful transactions in the target merchant gathering area in the future is predicted, so as to guide salespeople to conduct business in a more targeted manner.
  • FIG. 4 is a schematic block diagram of the model construction apparatus provided by an embodiment of the present application.
  • the device of the embodiment of the present application includes:
  • the first obtaining module 10 is used to obtain P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, where one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
  • the screening module 20 is used to filter out the target data corresponding to the K target parameters from each group of original data corresponding to each merchant gathering area to obtain the M*P group of target data, each group of target data includes the same K target parameters, K is less than or equal to N-1;
  • the discrete processing module 30 is configured to discretize the M*P data corresponding to each target parameter included in the M*P group of target data to obtain a training sample set, and the training sample set includes M*P training samples, Each training sample includes the discretized characteristics of the data corresponding to the K target parameters.
  • the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
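  • the construction module 40 is used to construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, and the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in the first time period, the number of successful transactions in the target merchant gathering area in the second time period after the first time period.
  • the aforementioned screening module 20 is also used to: input the N parameters of each group of raw data corresponding to each merchant gathering area, together with the labels carried by each parameter, into a decision tree for screening; obtain the contribution to the number of successful transactions that the decision tree outputs for N-1 of the parameters; screen out, from those N-1 parameters, the K target parameters whose contribution to the number of successful transactions is greater than or equal to a contribution threshold; and extract the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data.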
  • the device further includes a second acquisition module 50, an input module 60, and a third acquisition module 70.
  • the second obtaining module 50 is used to obtain target data including the K target parameters in the target merchant gathering area in the first time period;
  • the input module 60 is used to input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing;
  • the third obtaining module 70 is configured to obtain the number of successful transactions that the transaction prediction model outputs, based on the target data in the first time period, for the target merchant gathering area in the second time period after the first time period, where the first time period and the second time period have the same length.
  • the aforementioned construction module 40 includes a first construction unit 401, a second construction unit 402, and a synthesis unit 403.
  • the first construction unit 401 is configured to construct a first regression model based on the M*P training samples and the M*P number of successful transactions in the P different time periods in the M merchant gathering area;
  • the second construction unit 402 is configured to construct a second regression model based on the M*P training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model;
  • the synthesis unit 403 is configured to synthesize the first regression model and the second regression model into a transaction prediction model.
  • the above-mentioned second construction unit 402 is specifically configured to perform the following operations on each of the M*P training samples: input the training sample h of any merchant gathering area i in any time period f of the P different time periods into the first regression model for processing, and obtain the first number of successful transactions that the first regression model outputs, based on the training sample h, for the merchant gathering area i in the time period following the time period f; and obtain the difference between the first number of successful transactions and the number of successful transactions of the merchant gathering area i in the time period following the time period f.
  • in a specific implementation, the model construction device described above can, through the modules described above, carry out the implementations provided by each step of Figure 1 or Figure 2 and realize the functions realized in the above embodiments. For details, refer to the corresponding descriptions of each step in the method embodiment shown in FIG. 1 or FIG. 2, which will not be repeated here.
  • the model construction device obtains P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, then screens out the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data, and discretizes the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set.
  • a transaction prediction model is then constructed based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data.
  • in this way a transaction prediction model can be constructed for a specific scenario, and the future number of successful transactions in different areas of the city can be predicted based on the transaction prediction model.
  • FIG. 5 is a schematic block diagram of a terminal provided in an embodiment of the present application.
  • the terminal in the embodiment of the present application may include: one or more processors 501 and a memory 502.
  • the aforementioned processor 501 and memory 502 are connected through a bus 503.
  • the memory 502 is configured to store a computer program including program instructions
  • the processor 501 is configured to execute the program instructions stored in the memory 502.
  • the processor 501 is configured to call the program instructions to execute:
  • obtain P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, where one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
  • screen out the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data, where every group of target data includes the same K target parameters and K is less than or equal to N-1;
  • discretize the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, where the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
  • construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, where the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in the first time period, the number of successful transactions in the target merchant gathering area in the second time period after the first time period.
  • the processor 501 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 502 may include a read-only memory and a random access memory, and provides instructions and data to the processor 501.
  • a part of the memory 502 may also include a non-volatile random access memory.
  • the memory 502 may also store device type information.
  • the processor 501 described in the embodiment of the present application can execute the implementation described in the model construction method provided in the embodiment of the present application, and can also execute the implementation of the model construction apparatus described in the embodiment of the present application, which will not be repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, implement the model construction method shown in FIG. 1 or FIG. 2. For specific details, refer to the description of the embodiment shown in FIG. 1 or FIG. 2, which will not be repeated here.
  • the foregoing computer-readable storage medium may be the model construction apparatus described in any of the foregoing embodiments or an internal storage unit of an electronic device, such as a hard disk or memory of the electronic device.
  • the computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device.
  • the computer-readable storage medium may also include both an internal storage unit of the electronic device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

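A model construction method and apparatus. The method is applicable to machine learning and comprises: obtaining P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data (S101); screening out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data (S102); discretizing the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set (S103); and finally constructing a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data (S104). With this method, a transaction prediction model can be constructed for a specific scenario, so that the future number of successful transactions in different areas of a city can be predicted based on the transaction prediction model.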

Description

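Model construction method and apparatus
This application claims priority to Chinese patent application No. 201910529288.4, entitled "Transaction prediction model construction method and apparatus", filed with the China Patent Office on June 18, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a model construction method and apparatus.
Background
At present, existing prediction models mainly include linear regression models, Kalman filter prediction models, input-output prediction models, artificial neural network prediction models, and the like. In transaction prediction scenarios, however, the data involved in transactions is voluminous and the relationships among the various data are unknown, so existing prediction models cannot be applied to predict the number of transactions.
Summary
The embodiments of this application provide a model construction method and apparatus that can construct a transaction prediction model for a specific scenario, so that the future number of successful transactions in different areas of a city can be predicted based on the transaction prediction model.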
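In a first aspect, an embodiment of this application provides a model construction method, the method comprising:
obtaining P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, wherein one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
screening out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data, wherein every group of target data includes the same K target parameters and K is less than or equal to N-1;
discretizing the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, wherein the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
constructing a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, wherein the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.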
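In a second aspect, an embodiment of this application provides a model construction apparatus, the apparatus comprising:
a first obtaining module, configured to obtain P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, wherein one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
a screening module, configured to screen out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data, wherein every group of target data includes the same K target parameters and K is less than or equal to N-1;
a discretization module, configured to discretize the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, wherein the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
a construction module, configured to construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, wherein the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
In a third aspect, an embodiment of this application provides a terminal comprising a processor and a memory that are connected to each other, wherein the memory is used to store a computer program that supports the terminal in performing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the model construction method of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, wherein the computer program includes program instructions that, when executed by a processor, cause the processor to perform the model construction method of the first aspect.
The embodiments of this application construct a transaction prediction model based on a specific training sample set, so that a transaction prediction model can be constructed for a specific scenario and the future number of successful transactions in different areas of a city can be predicted based on it.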
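Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the model construction method provided by an embodiment of this application;
FIG. 2 is another schematic flowchart of the model construction method provided by an embodiment of this application;
FIG. 3a is a schematic diagram of the training process provided by an embodiment of this application;
FIG. 3b is another schematic diagram of the training process provided by an embodiment of this application;
FIG. 4 is a schematic block diagram of the model construction apparatus provided by an embodiment of this application;
FIG. 5 is a schematic block diagram of the terminal provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
The model construction method and apparatus provided by the embodiments of this application are described below with reference to FIG. 1 to FIG. 5.
Refer to FIG. 1, a schematic flowchart of the model construction method provided by an embodiment of this application. As shown in FIG. 1, the model construction method may include the following steps:
S101: obtain P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data.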
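In some feasible implementations, the terminal may obtain the M merchant gathering areas of a target city from an area database. The area database may be used to store the merchant gathering areas into which each city has been divided, and each merchant gathering area includes one or more merchants. The target city may be a prefecture-level city or a municipality directly under the central government, such as Zhuhai, Shenzhen, or Shanghai. The terminal may obtain P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data. Here, P may be 24, one time period may be one month, and the P different time periods may be 24 consecutive historical months, for example from November 2016 to November 2018. One month may correspond to one group of raw data, and one group of raw data may represent the data that actually existed for one merchant gathering area in one time period. Each merchant gathering area has 24 groups of raw data over the 24 consecutive months, so the M merchant gathering areas have M*24 groups of raw data in total over those months. Each group of raw data may include at least one merchant parameter and at least one transaction parameter. The at least one transaction parameter may include the number of transaction applications, the number of successful transactions, the number of failed transactions, the number of transaction cancellations, the transaction success rate, the transaction trend, and so on. The at least one merchant parameter may include the number of merchants (including the number of merchants of each type, such as electronics merchants, clothing merchants, beauty merchants, and so on), the proportion of merchants (including the proportion of merchants of each type), the merchant density, and so on. The total number of merchant parameters and transaction parameters included in each group of raw data is N, and the groups of raw data may all include the same kinds of parameters, that is, the same N parameters. M may be an integer greater than or equal to 1, and N may be an integer greater than or equal to 2. The transactions involved in the embodiments of this application may be loans.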
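The layout of the M*P groups of raw data can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `RawGroup`, `month_range`, `collect_raw_groups` and the `lookup` callback are hypothetical names, and the sketch assumes 24 calendar months ending in November 2018 (the text gives the range only as November 2016 to November 2018).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RawGroup:
    """One group of raw data: one merchant gathering area in one time period."""
    area: str
    period: str                # "YYYY-MM", one month per time period
    params: Dict[str, float]   # the N merchant and transaction parameters


def month_range(year: int, month: int, count: int) -> List[str]:
    """Return `count` consecutive months starting at year-month."""
    periods = []
    for _ in range(count):
        periods.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return periods


def collect_raw_groups(
    areas: List[str],
    periods: List[str],
    lookup: Callable[[str, str], Dict[str, float]],
) -> List[RawGroup]:
    """Build the M*P groups: one group per (area, period) pair."""
    return [RawGroup(a, p, lookup(a, p)) for a in areas for p in periods]
```

With M areas and P periods the returned list has M*P entries, one per (area, month) pair, which matches the count stated in the paragraph above.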
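S102: screen out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data.
In some feasible implementations, since each of the above groups of raw data includes N parameters, the terminal may obtain each group of raw data corresponding to each merchant gathering area, and may input the N parameters of each group of raw data, together with the labels carried by each of the N parameters, into a decision tree for parameter screening. The terminal may obtain the contribution to the number of successful transactions that the decision tree outputs for N-1 of the parameters (for example, if the decision tree uses information gain to represent contribution, the information gain value of each parameter is its contribution to the number-of-successful-transactions parameter). The terminal may obtain a contribution threshold. From the N-1 parameters output by the decision tree, the terminal may screen out the K target parameters whose contribution to the number of successful transactions is greater than or equal to the contribution threshold, and may extract the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, obtaining M*P groups of target data. The K target parameters included in every group of target data may be the same, and K may be less than or equal to N-1. The contribution threshold may be a preset value, for example 0.2.
In some feasible implementations, the contribution threshold is obtained as follows. The terminal may sort the contributions of the N-1 parameters output by the decision tree to the number of successful transactions in descending order, to obtain a contribution sequence. The terminal may obtain a preset screening percentage, compute the product of N-1 and the screening percentage and round it to an integer target value, and then take the contribution at the position in the contribution sequence given by the target value as the contribution threshold. For example, suppose N=100 and the screening percentage is 70%. The terminal sorts the contributions of the N-1 parameters to the number of successful transactions in descending order to obtain the contribution sequence. The terminal computes the product of N-1=99 and the screening percentage 70%, which is 69.3, and rounds it to the target value 69. The terminal takes the 69th contribution in the contribution sequence (say 0.35) as the contribution threshold.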
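The threshold rule above can be sketched in a few lines. This is a hedged illustration: the function names are hypothetical, the contributions are assumed to have been produced already (for example as information gain values), and rounding is assumed to mean rounding down, which agrees with the 69.3 to 69 example. The percentage is passed as a whole number so that the arithmetic stays exact.

```python
from typing import Dict, List


def contribution_threshold(contributions: Dict[str, float], screening_percent: int) -> float:
    """Pick the threshold from the descending contribution sequence.

    `contributions` maps each of the N-1 parameters to its contribution.
    `screening_percent` is a whole-number percentage, e.g. 70 for 70%.
    """
    ranked = sorted(contributions.values(), reverse=True)
    # (N-1) * percentage, rounded down; integer arithmetic avoids float error.
    target = (len(ranked) * screening_percent) // 100
    if target < 1:
        raise ValueError("screening percentage selects no parameter")
    return ranked[target - 1]  # the target-th contribution, 1-indexed


def select_target_params(contributions: Dict[str, float], threshold: float) -> List[str]:
    """Keep the parameters whose contribution is >= the threshold."""
    return [name for name, value in contributions.items() if value >= threshold]
```

With 99 candidate parameters and a 70% screening percentage, the target value is 69, so the 69th largest contribution becomes the threshold and, if the contributions are all distinct, exactly 69 parameters are kept.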
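S103: discretize the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set.
In some feasible implementations, the following operations are performed for each of the K target parameters. The terminal may extract the M*P data items corresponding to a target parameter m from the M*P groups of target data, and may discretize those M*P data items with a clustering algorithm, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), to obtain M*P features, which belong respectively to the M merchant gathering areas. The terminal may obtain the M*P*K features corresponding to the K target parameters, with M*P features per target parameter, and may determine a training sample set of M*P training samples according to the M*P*K features and the merchant gathering area and time period to which each feature belongs. Each training sample may include the discretized features of the data corresponding to the K target parameters in one group of target data of one merchant gathering area in one time period. The K target parameters may include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend.
For example, let M=100, P=24, and K=50, and take the number of merchants as the target parameter. The terminal obtains the data corresponding to the number of merchants in each of the 100*24 groups of target data. Since each group of target data includes one data item for the number of merchants, the 100*24 groups yield 100*24 data items. Because these 100*24 data items all belong to the same parameter, the terminal discretizes them with DBSCAN to obtain 100*24 features. Each of these features is the value abstracted by discretization from the number of merchants of one merchant gathering area in one time period. Suppose the data for the number of merchants consists of the 10 values 500, 600, 70, 100, 82, 550, 120, 150, 65, and 167. The terminal discretizes these 10 values with DBSCAN and obtains class I for fewer than 100 merchants, class II for 100 to 200 merchants inclusive, and class III for 500 or more merchants. Suppose class I is represented by binary 00, class II by binary 01, and class III by binary 10. The discretized features of the 10 values are then, in order, 10, 10, 00, 01, 00, 10, 01, 01, 00, 01.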
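The encoding step of the worked example can be reproduced with a small sketch. Note the assumption: the class boundaries below are copied from the example rather than computed, so this shows only how a value is mapped to its binary feature once the classes exist. It does not run DBSCAN, and `encode_merchant_count` is a hypothetical name.

```python
def encode_merchant_count(count: int) -> str:
    """Map a merchant count to the binary feature used in the example.

    Boundaries are taken from the worked example, not derived by clustering:
    class I (< 100) -> "00", class II (100..200) -> "01", class III (>= 500) -> "10".
    """
    if count < 100:
        return "00"
    if count <= 200:
        return "01"
    if count >= 500:
        return "10"
    # Values between 200 and 500 fall in no class of the example.
    raise ValueError(f"{count} is outside the classes given in the example")


counts = [500, 600, 70, 100, 82, 550, 120, 150, 65, 167]
features = [encode_merchant_count(c) for c in counts]
```

Applied to the ten values above, the mapping yields the same sequence of features as the example. A value between 200 and 500 is rejected here because the example defines no class for it; a real clustering step would have to decide how to treat such points.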
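S104: construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data.
In some feasible implementations, the terminal may obtain the M*P numbers of successful transactions in the M*P groups of raw data. The terminal may obtain a preset base model, which may include a regression model composed of multiple tree models. The terminal may input the M*P training samples in the training sample set and the M*P numbers of successful transactions into the base model for training, so that the base model learns the relationship between the target parameters of the training samples and the number of successful transactions. When the base model converges, that is, when the differences between the numbers of successful transactions output by the base model for the M*P training samples (the base model's predicted values) and the numbers of successful transactions that actually occurred (the numbers of successful transactions in the raw data above) all fluctuate within a fixed range, the terminal may take the base model at that point as the transaction prediction model. The transaction prediction model may be used to predict, from a group of target data of a merchant gathering area in a historical time period, the number of successful transactions of that merchant gathering area in the next time period. For example, a group of target data of merchant gathering area area_1 in December 2018 (including the K target parameters above) is input into the transaction prediction model for prediction, and the model outputs the number of successful transactions of area_1 in January 2019. The embodiments of this application process the raw data of merchant gathering areas in different time periods (including the number of merchants, the proportion of merchants, the number of transaction applications, the number of successful transactions, the number of transaction cancellations, the transaction trend, and so on) into training samples and then construct a transaction prediction model from these training samples. A transaction prediction model can thus be constructed for the specific scenario of transactions, and the future number of successful transactions in different areas of a city can be predicted based on it.
In the embodiments of this application, the terminal obtains P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, then screens out the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data, discretizes the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, and finally constructs a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data. A transaction prediction model can thus be constructed for a specific scenario, and the future number of successful transactions in different areas of a city can be predicted based on it.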
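Refer to FIG. 2, another schematic flowchart of the model construction method provided by an embodiment of this application. As shown in FIG. 2, the model construction method may include the following steps:
S201: obtain P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data.
S202: screen out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data.
S203: discretize the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set.
In some feasible implementations, steps S201 to S203 in this embodiment may be implemented in the same way as steps S101 to S103 of the embodiment shown in FIG. 1, which will not be repeated here.
S204: construct a first regression model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods.
In some feasible implementations, the training sample set may include M*P training samples. Each training sample may include the discretized features of the data corresponding to the K target parameters in one group of target data of one merchant gathering area in one time period. The K target parameters may include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend. The terminal may obtain the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods. The terminal may obtain a preset first base model, which may include multiple tree models that can be connected in series to form a regression model. The terminal may input the M*P training samples in the training sample set and the M*P numbers of successful transactions into the first base model for training, so that the first base model learns the relationship between the target parameters in the training samples and the number of successful transactions, that is, so that the weights of the tree models in the first base model are determined. When the first base model converges, that is, when the relationship between the target parameters and the number of successful transactions becomes stable or the weights of the tree models in the first base model change only within a small range, the terminal may take the first base model at that point as the first regression model. The first regression model may mainly be used to predict the number of successful transactions of a merchant gathering area over a future period.
For example, FIG. 3a is a schematic diagram of the training process provided by an embodiment of this application. The first base model includes n tree models A1, A2, A3, A4, ..., An. Each tree model selects (by manual setting or by the model's own choice) a different subset of the features of a training sample for training, that is, each tree model learns the relationship between a different subset of the K target parameters and the number of successful transactions. Suppose tree model A1 selects the discretized features of the three target parameters a1, a2, and a3 in the January 2018 training sample of any merchant gathering area, and tree model A2 selects the discretized features of the two target parameters b5 and b7 in that training sample. Then, for the February 2018 training sample of any merchant gathering area, tree model A1 still selects the discretized features of a1, a2, and a3, and tree model A2 still selects the discretized features of b5 and b7, and so on, so that the relationship between the target parameters and the true numbers of successful transactions in the validation set (that is, the model weights) can be trained. As shown in FIG. 3a, take one training pass of the first base model as an example. The terminal takes the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods as the validation set. The terminal takes the training sample of some merchant gathering area, say area_1 in January 2018, from the training sample set, takes the corresponding number of successful transactions from the validation set (that is, the number of successful transactions of the same area, area_1, in January 2018), and inputs the training sample and the number of successful transactions into the first base model for training, to obtain the weights of the tree models in the first base model, such as the weights W1, W2, W3, W4, ..., Wn in FIG. 3a. In the next training pass, the weights of the tree models in the first base model are updated. When the weights no longer change, or change only within a fixed range, training stops, and the first base model at that point is taken as the first regression model.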
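The stopping rule at the end of this example can be sketched as a comparison of the tree-model weights between two consecutive training passes. This is only an illustration: `weights_converged` and `tolerance` are hypothetical names, and the text does not say how the "fixed range" is chosen or how the weights themselves are updated.

```python
from typing import Sequence


def weights_converged(
    previous: Sequence[float],
    current: Sequence[float],
    tolerance: float,
) -> bool:
    """Return True when every tree-model weight moved by at most `tolerance`.

    `previous` and `current` hold W1..Wn after two consecutive training
    passes; a tolerance of 0.0 means the weights must not change at all.
    """
    if len(previous) != len(current):
        raise ValueError("both passes must cover the same n tree models")
    return all(abs(c - p) <= tolerance for p, c in zip(previous, current))
```

A training loop would call this after each pass and stop, keeping the current first base model as the first regression model, once it returns True.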
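S205: construct a second regression model based on the M*P training samples in the training sample set, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model.
S206: synthesize the first regression model and the second regression model into a transaction prediction model.
In some feasible implementations, after the first regression model has been constructed, the terminal may obtain the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods and use them as the validation set. The terminal performs the following operations on each of the M*P training samples. The terminal may input the training sample h of any merchant gathering area i in any time period f of the P different time periods into the first regression model for prediction, and may obtain the predicted value that the first regression model outputs, based on the training sample h, for the merchant gathering area i in the time period following the time period f (the first number of successful transactions). The terminal may compute the difference between the first number of successful transactions and the number of successful transactions in the validation set for the merchant gathering area i in the time period following the time period f. It follows that each training sample yields one predicted value (a first number of successful transactions) after processing by the first regression model, and each predicted value differs from the corresponding true value in the validation set (the number of successful transactions) by some amount, so the M*P training samples correspond to M*P differences. The terminal may therefore obtain the M*P first numbers of successful transactions corresponding to the M*P training samples, and then obtain the difference between each first number of successful transactions and the corresponding number of successful transactions in the validation set, obtaining M*P differences. The terminal may obtain a preset second base model, which may be a regression model. The terminal may input the M*P differences and the M*P training samples in the training sample set into the second base model for training, so that the second base model learns the relationship between the differences and the target parameters in the training samples. When the second base model converges, that is, when the difference between the predicted value output by the second base model (the second number of successful transactions) and the corresponding true value in the validation set (the number of successful transactions) fluctuates within a fixed range, the second base model at that point is taken as the second regression model. The terminal may connect the output of the first regression model to the input of the second regression model through a subtractor, synthesizing them into the transaction prediction model. The second regression model may mainly be used to adjust the predicted value output by the first regression model (the first number of successful transactions) according to the target parameters, so that the adjusted predicted value (the second number of successful transactions) is closer to the corresponding true value in the validation set (the number of successful transactions).
For example, FIG. 3b is another schematic diagram of the training process provided by an embodiment of this application, taking one training pass of the second base model as an example. The terminal inputs the January 2018 training sample of any merchant gathering area area_1 into the first regression model for prediction and obtains the predicted value output by the first regression model for area_1 in February 2018 (the first number of successful transactions). The terminal computes the difference between the first number of successful transactions of area_1 in February 2018 and the corresponding true value in the validation set (the number of successful transactions of area_1 in February 2018). The terminal obtains the preset second base model and inputs the difference between the predicted and true values for area_1 in February 2018, together with the February 2018 training sample of area_1, into the second base model for training, so that the second base model learns the relationship between the prediction error and the target parameters. In the next training pass, the model parameters of the second base model are adjusted. When the second base model converges, training stops, and the second base model at that point is taken as the second regression model. The terminal synthesizes the first regression model and the second regression model into the transaction prediction model.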
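One way to read the two-stage construction is as residual correction: the second model learns the first model's error, and the composite subtracts that predicted error from the first model's output. The sketch below is an illustration under that reading only. The per-sample mean-error table stands in for whatever regression model the second base model really is, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, ...]            # discretized features of one training sample
Model = Callable[[Sample], float]


def differences(first: Model, samples: Sequence[Sample], truth: Sequence[float]) -> List[float]:
    """Difference between the first model's prediction and the true count."""
    return [first(x) - y for x, y in zip(samples, truth)]


def fit_difference_model(samples: Sequence[Sample], diffs: Sequence[float]) -> Model:
    """Toy second model: mean difference per distinct sample, 0.0 if unseen."""
    buckets: Dict[Sample, List[float]] = defaultdict(list)
    for x, d in zip(samples, diffs):
        buckets[x].append(d)
    table = {x: sum(ds) / len(ds) for x, ds in buckets.items()}
    return lambda x: table.get(x, 0.0)


def compose(first: Model, second: Model) -> Model:
    """Subtract the predicted difference from the first model's output."""
    return lambda x: first(x) - second(x)
```

Because the difference is defined as prediction minus truth, subtracting the predicted difference moves the composite output toward the true count. If the opposite sign convention were used, the composite would add the correction instead.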
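In some feasible implementations, the tree models in the first regression model each learn the relationship between different target parameters and the number of successful transactions. After constructing the first regression model, the terminal may therefore obtain the weights of the tree models in the first regression model and update the M*P training samples according to those weights. For example, the terminal extracts the target parameters corresponding to the tree models in the first regression model whose weight is greater than a weight threshold. Each training sample keeps only the features corresponding to the extracted target parameters, and the other features are removed from the training sample, giving a new training sample. Updating the M*P training samples in this way gives M*P new training samples. The terminal may construct the second regression model based on the M*P new training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model.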
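The sample update described above can be sketched as a filter over feature names. This is a minimal illustration, assuming each training sample is a mapping from target-parameter name to discretized feature and that the parameters used by each tree model are known. `prune_samples` and its arguments are hypothetical names.

```python
from typing import Dict, List, Sequence


def prune_samples(
    samples: Sequence[Dict[str, str]],
    tree_params: Sequence[Sequence[str]],
    tree_weights: Sequence[float],
    weight_threshold: float,
) -> List[Dict[str, str]]:
    """Keep only features of tree models whose weight exceeds the threshold.

    `tree_params[i]` lists the target parameters learned by tree model i and
    `tree_weights[i]` is that model's weight in the first regression model.
    """
    kept = set()
    for params, weight in zip(tree_params, tree_weights):
        if weight > weight_threshold:
            kept.update(params)
    return [{k: v for k, v in s.items() if k in kept} for s in samples]
```

For instance, if tree model A1 (parameters a1, a2) has weight 0.6 and A2 (parameter b5) has weight 0.1, a threshold of 0.3 leaves only the a1 and a2 features in every new training sample.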
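S207: obtain the target data, including the K target parameters, of a target merchant gathering area in a first time period.
S208: input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing, and obtain the number of successful transactions that the transaction prediction model outputs, based on the target data in the first time period, for the target merchant gathering area in a second time period after the first time period.
In some feasible implementations, after constructing the transaction prediction model, the terminal may select any one of the M merchant gathering areas as the target merchant gathering area, and may obtain the target data of that area in the first time period, including the K target parameters (that is, the K target parameters required to construct the transaction prediction model). The terminal may input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing, and may obtain the number of successful transactions that the transaction prediction model outputs, based on that target data, for the target merchant gathering area in the second time period after the first time period. The target data may include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend. The first time period need not belong to the P different time periods. Assuming the P different time periods are the 24 months between November 2016 and November 2018, the first time period may be a period after them, such as December 2018. The first time period and the second time period have the same length. For example, if the first time period is December 2018, the second time period is January 2019, and both are one month long. The embodiments of this application process the raw data of merchant gathering areas in different time periods (including the number of merchants, the proportion of merchants, the number of transaction applications, the number of successful transactions, the number of transaction cancellations, the transaction trend, and so on) into training samples and construct a transaction prediction model from these training samples. Once the model is built, it is used to predict the future number of successful transactions in different areas of a city, so as to guide salespeople to conduct business in a more targeted way.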
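The relation between the first and second time periods can be sketched for the monthly case used in the example. This is an assumption-laden illustration: periods are written as "YYYY-MM" strings, `model` is any callable standing in for the transaction prediction model, and both function names are hypothetical.

```python
from typing import Callable, Dict, Tuple


def next_month(period: str) -> str:
    """Return the month following a "YYYY-MM" period."""
    year, month = (int(part) for part in period.split("-"))
    if month == 12:
        return f"{year + 1:04d}-01"
    return f"{year:04d}-{month + 1:02d}"


def predict_next_period(
    model: Callable[[Dict[str, str]], float],
    target_data: Dict[str, str],
    first_period: str,
) -> Tuple[str, float]:
    """Pair the model's output with the second time period it refers to."""
    return next_month(first_period), model(target_data)
```

Feeding the December 2018 target data of an area to the model therefore produces a prediction labelled with January 2019, the month after the first time period.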
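In the embodiments of this application, the terminal obtains P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, then screens out the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data, and discretizes the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set. The terminal then constructs a first regression model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, constructs a second regression model based on the M*P training samples, those M*P numbers of successful transactions, and the first regression model, and synthesizes the first regression model and the second regression model into a transaction prediction model. Finally, the number of successful transactions of the target merchant gathering area over a future period is predicted based on the transaction prediction model, so as to guide salespeople to conduct business in a more targeted way.
Refer to FIG. 4, a schematic block diagram of the model construction apparatus provided by an embodiment of this application. As shown in FIG. 4, the apparatus of this embodiment includes:
a first obtaining module 10, configured to obtain P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, wherein one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
a screening module 20, configured to screen out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data, wherein every group of target data includes the same K target parameters and K is less than or equal to N-1;
a discretization module 30, configured to discretize the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, wherein the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
a construction module 40, configured to construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, wherein the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
In some feasible implementations, the screening module 20 is further configured to:
input the N parameters of each group of raw data corresponding to each merchant gathering area, together with the labels carried by each parameter, into a decision tree for screening; obtain the contribution to the number of successful transactions that the decision tree outputs, based on the N parameters of each group of raw data corresponding to each merchant gathering area, for N-1 of the parameters; screen out, from those N-1 parameters, the K target parameters whose contribution to the number of successful transactions is greater than or equal to a contribution threshold; and extract the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data.
In some feasible implementations, the apparatus further includes a second obtaining module 50, an input module 60, and a third obtaining module 70. The second obtaining module 50 is configured to obtain the target data, including the K target parameters, of the target merchant gathering area in the first time period. The input module 60 is configured to input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing. The third obtaining module 70 is configured to obtain the number of successful transactions that the transaction prediction model outputs, based on the target data in the first time period, for the target merchant gathering area in the second time period after the first time period, wherein the first time period and the second time period have the same length.
In some feasible implementations, the construction module 40 includes a first construction unit 401, a second construction unit 402, and a synthesis unit 403. The first construction unit 401 is configured to construct a first regression model based on the M*P training samples and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods. The second construction unit 402 is configured to construct a second regression model based on the M*P training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model. The synthesis unit 403 is configured to synthesize the first regression model and the second regression model into a transaction prediction model.
In some feasible implementations, the second construction unit 402 is specifically configured to perform the following operations on each of the M*P training samples: input the training sample h of any merchant gathering area i in any time period f of the P different time periods into the first regression model for processing, and obtain the first number of successful transactions that the first regression model outputs, based on the training sample h, for the merchant gathering area i in the time period following the time period f; and obtain the difference between the first number of successful transactions and the number of successful transactions of the merchant gathering area i in the time period following the time period f.
The second construction unit 402 then obtains the M*P first numbers of successful transactions corresponding to the M*P training samples and the difference between each first number of successful transactions and the corresponding number of successful transactions, obtaining M*P differences, wherein one training sample corresponds to one first number of successful transactions; and constructs the second regression model based on the M*P differences and the M*P training samples, so that the second regression model learns the relationship between the differences and each of the K target parameters.
In a specific implementation, the model construction apparatus can, through the modules described above, carry out the implementations provided by each step of FIG. 1 or FIG. 2 and realize the functions realized in the above embodiments. For details, refer to the corresponding descriptions of each step in the method embodiments shown in FIG. 1 or FIG. 2, which will not be repeated here.
In the embodiments of this application, the model construction apparatus obtains P groups of raw data for each of the M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, then screens out the target data corresponding to the K target parameters from each group of raw data corresponding to each merchant gathering area, to obtain M*P groups of target data, discretizes the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, and finally constructs a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data. A transaction prediction model can thus be constructed for a specific scenario, and the future number of successful transactions in different areas of a city can be predicted based on it.
Refer to FIG. 5, a schematic block diagram of the terminal provided by an embodiment of this application. As shown in FIG. 5, the terminal in this embodiment may include one or more processors 501 and a memory 502. The processor 501 and the memory 502 are connected through a bus 503. The memory 502 is used to store a computer program that includes program instructions, and the processor 501 is used to execute the program instructions stored in the memory 502. The processor 501 is configured to call the program instructions to perform the following:
obtaining P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, wherein one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N; screening out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data, wherein every group of target data includes the same K target parameters and K is less than or equal to N-1; discretizing the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, wherein the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend; and constructing a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, wherein the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
It should be understood that, in the embodiments of this application, the processor 501 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may include a read-only memory and a random access memory, and provides instructions and data to the processor 501. A part of the memory 502 may also include a non-volatile random access memory. For example, the memory 502 may also store device type information.
In a specific implementation, the processor 501 described in the embodiments of this application can carry out the implementations described in the model construction method provided by the embodiments of this application, and can also carry out the implementations of the model construction apparatus described in the embodiments of this application, which will not be repeated here.
An embodiment of this application also provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, implement the model construction method shown in FIG. 1 or FIG. 2. For specific details, refer to the description of the embodiments shown in FIG. 1 or FIG. 2, which will not be repeated here.
The computer-readable storage medium may be an internal storage unit of the model construction apparatus or electronic device described in any of the foregoing embodiments, such as a hard disk or memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device. Further, the computer-readable storage medium may include both an internal storage unit of the electronic device and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is about to be output.
The above are only specific implementations of this application, but the scope of protection of this application is not limited to them. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. The scope of protection of this application shall therefore be subject to the scope of protection of the claims.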

Claims (20)

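  1. A model construction method, characterized by comprising:
    obtaining P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, wherein one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
    screening out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data, wherein every group of target data includes the same K target parameters and K is less than or equal to N-1;
    discretizing the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, wherein the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
    constructing a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, wherein the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
  2. The method according to claim 1, characterized in that, before screening out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters to obtain M*P groups of target data, the method further comprises:
    inputting the N parameters of each group of raw data corresponding to each merchant gathering area, together with the labels carried by each parameter, into a decision tree for screening;
    obtaining the contribution to the number of successful transactions that the decision tree outputs, based on the N parameters of each group of raw data corresponding to each merchant gathering area, for N-1 of the parameters;
    screening out, from the N-1 parameters output, the K target parameters whose contribution to the number of successful transactions is greater than or equal to a contribution threshold.
  3. The method according to claim 1, characterized in that the method further comprises:
    obtaining the target data, including the K target parameters, of a target merchant gathering area in a first time period;
    inputting the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing, and obtaining the number of successful transactions that the transaction prediction model outputs, based on the target data in the first time period, for the target merchant gathering area in a second time period after the first time period, wherein the first time period and the second time period have the same length.
  4. The method according to any one of claims 1 to 3, characterized in that constructing a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data comprises:
    constructing a first regression model based on the M*P training samples and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods;
    constructing a second regression model based on the M*P training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model;
    synthesizing the first regression model and the second regression model into a transaction prediction model.
  5. The method according to claim 4, characterized in that constructing a second regression model based on the M*P training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model comprises:
    performing the following operations on each of the M*P training samples:
    inputting the training sample h of any merchant gathering area i in any time period f of the P different time periods into the first regression model for processing, and obtaining the first number of successful transactions that the first regression model outputs, based on the training sample h, for the merchant gathering area i in the time period following the time period f;
    obtaining the difference between the first number of successful transactions and the number of successful transactions of the merchant gathering area i in the time period following the time period f;
    obtaining the M*P first numbers of successful transactions corresponding to the M*P training samples, and obtaining the difference between each first number of successful transactions and the corresponding number of successful transactions, to obtain M*P differences, wherein one training sample corresponds to one first number of successful transactions;
    constructing a second regression model based on the M*P differences and the M*P training samples, so that the second regression model learns the relationship between the differences and each of the K target parameters.
  6. The method according to claim 2, characterized in that, after obtaining the contribution to the number of successful transactions that the decision tree outputs, based on the N parameters of each group of raw data corresponding to each merchant gathering area, for N-1 of the parameters, the method further comprises:
    sorting the contributions to the number of successful transactions of the N-1 parameters output by the decision tree in descending order, to obtain a contribution sequence;
    obtaining a preset screening percentage, and computing the integer target value obtained by rounding the product of N-1 and the screening percentage;
    determining the contribution at the position in the contribution sequence given by the target value as the contribution threshold.
  7. The method according to claim 4, characterized in that constructing a first regression model based on the M*P training samples and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods comprises:
    obtaining a preset first base model, wherein the first base model includes multiple tree models connected in series to form a regression model;
    inputting the M*P training samples and the M*P numbers of successful transactions into the first base model for training;
    determining the converged first base model as the first regression model.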
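  8. A model construction apparatus, characterized by comprising:
    a first obtaining module, configured to obtain P groups of raw data for each of M merchant gathering areas over P different time periods, to obtain M*P groups of raw data, wherein one time period corresponds to one group of raw data, each merchant gathering area corresponds to P groups of raw data, each group of raw data includes at least one merchant parameter and at least one transaction parameter, the transaction parameters include at least the number of successful transactions, and the total number of merchant parameters and transaction parameters included in each group of raw data is N;
    a screening module, configured to screen out, from each group of raw data corresponding to each merchant gathering area, the target data corresponding to K target parameters, to obtain M*P groups of target data, wherein every group of target data includes the same K target parameters and K is less than or equal to N-1;
    a discretization module, configured to discretize the M*P data items corresponding to each target parameter included in the M*P groups of target data to obtain a training sample set, wherein the training sample set includes M*P training samples, each training sample includes the discretized features of the data corresponding to the K target parameters, and the K target parameters include one or more of the number of merchants, the proportion of merchants, the number of transaction applications, the number of transaction cancellations, and the transaction trend;
    a construction module, configured to construct a transaction prediction model based on the M*P training samples in the training sample set and the M*P numbers of successful transactions in the M*P groups of raw data, wherein the transaction prediction model is used to predict, based on a group of target data of a target merchant gathering area in a first time period, the number of successful transactions in the target merchant gathering area in a second time period after the first time period.
  9. The apparatus according to claim 8, characterized in that the screening module is further configured to:
    input the N parameters of each group of raw data corresponding to each merchant gathering area, together with the labels carried by each parameter, into a decision tree for screening;
    obtain the contribution to the number of successful transactions that the decision tree outputs, based on the N parameters of each group of raw data corresponding to each merchant gathering area, for N-1 of the parameters;
    screen out, from the N-1 parameters output, the K target parameters whose contribution to the number of successful transactions is greater than or equal to a contribution threshold.
  10. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a second obtaining module, configured to obtain the target data, including the K target parameters, of a target merchant gathering area in a first time period;
    an input module, configured to input the target data of the target merchant gathering area in the first time period into the transaction prediction model for processing;
    a third obtaining module, configured to obtain the number of successful transactions that the transaction prediction model outputs, based on the target data in the first time period, for the target merchant gathering area in a second time period after the first time period, wherein the first time period and the second time period have the same length.
  11. The apparatus according to any one of claims 8 to 10, characterized in that the construction module comprises:
    a first construction unit, configured to construct a first regression model based on the M*P training samples and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods;
    a second construction unit, configured to construct a second regression model based on the M*P training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model;
    a synthesis unit, configured to synthesize the first regression model and the second regression model into a transaction prediction model.
  12. The apparatus according to claim 11, characterized in that the second construction unit is specifically configured to:
    perform the following operations on each of the M*P training samples:
    input the training sample h of any merchant gathering area i in any time period f of the P different time periods into the first regression model for processing, and obtain the first number of successful transactions that the first regression model outputs, based on the training sample h, for the merchant gathering area i in the time period following the time period f;
    obtain the difference between the first number of successful transactions and the number of successful transactions of the merchant gathering area i in the time period following the time period f;
    obtain the M*P first numbers of successful transactions corresponding to the M*P training samples, and obtain the difference between each first number of successful transactions and the corresponding number of successful transactions, to obtain M*P differences, wherein one training sample corresponds to one first number of successful transactions;
    construct a second regression model based on the M*P differences and the M*P training samples, so that the second regression model learns the relationship between the differences and each of the K target parameters.
  13. The apparatus according to claim 9, characterized in that the screening module is further configured to:
    sort the contributions to the number of successful transactions of the N-1 parameters output by the decision tree in descending order, to obtain a contribution sequence;
    obtain a preset screening percentage, and compute the integer target value obtained by rounding the product of N-1 and the screening percentage;
    determine the contribution at the position in the contribution sequence given by the target value as the contribution threshold.
  14. The apparatus according to claim 11, characterized in that the first construction unit is specifically configured to:
    obtain a preset first base model, wherein the first base model includes multiple tree models connected in series to form a regression model;
    input the M*P training samples and the M*P numbers of successful transactions into the first base model for training;
    determine the converged first base model as the first regression model.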
  15. 一种终端,其特征在于,包括处理器和存储器,所述处理器和存储器相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行:
    获取M个商户聚集区域中各个商户聚集区域在P个不同时间段内的P组原始数据,以得到M*P组原始数据,其中,一个时间段对应一组原始数据,每个商户聚集区域对应P组原始数据,每组原始数据包括至少一种商户参数和至少一种交易参数,所述交易参数中至少包括交易成功数量,每组原始数据中包括的商户参数和交易参数的数量之和为N;
    从各个商户聚集区域对应的各组原始数据中筛选出K种目标参数对应的目标数据,以得到M*P组目标数据,各组目标数据包括的K种目标参数相同,K小于或等于N-1;
    将所述M*P组目标数据包括的每种目标参数对应的M*P个数据进行离散化处理后得到训练样本集,所述训练样本集中包括M*P个训练样本,每个训练样本包括所述K种目标参数对应的数据离散化后的特征,所述K种目标参数中包括商户数量、商户比重、交易申请数量、交易取消数量以及交易趋势中的一种或者多种;
    基于所述训练样本集中的M*P个训练样本以及所述M*P组原始数据中的M*P个交易成功数量构建交易预测模型,所述交易预测模型用于基于目标商户聚集区域在第一时间段内的一组目标数据预测所述目标商户聚集区域在所述第一时间段之后的第二时间段内的交易成功数量。
  16. 根据权利要求15所述的终端,其特征在于,所述处理器还用于:
    将所述各个商户聚集区域对应的各组原始数据的N种参数以及各种参数所携带的标签输入决策树中进行筛选;
    获取所述决策树基于所述各个商户聚集区域对应的各组原始数据的N种参数输出的N-1种参数对交易成功数量的贡献度;
    从输出的N-1种参数中筛选出对交易成功数量的贡献度大于或等于贡献度阈值的K种目标参数。
  17. 根据权利要求15所述的终端,其特征在于,所述处理器还用于:
    获取目标商户聚集区域在第一时间段内包括所述K种目标参数的目标数据;
    将所述目标商户聚集区域在第一时间段内的目标数据输入所述交易预测模型中进行处理,并获取所述交易预测模型基于所述第一时间段内的目标数据输出的所述目标商户聚集区域在所述第一时间段之后的第二时间段内的交易成功数量,所述第一时间段与所述第二时间段的时间长度一致。
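  18. The terminal according to any one of claims 15-17, wherein the processor is specifically configured to:
    construct a first regression model based on the M*P training samples and the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods;
    construct a second regression model based on the M*P training samples, the M*P numbers of successful transactions of the M merchant gathering areas in the P different time periods, and the first regression model;
    combine the first regression model and the second regression model into a transaction prediction model.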
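  19. The terminal according to claim 18, wherein the processor is further specifically configured to:
    perform the following operations on each of the M*P training samples:
    input a training sample h of any merchant gathering area i in any time period f of the P different time periods into the first regression model for processing, and obtain a first number of successful transactions of the merchant gathering area i in the time period following the time period f, as output by the first regression model based on the training sample h;
    obtain the difference between the first number of successful transactions and the number of successful transactions of the merchant gathering area i in the time period following the time period f;
    obtain the M*P first numbers of successful transactions corresponding to the M*P training samples, and obtain the difference between each first number of successful transactions and the corresponding number of successful transactions, to obtain M*P differences, where one training sample corresponds to one first number of successful transactions;
    construct a second regression model based on the M*P differences and the M*P training samples, so that the second regression model learns the relationship between the differences and each of the K target parameters.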
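  20. A computer non-volatile readable storage medium, wherein the computer non-volatile readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
PCT/CN2019/117071 2019-06-18 2019-11-11 Model construction method and device WO2020253038A1 (zh)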

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910529288.4 2019-06-18
CN201910529288.4A CN110335067A (zh) 2019-06-18 2019-06-18 Transaction prediction model construction method and device

Publications (1)

Publication Number Publication Date
WO2020253038A1 (zh) 2020-12-24

Family

ID=68142505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117071 WO2020253038A1 (zh) 2019-06-18 2019-11-11 Model construction method and device

Country Status (2)

Country Link
CN (1) CN110335067A (zh)
WO (1) WO2020253038A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763111A (zh) * 2021-02-10 2021-12-07 北京沃东天骏信息技术有限公司 Item matching method, device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
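CN110335067A (zh) * 2019-06-18 2019-10-15 平安普惠企业管理有限公司 Transaction prediction model construction method and device
CN111798263A (zh) * 2020-05-22 2020-10-20 北京国电通网络技术有限公司 Transaction trend prediction method and device
CN112488831A (zh) * 2020-11-20 2021-03-12 东软集团股份有限公司 Blockchain network transaction method, device, storage medium, and electronic equipment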

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150206064A1 (en) * 2014-01-19 2015-07-23 Jacob Levman Method for supervised machine learning
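CN108460490A (zh) * 2018-03-16 2018-08-28 阿里巴巴集团控股有限公司 Method, apparatus, and device for predicting service occurrence volume
CN108573358A (zh) * 2018-05-09 2018-09-25 平安普惠企业管理有限公司 Overdue prediction model generation method and terminal device
CN109214578A (zh) * 2018-09-19 2019-01-15 平安科技(深圳)有限公司 Electronic device, decision-tree-model-based building electricity load prediction method, and storage medium
CN110335067A (zh) * 2019-06-18 2019-10-15 平安普惠企业管理有限公司 Transaction prediction model construction method and device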

Also Published As

Publication number Publication date
CN110335067A (zh) 2019-10-15

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19933961; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19933961; Country of ref document: EP; Kind code of ref document: A1)