CN114170002A - Method and device for predicting access frequency - Google Patents

Info

Publication number
CN114170002A
Authority
CN
China
Prior art keywords
target object
access frequency
access
attribute information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111509654.3A
Other languages
Chinese (zh)
Inventor
陈雯
钟皓明
张海川
梁剑
邹京甫
吕晟东
许阿虹
李泓佑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202111509654.3A priority Critical patent/CN114170002A/en
Publication of CN114170002A publication Critical patent/CN114170002A/en
Priority to PCT/CN2022/120519 priority patent/WO2023103527A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The application provides a method and a device for predicting access frequency. The method comprises: obtaining collected data, where the collected data includes the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time; inputting the collected data into a parameter prediction model to obtain an intermediate value of the access frequency of each target object in a preset time period; and performing linear fitting based on the attribute information of the target object and each intermediate value to determine the access frequency of each target object in the preset time period. Because the access frequency is determined only after the attribute information of the target object and the intermediate value have been linearly fitted, the method improves the accuracy of the access frequency prediction and makes the determined access frequency more reliable.

Description

Method and device for predicting access frequency
Technical Field
The invention relates to the field of financial technology (Fintech), in particular to a method and a device for predicting access frequency.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Big data technology is no exception, but the financial industry's requirements for security and real-time performance also place higher demands on these technologies.
In the financial industry, such as banking, enterprise customers' demand for some products, such as loan products, does not recur at high frequency within a given period, and each enterprise customer has its own demand cycle. For example, some enterprises may need significantly more loans in certain months than in others, and enterprises in a particular industry may show access frequencies completely different from those of other enterprises. This makes it difficult to predict a user's attrition probability accurately.
A user's attrition probability is generally predicted by calculating each user's active days, retention rate, and the like over the last 3, 7, 15, and 30 days, but this approach captures only limited information about the user's loan habits, is constrained by the time window, and predicts inefficiently. Alternatively, taking into account the enterprise's industry and its transaction history with the bank, customer groups are divided by manually defined rules or in combination with a decision tree model; however, manually defined rules may need frequent modification, and as the number of variables grows, computing the statistics becomes inefficient.
Disclosure of Invention
The application provides a method and a device for predicting access frequency, which are used for predicting the access frequency of bank customers and improving the accuracy and efficiency of predicting the access frequency.
In a first aspect, the present application provides a method for predicting access frequency. The method may be executed by a computing device, such as a central processing unit (CPU), a graphics processing unit (GPU), or a server, which is not particularly limited herein. Taking a server as an example, the server may obtain collected data, where the collected data includes: the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time. The server then inputs the collected data into a parameter prediction model to obtain an intermediate value of the access frequency of each target object in a preset time period, performs linear fitting based on the attribute information of the target object and each intermediate value, and determines the access frequency of each target object in the preset time period.
It should be noted that the target object may be a customer of a bank, a customer of a supermarket, a customer of an online service platform, and the like, and may be an individual or an enterprise; the present application is not particularly limited herein. Taking a bank customer as an example, the repeated access frequency of the target object can be understood as the total number of times the customer accesses the bank offline (transacting at a physical branch) or online (transacting through the bank's software application or consulting bank service staff by telephone), minus 1. The time interval between the first access and the latest access of the target object can be illustrated as follows: assuming that customer A first accessed the bank (online or offline) at 10:05:56 on November 11, 2020, and most recently accessed the bank at 10:06:57 on November 12, 2021, the time interval between the first access and the latest access is 366 days, 1 minute, and 1 second. The time interval between the first access of the target object and the current time can be illustrated as follows: assuming that customer A first accessed the bank (online or offline) at 10:05:56 on November 11, 2020, and the current time is 10:24:56 on November 12, 2021, the time interval between the first access and the current time is 366 days and 19 minutes.
The parameter prediction model processes the collected data of each target object to determine an intermediate value of the access frequency of each target object in a preset time period. Because an access frequency predicted from the intermediate value alone is not accurate enough, the application introduces attribute information of the target object; the attribute information may be the industry, business conditions, and the like of the target object, and is not particularly limited herein. By fitting the attribute information of the target object with the intermediate value, the access frequency of the target object in the preset time period is determined.
In an optional manner, the attribute information of the target object includes M items. The server may classify the i-th item of attribute information to obtain $X_i$ sub-attribute values, where i ranges from 1 to M and $X_i$ is an integer; permute and combine the sub-attribute values of the M items of attribute information to obtain $\prod_{i=1}^{M} X_i$ combinations; determine the mean of the intermediate values corresponding to each combination; and perform parameter fitting based on the means of the intermediate values to determine the parameter values of the linear fitting model.
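As a rough illustration of the combination step, the sketch below enumerates every combination of sub-attribute values and averages the intermediate values of the target objects that fall into each combination. It assumes that the number of combinations is the product of the per-attribute counts of sub-attribute values; the attribute names, sub-attribute values, and sample records are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sub-attribute values for M = 2 items of attribute information.
sub_values = {
    "age_group": [1, 2, 3],    # X_1 = 3 sub-attribute values
    "industry_group": [1, 2],  # X_2 = 2 sub-attribute values
}

# Every combination of one sub-attribute value per item: 3 * 2 = 6 here.
combinations = list(product(*sub_values.values()))

# Hypothetical records: (age_group, industry_group, intermediate value).
records = [
    (1, 1, 2.0), (1, 1, 4.0),
    (2, 1, 5.0),
    (3, 2, 9.0), (3, 2, 7.0),
]


def mean_intermediate_by_combination(rows):
    """Return {combination: mean intermediate value} for observed combinations."""
    buckets = defaultdict(list)
    for *combo, value in rows:
        buckets[tuple(combo)].append(value)
    return {combo: sum(vals) / len(vals) for combo, vals in buckets.items()}


combo_means = mean_intermediate_by_combination(records)
```

Combinations that no target object falls into simply have no mean; how such empty combinations should be handled during fitting is not specified here.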
Classifying the attribute information of the target object allows the server to fit the intermediate values better, which improves both the accuracy and the efficiency of the access frequency prediction.
In an optional manner, for any item of attribute information, the numbers of target objects corresponding to its sub-attribute values follow a uniform distribution.
Classifying the attribute information of the target object in this way can improve the efficiency of predicting the access frequency.
In an alternative mode, the server may input the intermediate value of each target object into the linear fitting model, and determine the access frequency of each target object within a preset time period.
The method can improve the accuracy of the prediction of the access frequency and can improve the efficiency of the prediction of the access frequency.
In an alternative mode, the server may construct, for any target object, a triplet consisting of the repeated access frequency, the time interval between the first access and the latest access, and the time interval between the first access and the current time; the server inputs the triplet of each target object into the parameter prediction model to obtain the intermediate value of the access frequency of each target object in a preset time period. The parameter prediction model is a beta-geometric/negative binomial distribution (BG-NBD) model.
In an optional mode, the BG-NBD model predicts based on the distribution of the time intervals between accesses of each target object, the distribution of the probability that each target object stops accessing, and the triplet of each target object, and determines the intermediate value of the access frequency of each target object within a preset time period.
In an alternative approach, the parameters of the linear fit model may be determined by:
The server inputs j items of attribute information of the target objects into a pre-trained linear fitting model and determines one group of preset parameter values of the linear fitting model, where j ranges from 1 to M; the resulting multiple groups of preset parameter values are then screened based on a screening rule to determine the parameters of the linear fitting model.
In an alternative approach, the screening rule is determined by the following formula:
$$\text{adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where adjusted $R^2$ indicates the adjusted coefficient of determination; $R^2$ indicates the coefficient of determination; $n$ indicates the total number of collected data of the target objects; $k$ indicates the number of items of attribute information; $y_i$ indicates the intermediate value of target object $i$; $\hat{y}_i$ indicates the output of the linear fitting model; and $\bar{y}$ indicates the mean of the intermediate values.
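The screening quantities can be computed directly from the intermediate values and the outputs of a fitted model. The following is a minimal sketch using the textbook definitions of the coefficient of determination and its adjusted form; the function names and sample values are illustrative assumptions rather than part of the application.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y, y_hat, k):
    """Adjusted R^2 for n observations and k items of attribute information."""
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - k - 1)


# Hypothetical intermediate values and fitted outputs.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
score = adjusted_r_squared(y, y_hat, k=1)
```

Because the adjustment penalizes larger k, adding an item of attribute information raises the score only if it improves the fit enough to offset the penalty.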
In an alternative manner, the attribute information of the target object includes one or more of the following: the business information of the target object, the service life of the target object, the registered capital of the target object, the change condition of the target object and the financial condition of the target object.
In an alternative approach, the linear fitting model is a Linear Regression (LR) model.
In a second aspect, the present application provides an apparatus for predicting access frequency, comprising: an obtaining unit configured to obtain collected data, where the collected data includes the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time; a prediction unit configured to input the collected data into the parameter prediction model and obtain the intermediate value of the access frequency of each target object in a preset time period; and a determining unit configured to perform linear fitting based on the attribute information of the target object and each intermediate value and determine the access frequency of each target object in the preset time period.
In an optional manner, the attribute information of the target object includes M items. The server may classify the i-th item of attribute information to obtain $X_i$ sub-attribute values, where i ranges from 1 to M and $X_i$ is an integer; permute and combine the sub-attribute values of the M items of attribute information to obtain $\prod_{i=1}^{M} X_i$ combinations; determine the mean of the intermediate values corresponding to each combination; and perform parameter fitting based on the means of the intermediate values to determine the parameter values of the linear fitting model.
In an optional manner, for any item of attribute information, the numbers of target objects corresponding to its sub-attribute values are uniformly distributed.
In an alternative mode, the server may input the intermediate value of each target object into the linear fitting model, and determine the access frequency of each target object within a preset time period.
In an alternative manner, the prediction unit may construct, for any target object, a triplet consisting of the repeated access frequency, the time interval between the first access and the latest access, and the time interval between the first access and the current time; the prediction unit inputs the triplet of each target object into the parameter prediction model to obtain the intermediate value of the access frequency of each target object in a preset time period. The parameter prediction model is a BG-NBD model.
In an optional mode, the BG-NBD model predicts based on the distribution of the time intervals between accesses of each target object, the distribution of the probability that each target object stops accessing, and the triplet of each target object, and determines the intermediate value of the access frequency of each target object within a preset time period.
In an alternative approach, the parameters of the linear fit model may be determined by:
The determining unit inputs j items of attribute information of the target objects into a pre-trained linear fitting model and determines one group of preset parameter values of the linear fitting model, where j ranges from 1 to M; the resulting multiple groups of preset parameter values are then screened based on a screening rule to determine the parameters of the linear fitting model.
In an alternative approach, the screening rule is determined by the following formula:
$$\text{adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where adjusted $R^2$ indicates the adjusted coefficient of determination; $R^2$ indicates the coefficient of determination; $n$ indicates the total number of collected data of the target objects; $k$ indicates the number of items of attribute information; $y_i$ indicates the intermediate value of target object $i$; $\hat{y}_i$ indicates the output of the linear fitting model; and $\bar{y}$ indicates the mean of the intermediate values.
In an alternative manner, the attribute information of the target object includes one or more of the following: the business information of the target object, the service life of the target object, the registered capital of the target object, the change condition of the target object and the financial condition of the target object.
In an alternative approach, the linear fit model is an LR model.
In a third aspect, the present application provides a computing device comprising: a memory and a processor; a memory for storing program instructions; a processor for calling the program instructions stored in the memory and executing the method of the first aspect according to the obtained program.
In a fourth aspect, the present application provides a computer storage medium storing computer-executable instructions for performing the method of the first aspect.
For technical effects that can be achieved by the second aspect to the fourth aspect, please refer to a description of the technical effects that can be achieved by a corresponding possible design scheme in the first aspect, and the description of the technical effects is not repeated herein.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of prediction of access frequency according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for predicting access frequency according to an embodiment of the present application;
Fig. 3 is a schematic diagram of attribute information grouping provided by an embodiment of the present application;
Fig. 4 is a flowchart of a method for predicting access frequency according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an access frequency prediction apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
It should be noted that the terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
As described in the background, existing methods calculate each user's active days, retention rate, and the like over the last 3, 7, 15, and 30 days; they capture only limited information about the user's loan habits, are constrained by the time window, and are inefficient. Alternatively, taking into account the enterprise's industry and its transaction history with the bank, customer groups are divided by manually defined rules or in combination with a decision tree model. This approach has several drawbacks: when multiple variables are collected, it considers only linear partitions of the multivariable plane; the manually defined rules may need frequent modification; and as the number of variables grows, the permutations and combinations of rules make computing the statistics inefficient. Moreover, when customers are divided into groups, the final attrition prediction usually represents all individuals in the same group by a single value, so the predicted value is inaccurate and a large error appears when the prediction horizon is extended even slightly. In addition, the BG-NBD model, which is based on statistical assumptions, has been used to predict a user's attrition probability; based on the user's purchase frequency and number of days, it can give the user's current probability of being active and the number of repeat purchases at a given time in the future. However, it considers too few factors, and because it does not incorporate the customer's attributes or business attribute characteristics, its accuracy is low.
The following describes the procedure for predicting the access frequency. In the following embodiments of the present application, "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. The singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. Unless stated to the contrary, ordinal terms such as "first" and "second" in the embodiments of the present application are used to distinguish between multiple objects and do not limit their sequence, timing, priority, or importance. For example, a first task execution device and a second task execution device are named only to distinguish different task execution devices and do not indicate a difference in priority, importance, or the like between the two.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 shows a scenario for predicting access frequency, which includes a server and N target objects, where N is a positive integer; the number of target objects is not limited in practice and is given here only as an example. The target object may be understood as a customer of a bank, a customer of a supermarket, a customer of an online service platform, and the like, and may be an individual or an enterprise; the present application is not particularly limited herein. After data of a target object (for example, the number of accesses, the number of browsing sessions, the number of purchases, and the like) is collected, the collected data is transmitted to the server, and the server predicts the access frequency.
Fig. 2 is a flowchart illustrating a method for predicting access frequency according to an embodiment of the present application, where the method is applicable to a computing device, such as a CPU, a GPU, a server, and the like, and the present application is not limited in detail herein. The server is only used as an example, and the following steps can be specifically executed:
Step 201: the server obtains collected data, where the collected data includes: the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time.
It should be noted that the collected data obtained by the server may come from offline or online sources, may be transmitted by other devices, or may be historically stored data; the present application is not particularly limited herein. Taking a bank customer as the target object, the repeated access frequency of the target object can be understood as the total number of times the customer accesses the bank offline (transacting at a physical branch) or online (transacting through the bank's software application or consulting bank service staff by telephone), minus 1. The time interval between the first access and the latest access of the target object can be illustrated as follows: assuming that customer A first accessed the bank (online or offline) at 10:05:56 on November 11, 2020, and most recently accessed the bank at 10:06:57 on November 12, 2021, the time interval between the first access and the latest access is 366 days, 1 minute, and 1 second. The time interval between the first access of the target object and the current time can be illustrated as follows: assuming that customer A first accessed the bank (online or offline) at 10:05:56 on November 11, 2020, and the current time is 10:24:56 on November 12, 2021, the time interval between the first access and the current time is 366 days and 19 minutes.
It is assumed that the repeated access frequency of the target object is indicated by F, the time interval between the first access of the target object and the latest access is indicated by R, and the time interval between the first access of the target object and the current time is indicated by T. For each target object, a (F, R, T) triplet may be constructed, and the triplets for each target object are stored in a data table, where the target object may be indicated by an index, e.g., 0001 for target object 1.
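The construction of the (F, R, T) triplet can be sketched as follows. This is only an illustration: the function name, the in-memory table, and the access timestamps (including the intermediate access in March 2021) are hypothetical, and the intervals are kept as `timedelta` values rather than converted to a particular unit.

```python
from datetime import datetime


def build_triplet(access_times, now):
    """Return (F, R, T) for one target object.

    F: repeated access frequency (total number of accesses minus 1)
    R: interval between the first access and the latest access
    T: interval between the first access and the current time
    """
    if not access_times:
        raise ValueError("at least one access is required")
    ordered = sorted(access_times)
    first, latest = ordered[0], ordered[-1]
    return len(ordered) - 1, latest - first, now - first


# Hypothetical access log for one customer.
accesses = [
    datetime(2020, 11, 11, 10, 5, 56),
    datetime(2021, 3, 1, 9, 0, 0),
    datetime(2021, 11, 12, 10, 6, 57),
]
now = datetime(2021, 11, 12, 10, 24, 56)

F, R, T = build_triplet(accesses, now)

# Data table keyed by the target object's index, e.g. "0001".
triplet_table = {"0001": (F, R, T)}
```

Before the triplets are passed to a model, R and T would normally be converted to a common numeric unit such as days or months; which unit to use is a modelling choice not fixed by this sketch.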
Step 202, the server inputs the acquired data into the parameter prediction model, and obtains the intermediate value of the access frequency of each target object in a preset time period.
It should be noted that the parameter prediction model may be determined by machine learning, by a statistical algorithm, or the like, which is not limited herein. In an alternative embodiment, the parameter prediction model may be a BG-NBD model. The server may construct, for any target object, a triplet consisting of the repeated access frequency, the time interval between the first access and the latest access, and the time interval between the first access and the current time; it then inputs the triplet of each target object into the parameter prediction model to obtain the intermediate value of the access frequency of each target object in a preset time period.
In an alternative mode, the BG-NBD model may predict based on the distribution of the time intervals between accesses of each target object, the distribution of the probability that each target object stops accessing, and the triplet of each target object, and determine the intermediate value of the access frequency of each target object within a preset time period. The specific steps are as follows:
When processing data with the BG-NBD model, the following assumptions about access frequency may be made:
1. The time interval Δ between accesses of each target object to the bank follows an exponential distribution with parameter λ: $\Delta \sim \mathrm{Exp}(\lambda)$;
2. The distribution of access intervals differs between target objects, and λ follows a Gamma distribution with parameters r and α: $\lambda \sim \mathrm{Gamma}(r, \alpha)$;
3. After each access, the probability that the target object never accesses again follows a geometric distribution with parameter p (P ~ Geometric/Negative Binomial(p));
4. The probability of never accessing again differs between target objects, and p follows a Beta distribution with parameters a and b: $p \sim \mathrm{Beta}(a, b)$; the parameters λ and p are independent of each other.
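The four assumptions above describe a generative process that can be simulated directly. The sketch below draws λ and p for one target object and then generates its repeat accesses up to a time horizon. It assumes that α is a rate parameter (so the Gamma scale is 1/α) and that the dropout check happens after each repeat access; the function name and parameter values are illustrative only.

```python
import random


def simulate_repeat_accesses(r, alpha, a, b, horizon, rng):
    """Simulate repeat access times in (0, horizon] for one target object."""
    # Assumption 2: lambda ~ Gamma(shape r, rate alpha); gammavariate takes a scale.
    lam = rng.gammavariate(r, 1.0 / alpha)
    # Assumption 4: p ~ Beta(a, b), drawn independently of lambda.
    p = rng.betavariate(a, b)
    times = []
    if lam <= 0.0:
        return times  # degenerate draw: no further accesses
    t = 0.0
    while True:
        # Assumption 1: intervals between accesses are Exp(lambda).
        t += rng.expovariate(lam)
        if t > horizon:
            break
        times.append(t)
        # Assumption 3: after each access the object stops with probability p.
        if rng.random() < p:
            break
    return times


rng = random.Random(0)
sample = simulate_repeat_accesses(r=0.25, alpha=4.0, a=0.8, b=2.5,
                                  horizon=52.0, rng=rng)
```

Running the simulation for many target objects gives synthetic (F, R, T) triplets that can be used to sanity-check a parameter estimation routine.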
Under the above assumptions, the predicted number of accesses by target object i in the next 12 months, i.e. the intermediate value f-pre, can be obtained from the BG-NBD model:
Figure BDA0003405275630000101
where ${}_2F_1$ is the Gaussian hypergeometric function; $a$, $b$, $\alpha$, and $r$ are the estimated parameters of the distributions assumed above; and $F_i$, $T_i$, $R_i$ form the triplet defined above.
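For readers who want to experiment, the sketch below implements the conditional expectation of the BG/NBD model as it is commonly stated in the literature, written with the symbols used here (F, R, T for the triplet and t for the length of the prediction period). Whether this is exactly the expression shown in the figure above is an assumption. The hypergeometric function is evaluated with a plain power series, which is adequate because its argument t / (α + T + t) lies in [0, 1); the parameter values are illustrative, not fitted results from the application.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100000):
    """Gauss hypergeometric series 2F1(a, b; c; z) for 0 <= z < 1."""
    if not 0.0 <= z < 1.0:
        raise ValueError("series form requires 0 <= z < 1")
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            return total
    raise ArithmeticError("hypergeometric series did not converge")


def expected_accesses(F, R, T, t, r, alpha, a, b):
    """Expected number of accesses in the next t time units (assumed form).

    F, R, T: the triplet of one target object (R, T, t share one time unit).
    r, alpha, a, b: estimated BG-NBD parameters (a must not equal 1).
    """
    ratio = (alpha + T) / (alpha + T + t)
    hyp = hyp2f1(r + F, b + F, a + b + F - 1, t / (alpha + T + t))
    numerator = (a + b + F - 1) / (a - 1) * (1 - ratio ** (r + F) * hyp)
    # The dropout term applies only when at least one repeat access exists.
    dropout = 0.0
    if F > 0:
        dropout = a / (b + F - 1) * ((alpha + T) / (alpha + R)) ** (r + F)
    return numerator / (1 + dropout)


# Illustrative values only (time measured in weeks).
f_pre = expected_accesses(F=2, R=30.43, T=38.86, t=39.0,
                          r=0.243, alpha=4.414, a=0.793, b=2.426)
```

In practice a library routine for the hypergeometric function would be preferable to the hand-written series, and r, α, a, b would come from maximum likelihood estimation over all triplets.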
Step 203: the server performs linear fitting based on the attribute information of the target object and each intermediate value, and determines the access frequency of each target object in the preset time period.
The parameter prediction model processes the collected data of each target object to determine an intermediate value of the access frequency of each target object in a preset time period. Because an access frequency predicted from the intermediate value alone is not accurate enough, the application introduces attribute information of the target object; the attribute information may be the industry, business conditions, and the like of the target object, and is not particularly limited herein. By fitting the attribute information of the target object with the intermediate value, the access frequency of the target object in the preset time period is determined.
In an alternative embodiment, the attribute information of the target object includes one or more of the following: the business information of the target object, the service life of the target object, the registered capital of the target object, the change condition of the target object, and the financial condition of the target object. In practical applications, the attribute information of the target object may further include: the age of the target object's legal representative, the number of bank products held, the credit amount, the loan amount, the loan frequency, and the like, which is not limited herein.
Assume the target object has M items of attribute information. The server may classify the i-th item of attribute information to obtain X_i sub-attribute values, where i takes values from 1 to M and X_i is an integer. Because the attribute information may include multiple items, each item may be quantized. For example, let attribute information V1 be the age of an enterprise's legal representative: ages in the interval [17, 29] form one group, and the age of every target object in that group is reset to 1; ages in (29, 34] form a group reset to 2; ages in (34, 40] form a group reset to 3; and ages in (40, 50] form a group reset to 4.

Grouping in this way should ensure that the sub-attribute values have a strong linear correlation with the intermediate values of each group of target objects. It should also make the number of target objects per sub-attribute value as uniform as possible, so that target objects do not pile up in a single group and the sub-attribute values obtained by grouping retain their discriminating power.

In fig. 3, the horizontal axis is the sub-attribute value of each group after grouping. The broken line is the first quartile of the intermediate values in each group (right ordinate): for example, for sub-attribute value 1, the intermediate values of all target objects with that sub-attribute value are sorted in ascending order, and the intermediate value at the 1/4 position is taken as the first quartile. The broken line shows a positive correlation. The histogram is the proportion of each group's target objects among all target objects (left ordinate), and the groups are relatively balanced.
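The grouping and the two quantities plotted in fig. 3 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: treating each upper edge as inclusive, taking the first quartile as the value at position ceil(n/4) in the sorted list, and all function names and sample records are assumptions.

```python
import math
from bisect import bisect_left
from collections import defaultdict

# Upper edges of the age groups: [17, 29], (29, 34], (34, 40], (40, 50].
# Inclusive upper edges are an assumption.
AGE_UPPER_EDGES = [29, 34, 40, 50]


def age_to_sub_attribute(age):
    """Map a legal representative's age to its sub-attribute value (1 to 4)."""
    if not 17 <= age <= 50:
        raise ValueError(f"age {age} is outside the grouped range [17, 50]")
    return bisect_left(AGE_UPPER_EDGES, age) + 1


def first_quartile(values):
    """Sort ascending and take the value at the 1/4 position."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(len(ordered) / 4) - 1)]


def summarize_groups(records):
    """records: iterable of (age, intermediate_value).

    Returns {sub_attribute: (share_of_all_objects, first_quartile)}.
    """
    groups = defaultdict(list)
    for age, intermediate in records:
        groups[age_to_sub_attribute(age)].append(intermediate)
    total = sum(len(v) for v in groups.values())
    return {g: (len(v) / total, first_quartile(v)) for g, v in sorted(groups.items())}


# Hypothetical sample data for illustration only.
sample = [(25, 3.0), (28, 5.0), (31, 6.0), (33, 9.0), (38, 10.0), (45, 14.0)]
for group, (share, q1) in summarize_groups(sample).items():
    print(group, round(share, 3), q1)
```

Checking that the shares are roughly equal and that the first quartile rises with the sub-attribute value is one way to judge whether a candidate grouping meets the two criteria described above.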
Table 1 below shows the intermediate values of target objects belonging to different combinations. For example, with M = 4, the items of attribute information are denoted V1, V2, V3, and V4, and target object 00001 has a sub-attribute value of 1 for V1, 2 for V2, 2 for V3, and 4 for V4. The remaining rows are read in the same way.
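TABLE 1

| Index | V1 | V2 | V3 | V4 | Intermediate value |
|---|---|---|---|---|---|
| 00001 | 1 | 2 | 2 | 4 | 26 |
| 00002 | 2 | 3 | 1 | 1 | 20 |
| 00003 | 2 | 3 | 1 | 1 | 15 |
| 00004 | 2 | 1 | 2 | 1 | 0 |
| 00005 | 2 | 3 | 1 | 1 | 8 |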
Then, the sub-attribute values of the M items of attribute information can be permuted and combined to obtain $\prod_{i=1}^{M} X_i$ combinations; the mean of the intermediate values corresponding to each combination is determined; and parameter fitting is performed based on these means to determine the parameter values of the linear fitting model. For example, if there are 4 items of attribute information and the server classifies the 1st item into 3 sub-attribute values, the 2nd item into 4, the 3rd item into 2, and the 4th item into 3, permutation and combination yields 3 × 4 × 2 × 3 = 72 combinations.
Assume there are 4 items of attribute information, with V1 divided into 4 groups, V2 into 3 groups, V3 into 3 groups, and V4 into 4 groups, which gives 4 × 3 × 3 × 4 = 144 combinations. Referring to Table 2 below, for the combination V1 = 1, V2 = 1, V3 = 1, V4 = 1, the mean intermediate value y is Y1 (Y1 is obtained by summing and averaging the intermediate values of all target objects whose sub-attribute values are 1 for V1, V2, V3, and V4).
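TABLE 2

| Combination number | V1 | V2 | V3 | V4 | y |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | Y1 |
| 2 | 1 | 2 | 1 | 1 | Y2 |
| 3 | 1 | 2 | 1 | 2 | Y3 |
| … | … | … | … | … | Y4 |
| 144 | 4 | 3 | 3 | 4 | Y144 |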
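The combination count and the per-combination means can be illustrated with the rows of Table 1. The sketch below is only an illustration under assumed names (`count_combinations`, `mean_intermediate_by_combination`); it computes means only for combinations that actually occur in the data, which is an assumption about how empty combinations are handled.

```python
import math
from collections import defaultdict
from itertools import product
from statistics import mean

# Rows of Table 1: (index, (V1, V2, V3, V4), intermediate value).
TABLE_1 = [
    ("00001", (1, 2, 2, 4), 26),
    ("00002", (2, 3, 1, 1), 20),
    ("00003", (2, 3, 1, 1), 15),
    ("00004", (2, 1, 2, 1), 0),
    ("00005", (2, 3, 1, 1), 8),
]


def count_combinations(group_counts):
    """Number of combinations: the product of the X_i."""
    return math.prod(group_counts)


def enumerate_combinations(group_counts):
    """All sub-attribute value combinations, each value starting at 1."""
    return list(product(*(range(1, x + 1) for x in group_counts)))


def mean_intermediate_by_combination(rows):
    """Average the intermediate values of target objects sharing a combination."""
    buckets = defaultdict(list)
    for _index, combination, intermediate in rows:
        buckets[combination].append(intermediate)
    return {combination: mean(values) for combination, values in buckets.items()}


print(count_combinations((3, 4, 2, 3)))            # 72
print(len(enumerate_combinations((4, 3, 3, 4))))   # 144
for combination, y in mean_intermediate_by_combination(TABLE_1).items():
    print(combination, y)
```

In this sample, target objects 00002, 00003, and 00005 share the combination (2, 3, 1, 1), so their intermediate values 20, 15, and 8 are averaged into a single y value.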
After the mean of the intermediate values corresponding to each combination is determined, the mean may be used as the output of the linear fitting model, and the parameter values of the linear fitting model are determined by least squares or a similar method. The linear fitting model may be an LR model or another model, which is not specifically limited herein. The parameters of the LR model can be determined, for example, by the following equation 1.
Figure BDA0003405275630000131
where M is the number of items of attribute information of the target object, $\beta_i$ is a model parameter, and $V_i$ is the i-th item of attribute information; the number of LR model parameters corresponds to the number M of items of attribute information.
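A least-squares fit of a linear model to the per-combination means can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the normal-equation solver, the function names, the sample data, and in particular the constant column used as an intercept are all assumptions (the source does not say whether the LR model has an intercept).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: attributes may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(design_rows, y):
    """Return beta minimizing the squared error, via (X^T X) beta = X^T y."""
    p = len(design_rows[0])
    xtx = [[sum(row[i] * row[j] for row in design_rows) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design_rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical combinations (V1, V2) with their mean intermediate values.
combinations = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]
mean_values = [6.0, 9.0, 8.0, 11.0, 10.0]

# Leading 1.0 adds an intercept term (an assumption, see above).
design = [[1.0, float(v1), float(v2)] for v1, v2 in combinations]
beta = fit_least_squares(design, mean_values)
print([round(b, 6) for b in beta])
```

The sample data was generated from y = 1 + 2·V1 + 3·V2, so the fitted coefficients recover those values up to rounding.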
In an alternative approach, the parameters of the linear fitting model may be determined as follows: the server inputs j items of attribute information into a pre-trained linear fitting model to determine one group of preset parameter values of the linear fitting model, where j takes values from 1 to M; the server then screens the multiple groups of preset parameter values based on a screening rule to determine the parameters of the linear fitting model. When determining the preset parameter values, all items of attribute information may be substituted into the pre-trained linear fitting model, or only some of them. For example, if there are 3 items of attribute information, all 3 may be substituted to determine one group of preset parameter values, or only 1 item may be substituted to determine one group of preset parameter values. Which item is substituted is not limited herein: substituting different items yields different preset parameter values, so substituting each of the 3 items separately yields 3 groups of preset parameter values.
In practical application, when determining the preset parameter values of the linear fitting model, the server may also fit the LR model repeatedly while removing one item of attribute information at a time. For example, if there are 3 items of attribute information, all 3 may first be input into the pre-trained linear fitting model to determine one group of preset model parameter values; 1 item is then removed and the remaining 2 items are input to determine another group; 1 more item is then removed and the remaining 1 item is input to determine a further group. Alternatively, useful variables may be selected based on experience and used to fit the LR model. The grouping rule and the model parameters β of each fit are recorded.
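The candidate attribute sets to be fitted can be enumerated as in the sketch below. The source does not say which item is removed at each step, so dropping the last remaining item is an assumption, as are the function names; the exhaustive variant is included only for comparison.

```python
from itertools import combinations


def backward_subsets(attributes):
    """Yield attribute sets, removing one item (here: the last) per step."""
    current = list(attributes)
    while current:
        yield tuple(current)
        current = current[:-1]


def all_subsets(attributes):
    """Every non-empty subset of the attributes, largest first."""
    items = list(attributes)
    for size in range(len(items), 0, -1):
        yield from combinations(items, size)


print(list(backward_subsets(["V1", "V2", "V3"])))
print(len(list(all_subsets(["V1", "V2", "V3"]))))
```

Each yielded tuple would correspond to one fit of the LR model, whose grouping rule and parameters are recorded for later screening.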
The server selects the optimal model, i.e., the optimal combination of attribute information, the optimal grouping, and the model parameters $\beta_{opt}$. The adjusted $R^2$ index may be used for this purpose, because it fully accounts for the influence of the number of model variables (the degrees of freedom) on the model; the higher the adjusted $R^2$, the better the model. When screening for the optimal model, the screening rule may be determined by the following formulas:
$$\text{adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where adjusted $R^2$ is the adjusted coefficient of determination; $R^2$ is the coefficient of determination; n is the total number of collected data records of the target objects; k is the number of items of attribute information; $y_i$ is the intermediate value of target object i; $\hat{y}_i$ is the output of the linear fitting model; and $\bar{y}$ is the mean of the intermediate values.
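The screening step can be sketched as follows, assuming the usual textbook definitions R² = 1 − SS_res/SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The function names, the candidate dictionary layout, and the sample numbers are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number k of attributes used on n records."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 records")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def select_best(candidates, y):
    """candidates: {name: (k, y_hat)}. Return the name with the highest adjusted R^2."""
    def score(name):
        k, y_hat = candidates[name]
        return adjusted_r_squared(r_squared(y, y_hat), len(y), k)
    return max(candidates, key=score)


# Hypothetical intermediate values and two candidate model outputs.
y = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "one_attribute": (1, [1.1, 1.9, 3.2, 3.8]),
    "two_attributes": (2, [1.1, 1.9, 3.2, 3.9]),
}
print(select_best(candidates, y))
```

In this sample the two-attribute model has the higher plain R² but the lower adjusted R², which shows how the index discounts additional variables.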
The server may input the intermediate value of each target object into the linear fitting model to determine the access frequency of each target object within the preset time period. The optimal model is selected according to the adjusted $R^2$ index, which determines the attribute information, the coefficients $\beta_{opt}$, and the grouping rule; the collected data and the attribute information of the target object are then input into the LR model to calculate the access frequency $y_j$ of target object j within the preset time period:
Figure BDA0003405275630000145
where N is the number of variables of the optimal model. For a new target object, the access frequency within the preset time period is determined from its attribute characteristics and service characteristics.
In summary, the access frequency prediction process provided by the present application may be as shown in fig. 4: target object data is collected through a software application or applet, through customer service calls, and through service nodes; a target object index and the triplet corresponding to each target object are created from the collected data; the intermediate value of each target object is then determined with the BG-NBD model; the LR model is trained with the attribute information of the target objects to determine its parameters; and the access frequency of each target object within the preset time period is determined by calculation.
Based on the same concept, an embodiment of the present application provides an apparatus for predicting access frequency, as shown in fig. 5, including: an acquisition unit 51, a prediction unit 52, and a determination unit 53.
The acquisition unit 51 is configured to acquire collected data, where the collected data includes: the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time. The prediction unit 52 is configured to input the collected data into the parameter prediction model and obtain an intermediate value of the access frequency of each target object within a preset time period. The determining unit 53 is configured to perform linear fitting based on the attribute information of the target objects and the intermediate values, and determine the access frequency of each target object within the preset time period.
In the present application, the parameter prediction model is applied to the collected data of each target object to determine an intermediate value of the access frequency of each target object within the preset time period. Because an access frequency predicted from the intermediate value alone is not accurate enough, attribute information of the target object is introduced. The attribute information may be the industry, business condition, and so on of the target object, and is not specifically limited herein. The attribute information of the target object is fitted with the intermediate value to determine the access frequency of the target object within the preset time period.
In an optional manner, the target object has M items of attribute information; the determining unit 53 may classify the i-th item of attribute information to obtain $X_i$ sub-attribute values, where i takes values from 1 to M and $X_i$ is an integer; permute and combine the sub-attribute values of the M items of attribute information to obtain $\prod_{i=1}^{M} X_i$ combinations; determine the mean of the intermediate values corresponding to each combination; and perform parameter fitting based on these means to determine the parameter values of the linear fitting model.
By classifying the attribute information of the target object, the server can be ensured to better fit the intermediate value, the accuracy of the prediction of the access frequency is improved, and the efficiency of the prediction of the access frequency can be improved.
In an optional manner, the numbers of target objects corresponding to the sub-attribute values of any one item of attribute information are uniformly distributed.
By classifying the attribute information of the target object in this way, the efficiency of predicting the access frequency can be improved.
In an alternative manner, the determining unit 53 may input the intermediate value of each target object to the linear fitting model, and determine the access frequency of each target object within a preset time period. The method can improve the accuracy of the prediction of the access frequency and can improve the efficiency of the prediction of the access frequency.
In an alternative manner, the prediction unit 52 may construct, for any target object, a triplet of the repeated access frequency, the time interval between the first access and the latest access, and the time interval between the first access and the current time; input the triplet of each target object into the parameter prediction model; and obtain an intermediate value of the access frequency of each target object within a preset time period. The parameter prediction model is a BG-NBD model.
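Constructing the triplet from raw access records can be sketched as follows. The function name, the use of days as the time unit, and counting the repeated access frequency as the number of accesses after the first one are assumptions made for illustration.

```python
from datetime import date


def build_triplet(access_dates, today):
    """Return (repeat_frequency, first_to_latest_days, first_to_now_days)."""
    if not access_dates:
        raise ValueError("a target object needs at least one access")
    ordered = sorted(access_dates)
    first, latest = ordered[0], ordered[-1]
    if today < latest:
        raise ValueError("current time precedes the latest access")
    repeat_frequency = len(ordered) - 1        # accesses after the first one
    first_to_latest = (latest - first).days    # first access to latest access
    first_to_now = (today - first).days        # first access to current time
    return repeat_frequency, first_to_latest, first_to_now


# Hypothetical access history for one target object.
history = [date(2021, 3, 1), date(2021, 1, 1), date(2021, 6, 1)]
print(build_triplet(history, today=date(2021, 12, 10)))
```

A target object with a single access yields a repeat frequency of 0 and a first-to-latest interval of 0, which is the case a BG-NBD style model treats separately.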
In an optional manner, the BG-NBD model makes its prediction based on the distribution of the time intervals at which each target object visits, the distribution of the probability that each target object does not visit, and the triplet of each target object, and determines the intermediate value of the access frequency of each target object within the preset time period.
In an alternative approach, the parameters of the linear fit model may be determined by:
the determining unit 53 inputs j items of attribute information into the pre-trained linear fitting model and determines one group of preset parameter values of the linear fitting model, where j takes values from 1 to M; and screens the multiple groups of preset parameter values based on the screening rule to determine the parameters of the linear fitting model.
In an alternative approach, the screening rule is determined by the following formula:
$$\text{adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where adjusted $R^2$ is the adjusted coefficient of determination; $R^2$ is the coefficient of determination; n is the total number of collected data records of the target objects; k is the number of items of attribute information; $y_i$ is the intermediate value of target object i; $\hat{y}_i$ is the output of the linear fitting model; and $\bar{y}$ is the mean of the intermediate values.
In an alternative manner, the attribute information of the target object includes one or more of the following: the business information of the target object, the service life of the target object, the registered capital of the target object, the change condition of the target object and the financial condition of the target object.
In an alternative approach, the linear fit model is an LR model.
After the method and the apparatus for predicting access frequency in the exemplary embodiment of the present application are introduced, a computing device in another exemplary embodiment of the present application is introduced next.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, a computing device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method for predicting access frequency according to various exemplary embodiments of the present application described above in the present specification. For example, the processor may perform steps 201-203 as shown in fig. 2.
The computing device 130 according to this embodiment of the present application is described below with reference to fig. 6. The computing device 130 shown in fig. 6 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present application. As shown in fig. 6, the computing device 130 is embodied in the form of a general purpose smart terminal. Components of computing device 130 may include, but are not limited to: at least one processor 131, at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures. The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323. Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.) and/or any device (e.g., router, modem, etc.) that enables computing device 130 to communicate with one or more other intelligent terminals. Such communication may occur via input/output (I/O) interfaces 135. Also, computing device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 136. As shown, network adapter 136 communicates with other modules for computing device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, aspects of the method for predicting access frequency provided by the present application may also be implemented in the form of a program product including a computer program which, when the program product is run on a computer device, causes the computer device to perform the steps of the method for predicting access frequency according to the various exemplary embodiments of the present application described above in this specification. For example, the processor may perform steps 201-203 as shown in fig. 2.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for predicting access frequency of embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM), include a computer program, and be run on a smart terminal. The program product of the present application is not so limited, however; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with a readable computer program embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable access frequency predicting device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable access frequency predicting device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable access frequency predicting device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable access frequency predicting device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer implemented process, such that the instructions which execute on the computer or other programmable device provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. A method for predicting access frequency, comprising:
acquiring collected data, wherein the collected data comprises: the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time;
inputting the collected data into a parameter prediction model, and obtaining an intermediate value of the access frequency of each target object within a preset time period;
and performing linear fitting on the basis of the attribute information of the target object and each intermediate value, and determining the access frequency of each target object in a preset time period.
2. The method according to claim 1, wherein the attribute information of the target object includes M items, M being a positive integer; before performing linear fitting based on the attribute information of the target object and each intermediate value and determining the access frequency of each target object within the preset time period, the method further comprises:
classifying the i-th item of attribute information to obtain $X_i$ sub-attribute values, wherein i takes values from 1 to M and $X_i$ is an integer;
permuting and combining the sub-attribute values of the M items of attribute information to obtain $\prod_{i=1}^{M} X_i$ combinations;
determining the mean value of the intermediate values corresponding to each combination;
and performing parameter fitting based on the mean value of the intermediate values, and determining the parameter value of the linear fitting model.
3. The method according to claim 2, wherein the numbers of target objects corresponding to the sub-attribute values of any one item of attribute information are uniformly distributed.
4. The method according to claim 2 or 3, wherein the determining the access frequency of each target object within a preset time period based on the attribute information of the target object and each intermediate value by performing linear fitting comprises:
and inputting the intermediate value of each target object into the linear fitting model, and determining the access frequency of each target object in a preset time period.
5. The method of claim 1, wherein inputting the acquired data into a parametric prediction model to obtain an intermediate value of the frequency of visits by each of the target objects within a predetermined time period comprises:
constructing, for any target object, a triplet of the repeated access frequency, the time interval between the first access and the latest access, and the time interval between the first access and the current time;
inputting the triplet of each target object into the parameter prediction model, and obtaining an intermediate value of the access frequency of each target object within a preset time period; wherein the parameter prediction model is a beta-geometric/negative binomial distribution (BG-NBD) model.
6. The method of claim 5, wherein the BG-NBD model determines the intermediate value of the access frequency of each target object within the preset time period based on the distribution of the time intervals at which each target object visits, the distribution of the probability that each target object does not visit, and the triplet of each target object.
7. The method of claim 4, wherein the parameters of the linear fit model are determined by:
inputting j items of attribute information into the pre-trained linear fitting model, and determining one group of preset parameter values of the linear fitting model, wherein j takes values from 1 to M;
and screening the preset parameter values of the multiple groups of linear fitting models based on a screening rule to determine the parameters of the linear models.
8. The method of claim 7, wherein the screening rule is determined by the following formulas:

$$\text{adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

wherein adjusted $R^2$ is the adjusted coefficient of determination; $R^2$ is the coefficient of determination; n is the total number of collected data records of the target objects; k is the number of items of attribute information; $y_i$ is the intermediate value of target object i; $\hat{y}_i$ is the output of the linear fitting model; and $\bar{y}$ is the mean of the intermediate values.
9. An apparatus for predicting access frequency, comprising:
an acquisition unit configured to acquire collected data, the collected data comprising: the repeated access frequency of each target object, the time interval between the first access and the latest access of each target object, and the time interval between the first access of each target object and the current time;
a prediction unit configured to input the collected data into a parameter prediction model and obtain an intermediate value of the access frequency of each target object within a preset time period;
and the determining unit is used for performing linear fitting on the basis of the attribute information of the target object and each intermediate value and determining the access frequency of each target object in a preset time period.
10. A computing device, comprising: a memory and a processor;
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 8 in accordance with the obtained program.
11. A computer storage medium storing computer-executable instructions for performing the method of any one of claims 1-8.
CN202111509654.3A 2021-12-10 2021-12-10 Method and device for predicting access frequency Pending CN114170002A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111509654.3A CN114170002A (en) 2021-12-10 2021-12-10 Method and device for predicting access frequency
PCT/CN2022/120519 WO2023103527A1 (en) 2021-12-10 2022-09-22 Access frequency prediction method and device

Publications (1)

Publication Number Publication Date
CN114170002A true CN114170002A (en) 2022-03-11

Family

ID=80485558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111509654.3A Pending CN114170002A (en) 2021-12-10 2021-12-10 Method and device for predicting access frequency

Country Status (2)

Country Link
CN (1) CN114170002A (en)
WO (1) WO2023103527A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023103527A1 (en) * 2021-12-10 2023-06-15 深圳前海微众银行股份有限公司 Access frequency prediction method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117478600B (en) * 2023-12-28 2024-03-08 彩讯科技股份有限公司 Flow control method and system for serving high concurrency multi-center business center

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7877346B2 (en) * 2007-06-06 2011-01-25 Affinova, Inc. Method and system for predicting personal preferences
CN106709755A (en) * 2016-11-28 2017-05-24 加和(北京)信息科技有限公司 Method of predicting user frequency and apparatus thereof
CN110415002A (en) * 2019-07-31 2019-11-05 中国工商银行股份有限公司 Customer behavior prediction method and system
CN112633573B (en) * 2020-12-21 2022-04-01 北京达佳互联信息技术有限公司 Prediction method of active state and determination method of activity threshold
CN112819258A (en) * 2021-03-24 2021-05-18 中国工商银行股份有限公司 Bank branch to store customer quantity prediction method and device
CN114170002A (en) * 2021-12-10 2022-03-11 深圳前海微众银行股份有限公司 Method and device for predicting access frequency

Also Published As

Publication number Publication date
WO2023103527A1 (en) 2023-06-15

Similar Documents

Publication Publication Date Title
US20180203720A1 (en) Techniques to manage virtual classes for statistical tests
CN107590688A (en) The recognition methods of target customer and terminal device
CN107705149A (en) Data method for real-time monitoring, device, terminal device and storage medium
US20050197889A1 (en) Method and apparatus for comparison over time of prediction model characteristics
WO2023103527A1 (en) Access frequency prediction method and device
US11854022B2 (en) Proactively predicting transaction dates based on sparse transaction data
US11657302B2 (en) Model selection in a forecasting pipeline to optimize tradeoff between forecast accuracy and computational cost
CN111080360B (en) Behavior prediction method, model training method, device, server and storage medium
CN113283671B (en) Method and device for predicting replenishment quantity, computer equipment and storage medium
CN112085615A (en) Method and device for training graph neural network
CN112232833A (en) Lost member customer group data prediction method, model training method and model training device
CN111695938B (en) Product pushing method and system
CN114584601A (en) User loss identification and intervention method, system, terminal and medium
CN112766536A (en) Model training method, device and terminal for calculating road engineering labor unit price
CN113835947A (en) Method and system for determining abnormality reason based on abnormality identification result
CN117011063B (en) Customer transaction risk prediction processing method and device
CN113869992B (en) Artificial intelligence based product recommendation method and device, electronic equipment and medium
CN116167646A (en) Evaluation method, device, equipment and storage medium based on transaction algorithm
CN116911902A (en) Target recommendation method and device
CN117131965A (en) Data prediction method and device, computer storage medium and electronic equipment
CN116542706A (en) Marketing campaign result prediction method and device
CN112926803A (en) Client deposit loss condition prediction method and device based on LSTM network
CN114519307A (en) Information prediction method, device, system, storage medium and electronic equipment
CN117459576A (en) Data pushing method and device based on edge calculation and computer equipment
CN115345640A (en) Product demand prediction method, product demand prediction device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication