CN110889716A - Method and device for identifying potential registered user - Google Patents


Info

Publication number
CN110889716A
Authority
CN
China
Prior art keywords
user
data
registered
samples
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910935469.7A
Other languages
Chinese (zh)
Inventor
李勇
徐丰力
朴景华
卢中县
徐裕键
张良伦
金德鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Weituo Technology Co Ltd
Tsinghua University
Original Assignee
Hangzhou Weituo Technology Co Ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Weituo Technology Co Ltd, Tsinghua University filed Critical Hangzhou Weituo Technology Co Ltd
Priority to CN201910935469.7A priority Critical patent/CN110889716A/en
Publication of CN110889716A publication Critical patent/CN110889716A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment provides a method and a device for identifying potential registered users, wherein the method comprises the following steps: acquiring user portrait data and user behavior data of a first preset time period; performing feature extraction on the acquired user portrait data and user behavior data; inputting the extracted feature data into a preset machine learning classification model, obtaining the probability of converting the user into a registered user in a second preset time period, and outputting a result; and the classification model is obtained after training according to the sample data of the determined registration conversion result. The method not only analyzes the static data such as the user portrait, but also analyzes the dynamic data such as the user behavior data, and can comprehensively and objectively reflect the potential possibility that the user becomes a registered user. The extracted feature data are input into the machine learning classification model, the probability that the user is converted into the registered user is obtained, the prediction timeliness can be effectively improved, each user can be predicted, and the targeted marketing effect on the user is strong.

Description

Method and device for identifying potential registered user
Technical Field
The present invention relates to the field of data analysis, and in particular, to a method and an apparatus for identifying potential registered users.
Background
With the development of the internet, electronic commerce has penetrated into the aspects of life. The electronic commerce industry develops rapidly, and consumers have developed an online shopping consumption habit. However, with the rapid development of the e-commerce market, the cost of acquiring new customers is high and becomes a bottleneck limiting the development of the traditional e-commerce. The social e-commerce mode is propagated by depending on a social flow platform and an acquaintance network, the cost of the e-commerce platform for obtaining new customers can be effectively reduced, and the social e-commerce mode is an important future development direction of the e-commerce industry.
Social e-commerce is an emerging field. Research on it is still at a qualitative, strategic stage, and quantitative and theoretical studies remain scarce. In a social e-commerce scenario there is, in addition to the ordinary user, a special kind of user called a registered user. Registered users can share recommended merchandise, invite new users, and earn a certain monetary reward from doing so. Ordinary users can become registered users through a series of means, such as paying a certain fee. Registered users are highly significant for selling goods and acquiring new customers on the e-commerce platform. Therefore, predicting whether an ordinary user will convert into a registered user in the future is a very valuable problem.
The current method mainly comprises the following steps: acquiring offline user data; fusing data from different data sources according to identification codes to form an offline knowledge base; preprocessing the offline data by normalization, discretization, attribute reduction, and the like; extracting features from the offline data according to customized label rules to construct basic user labels; applying weights and time attenuation factors to the label data, and establishing an offline user portrait prediction model based on a quality assurance set QPS clustering algorithm; performing cluster mining on the offline knowledge base with the prediction model to obtain e-commerce user portraits for the mobile terminal; and performing distributed processing on the online behavior data and fusing it with the offline model.
The current methods have the following limitations: (1) the data used is only user portrait data, and user order data and the like are not analyzed; (2) the method performs poorly when implemented; (3) the existing method tends to divert users through certain logic on the transaction link instead of predicting for each user, so its targeted marketing effect on users is poor.
Disclosure of Invention
In order to solve the above problem, the present embodiment provides a method and an apparatus for identifying potential registered users.
In a first aspect, the present embodiment provides a method for identifying potential registered users, including: acquiring user portrait data and user behavior data of a first preset time period; performing feature extraction on the acquired user portrait data and user behavior data; inputting the extracted feature data into a preset machine learning classification model, obtaining the probability of converting the user into a registered user in a second preset time period, and outputting a result; and the classification model is obtained after training according to the sample data of the determined registration conversion result.
Further, the user behavior data includes at least one of user order data, user click data, and user invited data.
Further, the feature extraction is performed on the acquired user portrait data and user behavior data, and the feature extraction includes: and determining a user behavior index value in a first preset time period, and taking the index value as a characteristic value of the user behavior data.
Further, before the obtaining of the user portrait data and the user behavior data of the first preset time period, the method further includes: obtaining a plurality of user samples with behavior records from historical data, and extracting the characteristics of user portrait data and user behavior data of all the users; taking the combination of the sample data and the characteristic data which are converted into the registered user as a positive sample, and taking the combination of the sample data and the characteristic data which are not converted into the registered user as a negative sample; and training the classification model by using the obtained positive samples and negative samples.
Further, after obtaining the probability that the user is converted into the registered user in the second preset time period, the method further includes: determining a user list capable of being converted into a registered user from all users according to the probability and a preset threshold; correspondingly, the output result specifically includes: outputting a user list which can be converted into a registered user; the preset threshold is determined according to the proportion of the positive samples to the negative samples.
Further, after the classification model is trained with the plurality of positive samples and the plurality of negative samples, the method further includes: verifying the classification model with validation-set samples and adjusting the preset threshold according to the verification result, wherein the evaluation indexes used for adjusting the threshold include recall or precision.
Further, after outputting the result, the method further includes: every third preset time period, obtaining a plurality of user samples with behavior records from the latest historical data as updating samples, and performing feature extraction on user portrait data and user behavior data of all the updating samples; taking the combination of the update sample data and the feature data which are converted into the registered user as a positive update sample, and taking the combination of the update sample data and the feature data which are not converted into the registered user as a negative update sample; training and verifying the classification model by using the obtained multiple positive update samples and multiple negative update samples, and adjusting the preset threshold; wherein the third preset time period is greater than or equal to the second preset time period.
In a second aspect, the present embodiment provides an apparatus for identifying potential registered users, including: the data acquisition module is used for acquiring user portrait data and user behavior data in a first preset time period; the characteristic extraction module is used for extracting characteristics of the acquired user portrait data and the acquired user behavior data; the processing module is used for inputting the extracted feature data into a preset machine learning classification model, obtaining the probability that the user is converted into a registered user in a second preset time period, and outputting a result; and the classification model is obtained after training according to the sample data of the determined registration conversion result.
In a third aspect, the present embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method for identifying a potential registered user according to the first aspect of the present invention are implemented.
In a fourth aspect, the present embodiments provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of identifying potential registered users of the first aspect of the present invention.
The method and the device for identifying potential registered users provided by the embodiment can acquire the user portrait data and the user behavior data in the first preset time period, analyze not only the static data such as user portrait, but also the dynamic data such as user behavior data, and comprehensively and objectively reflect the potential possibility that the user becomes the registered user. The extracted feature data are input into a preset machine learning classification model, the probability that the user is converted into the registered user is obtained, the prediction timeliness can be effectively improved, each user can be predicted, and the directional marketing effect on the user is strong.
Drawings
In order to more clearly illustrate the technical solutions in the present embodiment or the prior art, the drawings needed to be used in the description of the embodiment or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a method for identifying potential registered users according to the present embodiment;
Fig. 2 is a diagram illustrating the relationship between the average number of user invitations and the conversion rate of registered users according to the present embodiment;
Fig. 3 is a diagram illustrating the relationship between the purchase frequency of the user and the conversion rate of registered users according to the present embodiment;
Fig. 4 is a diagram illustrating the relationship between the average number of user interactions and the conversion rate of registered users according to the present embodiment;
Fig. 5 is a block diagram of an apparatus for identifying potential registered users according to the present embodiment;
Fig. 6 is a schematic physical structure diagram of an electronic device provided in the present embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present embodiments more clear, the technical solutions in the present embodiments will be described clearly and completely with reference to the drawings in the present embodiments, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that although the terms "first", "second", etc. are used hereinafter to describe the preset time period, such information should not be limited to these terms, which are used only to distinguish one type of thing from another.
Fig. 1 is a flowchart of a method for identifying a potential registered user according to this embodiment, and as shown in fig. 1, this embodiment provides a method for identifying a potential registered user, including:
101, user portrait data and user behavior data of a first preset time period are obtained.
In 101, the first preset time period may be 1, 3, 7, 15, or 30 days; for example, user portrait data and user behavior data of the past 30 days are acquired. The second preset time period may likewise be 1, 3, 7, 15, or 30 days, and the two time periods do not constrain each other. What is determined is whether an ordinary user will be converted into a registered user within the next 1, 3, 7, 15, or 30 days, together with the conversion probability; the specific time periods are determined by the sample data used for training. For example, if the training samples are 30 days of user portrait data and user behavior data and the determined registration conversion result is the conversion result within the following 7 days, then the data acquired for the first preset time period is the data of the past 30 days, and the probability predicted for the second preset time period is correspondingly the probability of conversion within the next 7 days. In addition, there may be several first and second time periods: for example, data of the past 30 days is acquired and the user's conversion probabilities for the next 7 days and the next 15 days are predicted respectively, or data of the past 15 days and of the past 30 days is acquired and the conversion probability for the next 7 days is predicted from each.
The user portrait data is static data of the user and may include the user ID, the registration time of the user, and the time when the user became an ordinary user. The user behavior data is dynamic data of the user and mainly comprises data on purchasing, clicking, browsing, and interacting with other users.
And 102, extracting the characteristics of the acquired user portrait data and user behavior data.
In 102, the user portrait data and the user behavior data form a large data stream, and appropriate indexes are selected from it as the feature data.
103, inputting the extracted feature data into a preset machine learning classification model, obtaining the probability of converting the user into a registered user in a second preset time period, and outputting a result; and the machine learning classification model is obtained by training according to the samples of the determined registration conversion result and the corresponding characteristic data.
In 103, the preset machine learning classification model is obtained by training on samples whose registration conversion results have been determined, together with the corresponding feature data. The sample data is data of users whose conversion outcome is already known and whose feature data has been extracted from their portrait data and behavior data. Preferably, the sample data is obtained from user portrait data and user behavior data in a first preset time period, and the determined registration conversion result is a known conversion result in a second preset time period.
After the machine learning classification model is established, it is trained on a large number of sample users to obtain the preset machine learning classification model. For subsequent users to be identified, the extracted feature data is input into the model, so that the probability of each user converting into a registered user can be obtained quickly and accurately. For example, the machine learning model may be a random forest model, a neural network, an SVM, a logistic regression model, a decision tree, or the like. If a random forest model is used, the Gini index is selected as the information gain function, that is, as the criterion for evaluating splits.
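To make the Gini index concrete, the following is a minimal sketch of how a single tree node might score candidate splits on one feature. It is an illustration only, not the embodiment's implementation: the function names (`gini_impurity`, `weighted_gini`, `best_split`) and the toy data are hypothetical, and a real random forest would repeat this over bootstrap samples and random feature subsets.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Size-weighted Gini impurity of the two children of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_split(values, labels):
    """Return (threshold, impurity) minimizing the weighted Gini for one feature.

    A sample goes left when value <= threshold, right otherwise.
    """
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split anything off
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: orders in the past 30 days, and whether the user converted.
orders_30d = [0, 1, 1, 2, 5, 6, 8]
converted = [0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(orders_30d, converted)
```

On this toy data the split at 2 orders separates the two classes completely, so its weighted impurity is zero.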
The method for identifying the potential registered user provided by the embodiment acquires the user portrait data and the user behavior data in the first preset time period, analyzes not only the static data such as the user portrait, but also the dynamic data such as the user behavior data, and can comprehensively and objectively reflect the potential possibility that the user becomes the registered user. The extracted feature data are input into a preset machine learning classification model, the probability that the user is converted into the registered user is obtained, the prediction timeliness can be effectively improved, each user can be predicted, and the directional marketing effect on the user is strong.
Based on the content of the foregoing embodiment, as an optional embodiment, the user behavior data at least includes: one of user order data, user click data, and user invited data.
The user order data is data on orders the user has successfully placed, and comprises the order creation time and the number of orders in the first preset time period. Lo C, Frankowski D, Leskovec J et al. found that on the Pinterest platform, saving and clicking behaviors are long-term signals of purchase, while searching and focusing are short-term signals of purchase. From the user's saving, clicking, searching, and focusing behavior on social interaction information, the user's purchasing behavior can be predicted up to 28 days in advance.
Rejection-avoidance effect: observing the relationship between the average number of times a user is invited and the registered-user conversion rate shows that the conversion rate rises as the average number of invitations increases, a marked positive correlation. Fig. 2 is a graph of the relationship between the average number of user invitations and the conversion rate of registered users according to this embodiment. As shown in Fig. 2, as the number of invitations sent by a registered user increases, not only does the absolute number of ordinary users converted increase, but the conversion efficiency also rises significantly. Therefore, ordinary users subordinate to a registered user with a high average number of invitations have a significantly higher conversion probability. This empirical result is consistent with the prediction of the rejection-avoidance effect, so this embodiment selects the user invited data, and the average number of invitations can be used as a behavior index.
Mental accounting effect: observing the relationship between the purchase frequency of ordinary users and the registered-user conversion rate shows that the conversion rate rises as the purchase frequency increases. Fig. 3 is a diagram of the relationship between the purchase frequency of the user and the conversion rate of registered users provided by this embodiment. Owing to the self-purchase discount mechanism, users who purchase frequently have a higher expected income from converting into registered users, and for them the conversion cost is lower than the cost they have already sunk. The mental accounting effect therefore predicts a higher conversion potential for such users, which is consistent with the result of the data analysis. Accordingly, this embodiment selects the user order data and the user click data, and the number of purchases can be used as a behavior index.
Social enabling effect: observing the relationship between the number of interactions between a registered user and an ordinary user and the registered-user conversion rate shows that, as the number of interactions increases, the conversion rate tends to rise in the early stage. Fig. 4 is a diagram of the relationship between the average number of user interactions and the conversion rate of registered users provided in this embodiment. The frequency of interaction characterizes the strength of the social relationship between registered users and ordinary users, so the observation is consistent with the prediction of the social enabling effect, namely that ordinary users are more inclined to accept invitations from friends. The number of interactions between the registered user and the ordinary user can therefore also be selected as a behavior index. See Table 1 for details:
TABLE 1
[Table 1 appears as an image (Figure BDA0002221484250000071) in the original publication and is not reproduced here.]
In this embodiment, the user behavior data includes user order data, user click data, and user invited data, and can objectively reflect the characteristic attributes of a user converting into a registered user.
Based on the content of the foregoing embodiment, as an optional embodiment, the performing feature extraction on the acquired user portrait data and user behavior data includes: and determining a user behavior index value in a first preset time period according to the user order data, the user click data and/or the user invited data, and taking the index value as a characteristic value of the user behavior data.
The user behavior index value reflects the number of times of the user behavior. For example, the behavior index value of the user order data in the first preset time period includes the amount of orders in the first preset time period.
Further, determining the user behavior index value in the first preset time period includes determining the index values corresponding to the feature data according to the user order data, the user click data, or the user invited data, respectively, where the index values include: the number of times the user browsed store commodity details in the first preset time period, the number of orders of the user, the number of times the user was invited, the ordinal position at which the user became an ordinary user, the length of time from when the user became an ordinary user to when the user last received an invitation, the number of orders of the store, and the registered-user conversion rate of the store.
In order to screen out valuable behavior features from the user's dynamic behavior data and static attribute data, this embodiment draws on existing theories of social and economic behavior to design the behavior features, and quantitatively analyzes the predictive power of each behavior feature in a data-driven manner. The embodiment mainly examines how the rejection-avoidance effect, the mental accounting effect, and the social enabling effect apply to predicting the conversion behavior of ordinary users. The rejection-avoidance effect explains that a user may accept an invitation to avoid hurting a friend's feelings, and that repeated invitations are harder to refuse; the mental accounting effect shows that a user tends to weigh the sunk cost already invested against the expected profit when making a decision; the social enabling effect predicts that users are more inclined to accept recommendations from social friends because of trust in them and their preferences. For convenience of explanation, the scheme first defines the index values as follows.
For the behavior index values listed above, the first preset time period may be a series of time periods, for example 1, 3, 7, and 30 days. The present embodiment selects the following 13 features: the number of commodity detail exposures for the user in the previous 1, 3, 7, and 30 days (i.e., the number of times the user browsed store commodity details); the number of recruitment page exposures from the user's registered user in the previous 30 days (i.e., the number of times the user was invited); the number of orders of the store to which the user belongs (i.e., the number of orders of the store); the number of orders of the user in the previous 1, 3, 7, and 30 days; the ordinal position at which the user became an ordinary user of the store (i.e., the time sequence number of the user becoming an ordinary user); the time at which the user last received an invitation minus the time at which the user became an ordinary user; and the conversion percentage of the store to which the user belongs (i.e., the registered-user conversion rate of the store).
TABLE 2
[Table 2 appears as images (Figures BDA0002221484250000081 and BDA0002221484250000091) in the original publication and is not reproduced here.]
In this embodiment, the index values include: the number of times the user browsed store commodity details, the number of orders of the user, the number of times the user was invited, the ordinal position at which the user became an ordinary user, the length of time from when the user became an ordinary user to when the user last received an invitation, the number of orders of the store, and the registered-user conversion rate of the store. These index values can objectively reflect the characteristic attributes of a user converting into a registered user.
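The windowed count features described above (detail exposures and orders over the previous 1, 3, 7, and 30 days) can be sketched as follows. This is a minimal illustration under stated assumptions: events are assumed to be available as lists of timestamps per user, and the function names and feature keys (`window_counts`, `behavior_features`, `detail_views_7d`, and so on) are hypothetical rather than taken from the embodiment.

```python
from datetime import datetime, timedelta

# Look-back windows in days, as listed in the embodiment.
WINDOWS_DAYS = (1, 3, 7, 30)


def window_counts(event_times, now, windows=WINDOWS_DAYS):
    """Count events in each look-back window (now - d days, now]."""
    counts = {}
    for d in windows:
        start = now - timedelta(days=d)
        counts[d] = sum(1 for t in event_times if start < t <= now)
    return counts


def behavior_features(detail_views, orders, now):
    """Build a flat feature dict from two event streams (hypothetical key names)."""
    features = {}
    for d, c in window_counts(detail_views, now).items():
        features[f"detail_views_{d}d"] = c
    for d, c in window_counts(orders, now).items():
        features[f"orders_{d}d"] = c
    return features


# Hypothetical toy events for one user.
now = datetime(2019, 9, 30)
detail_views = [now - timedelta(hours=5), now - timedelta(days=2), now - timedelta(days=10)]
orders = [now - timedelta(days=6), now - timedelta(days=20)]
features = behavior_features(detail_views, orders, now)
```

The remaining features (invitation count, store order count, ordinal position, time since becoming an ordinary user, store conversion rate) would be appended to the same dictionary from their own data sources.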
Based on the content of the foregoing embodiment, as an optional embodiment, before the obtaining the user portrait data and the user behavior data in the first preset time period, the method further includes: obtaining a plurality of user samples with behavior records from historical data, and extracting the characteristics of user portrait data and user behavior data of all the users; taking the combination of the sample data and the characteristic data which are converted into the registered user as a positive sample, and taking the combination of the sample data and the characteristic data which are not converted into the registered user as a negative sample; and training the classification model by using the obtained positive samples and negative samples.
Ground-truth labels indicating whether a given user converts into a registered user within 1, 3, 7, 15, or 30 days after a specific time point are generated from the historical data. Specifically, a user who has a behavior record in the past 30 days and converts into a registered user within the next 1, 3, 7, 15, or 30 days is a positive sample; a user who has a behavior record in the past 30 days and does not convert into a registered user within the next 1, 3, 7, 15, or 30 days is a negative sample. Finally, the features and the labels are joined to produce the training-set samples.
In this embodiment, the combination of the sample data and the feature data of users converted into registered users is used as positive samples, and the combination of the sample data and the feature data of users not converted into registered users is used as negative samples. The preset machine learning classification model is obtained after training on them, which helps improve the accuracy of identifying potential registered users.
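The labeling rule above can be sketched as a small function. This is an illustrative sketch, not the embodiment's code: `label_user` and its parameters are hypothetical names, and excluding users who were already registered at the cutoff is an assumption added here, since the text does not say how such users are handled.

```python
from datetime import datetime, timedelta


def label_user(behavior_times, converted_at, cutoff, lookback_days=30, horizon_days=7):
    """Return 1 (positive), 0 (negative), or None (not used as a sample).

    behavior_times: timestamps of the user's behavior records.
    converted_at:   time the user became a registered user, or None.
    cutoff:         the specific time point separating past from future.
    """
    start = cutoff - timedelta(days=lookback_days)
    if not any(start < t <= cutoff for t in behavior_times):
        return None  # no behavior record in the look-back window
    if converted_at is not None and converted_at <= cutoff:
        return None  # assumption: already-registered users are excluded
    end = cutoff + timedelta(days=horizon_days)
    converted_in_horizon = converted_at is not None and cutoff < converted_at <= end
    return 1 if converted_in_horizon else 0


# Hypothetical toy users: (behavior timestamps, conversion time or None).
cutoff = datetime(2019, 9, 1)
users = {
    "u1": ([cutoff - timedelta(days=3)], cutoff + timedelta(days=2)),
    "u2": ([cutoff - timedelta(days=10)], None),
    "u3": ([cutoff - timedelta(days=45)], cutoff + timedelta(days=1)),
    "u4": ([cutoff - timedelta(days=5)], cutoff + timedelta(days=20)),
}
labels = {uid: label_user(times, conv, cutoff) for uid, (times, conv) in users.items()}
```

Calling the function once per horizon (1, 3, 7, 15, 30 days) yields one label set per prediction window; for example, user `u4` is negative at a 7-day horizon but positive at a 30-day horizon.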
Based on the content of the foregoing embodiment, as an optional embodiment, after obtaining the probability that the user is converted into the registered user in the second preset time period, the method further includes: determining a user list capable of being converted into a registered user from all users according to the probability and a preset threshold; correspondingly, the output result specifically includes: outputting a user list which can be converted into a registered user; the preset threshold is determined according to the proportion of the positive sample to the negative sample.
Because service requirements vary, this embodiment supports two output modes: list output and probability output. List output is the list of users with high conversion potential after threshold judgment; probability output is the uid and conversion probability of every user with an activity record in the past period, without threshold judgment.
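The two output modes can be sketched as below. The function names and the descending sort are assumptions made for illustration; the embodiment only states that one mode applies the threshold and the other does not.

```python
from typing import Dict, List, Tuple


def probability_output(probs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Probability mode: every uid with its conversion probability, no threshold."""
    # Sorting by descending probability is a presentation choice, not a requirement.
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def list_output(probs: Dict[str, float], threshold: float) -> List[str]:
    """List mode: only the uids whose probability exceeds the threshold."""
    return [uid for uid, p in probability_output(probs) if p > threshold]
```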
Because far fewer users convert into registered users than do not, the positive and negative sample classes are imbalanced. To address this, the scheme introduces a "threshold-moving" method: learning is performed directly on the original training set, but when the trained classifier is used for prediction, the following formula is embedded in its decision process:
Figure BDA0002221484250000101
where y denotes the preset threshold, m+ denotes the number of positive samples, and m- denotes the number of negative samples. In a specific implementation, the output probability is compared with the preset threshold, and if it is greater than the threshold, the user is output to the potential registered user list.
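The formula image is not reproduced in this text, so the exact expression cannot be confirmed. A commonly used form of threshold moving, assumed here, sets the threshold so that y / (1 - y) = m+ / m-, which gives y = m+ / (m+ + m-). The sketch below uses that assumed form; the function names are hypothetical.

```python
def moved_threshold(num_pos: int, num_neg: int) -> float:
    """Threshold y satisfying y / (1 - y) = m+ / m-, i.e. y = m+ / (m+ + m-).

    Assumed form of threshold moving; the source formula image is unavailable.
    """
    if num_pos <= 0 or num_neg <= 0:
        raise ValueError("both classes need at least one sample")
    return num_pos / (num_pos + num_neg)


def is_potential_registered_user(prob: float, threshold: float) -> bool:
    # A user enters the list only when the probability exceeds the threshold.
    return prob > threshold
```

With 100 positive and 900 negative samples, the assumed threshold drops from the default 0.5 to 0.1, so that the rare positive class is not systematically missed.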
In this embodiment, the list of users who can be converted into registered users is determined from all users according to the probability and the preset threshold and is then output, which makes the output result more intuitive. Because the preset threshold is determined by the ratio of positive samples to negative samples, it objectively reflects the actual situation of potential registered users.
Based on the foregoing embodiment, as an optional embodiment, after training the classification model, the method further includes: verifying the classification model with the validation set samples, and adjusting the preset threshold according to the verification result, wherein the evaluation indexes for adjusting the threshold include recall or precision.
When the training samples are obtained, the features and the labels are concatenated to produce training set samples and validation set samples. The training set samples are used to train the model; after training, the model is verified with the validation set samples, and the threshold is adjusted upward or downward around the preset threshold y. The evaluation indexes for adjusting the threshold include recall, precision, and the like. In this embodiment, the preset threshold is adjusted according to the verification result, so that it more objectively reflects the proportion of potential registered users among all users.
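One way to adjust the threshold around y on the validation set is sketched below. The search strategy (a small grid around the preset value, keeping the highest recall that still meets a minimum precision) and all names are assumptions for illustration; the embodiment only states that recall or precision is used as the evaluation index.

```python
from typing import Sequence, Tuple


def precision_recall(
    labels: Sequence[int], probs: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when users with prob > threshold are predicted positive."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if p > threshold and y == 1)
    fp = sum(1 for y, p in pairs if p > threshold and y == 0)
    fn = sum(1 for y, p in pairs if p <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def tune_threshold(
    labels: Sequence[int],
    probs: Sequence[float],
    base: float,
    min_precision: float,
    span: float = 0.1,
    steps: int = 20,
) -> float:
    """Search [base - span, base + span] for the highest-recall threshold
    whose precision is at least min_precision; fall back to base."""
    best, best_recall = base, -1.0
    for i in range(steps + 1):
        t = base - span + 2 * span * i / steps
        if not 0.0 < t < 1.0:
            continue
        precision, recall = precision_recall(labels, probs, t)
        if precision >= min_precision and recall > best_recall:
            best, best_recall = t, recall
    return best
```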
Based on the foregoing embodiment, as an optional embodiment, after outputting the result, the method further includes: every third preset time period, obtaining a plurality of user samples with behavior records from the latest historical data as update samples, and performing feature extraction on the user portrait data and user behavior data of all the update samples; taking the combination of update sample data and feature data of each user who converted into a registered user as a positive update sample, and the combination of update sample data and feature data of each user who did not convert into a registered user as a negative update sample; training and verifying the classification model with the resulting positive update samples and negative update samples, and adjusting the preset threshold; wherein the third preset time period is greater than or equal to the second preset time period.
The update samples are used to update the preset threshold. In a concrete application of the model, prediction can be run every day according to service requirements; in this embodiment, the user set for prediction is all users with an activity record in the past thirty days. Every third preset time period, for example every 30 days (that is, every month), a portion of the most recent daily prediction samples for which results are available is selected as a training set and a validation set, the classification model is trained and verified again, and the preset threshold is adjusted. The classification model uses the adjusted threshold in subsequent potential-user identification. For example, if identification is performed every day using data from the past 30 days (the first preset time period) to predict the probability for the next 15 days (the second preset time period), the threshold can be updated only after the prediction results are available, so the third preset time period must be greater than or equal to the second preset time period. Preferably, the preset threshold is adjusted every month according to the user data with the latest identification results, and the new preset threshold is used to output the user list, so that the prediction model and the prediction threshold can follow changes in the usage scenario and the model remains reliable over time.
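The constraint between the third and second preset time periods can be illustrated with a small scheduling sketch. The helper names and the daily-prediction bookkeeping are assumptions; the point is only that a prediction made on a given day can serve as an update sample once its horizon has elapsed.

```python
from datetime import date, timedelta
from typing import List


def check_periods(second_days: int, third_days: int) -> None:
    """Enforce: third preset time period >= second preset time period."""
    if third_days < second_days:
        raise ValueError("update interval must be at least the prediction horizon")


def matured_prediction_dates(
    today: date, third_days: int, second_days: int
) -> List[date]:
    """Daily prediction dates in the last update interval whose horizon
    has fully elapsed by `today`, so their true outcome is known."""
    check_periods(second_days, third_days)
    start = today - timedelta(days=third_days)
    days = [start + timedelta(days=i) for i in range(third_days + 1)]
    return [d for d in days if d + timedelta(days=second_days) <= today]
```

With a 30-day update interval and a 15-day horizon, only predictions made in the first half of the interval have known outcomes at update time; if the interval were shorter than the horizon, none would.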
Fig. 5 is a block diagram of the apparatus for identifying potential registered users provided by this embodiment. As shown in Fig. 5, the apparatus includes a data acquisition module 501, a feature extraction module 502 and a processing module 503. The data acquisition module 501 is configured to acquire user portrait data and user behavior data of a first preset time period; the feature extraction module 502 is configured to perform feature extraction on the acquired user portrait data and user behavior data; the processing module 503 is configured to input the extracted feature data into a preset machine learning classification model, obtain the probability that the user converts into a registered user within a second preset time period, and output a result; wherein the classification model is obtained by training on sample data with determined registration conversion results.
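The three-module structure can be sketched as a simple pipeline. The class name, the callable signatures, and the data types are hypothetical and chosen only to show how modules 501, 502 and 503 hand data to one another.

```python
from typing import Callable, Dict, Iterable, List

RawData = Dict[str, object]  # portrait and behavior data for one user (assumed shape)
Features = List[float]


class PotentialUserIdentifier:
    """Chains acquisition (501), feature extraction (502) and processing (503)."""

    def __init__(
        self,
        acquire: Callable[[str], RawData],
        extract: Callable[[RawData], Features],
        predict_proba: Callable[[Features], float],
    ) -> None:
        self.acquire = acquire
        self.extract = extract
        self.predict_proba = predict_proba

    def run(self, uids: Iterable[str]) -> Dict[str, float]:
        # For each user: acquire data, extract features, score with the model.
        return {
            uid: self.predict_proba(self.extract(self.acquire(uid)))
            for uid in uids
        }
```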
The first preset time period may be 1, 3, 7, 15 or 30 days. Likewise, the prediction may cover whether a general user will convert into a registered user within the next 1, 3, 7, 15 or 30 days, together with the conversion probability, as determined by the sample data. The user portrait data is static data about the user and may include the user ID, the user's registration time, and the time at which the user became a general user. The user behavior data is dynamic data about the user and mainly includes purchasing, clicking, browsing, and interaction with other users. The data acquisition module 501 acquires the user portrait data and the user behavior data at the same time.
The volume of user portrait data and user behavior data is large, so the feature extraction module 502 needs to select appropriate index values from it as the feature data.
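As an illustration of selecting index values, the sketch below counts behavior events per type inside the first preset time period. The event names follow the behavior data types named in the claims (order, click, invited); the feature naming scheme and the function itself are assumptions.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

EVENT_TYPES = ("order", "click", "invited")  # hypothetical event type names


def behavior_features(
    events: Iterable[Tuple[date, str]],
    cutoff: date,
    window_days: int = 30,
) -> Dict[str, int]:
    """Count each event type inside the window (cutoff - window_days, cutoff]."""
    start = cutoff - timedelta(days=window_days)
    counts = Counter(kind for day, kind in events if start < day <= cutoff)
    return {f"{kind}_count_{window_days}d": counts.get(kind, 0) for kind in EVENT_TYPES}
```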
The processing module 503 is preset with a machine learning classification model obtained by training on samples with determined registration conversion results and their corresponding feature data; the sample data are user samples whose conversion result is already known, together with the feature data extracted from their portrait data and behavior data. Once the machine learning classification model is set up, it is trained on a large number of sample users to obtain the preset machine learning classification model; for each subsequent user to be identified, the acquired feature data are input into the model, and the probability that the user can be converted into a registered user is obtained quickly and accurately. For example, a random forest model is selected as the machine learning model, and the Gini index is used as the random forest's split criterion (information gain function).
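For reference, the Gini index used as a split criterion can be computed as below. This is a generic sketch of Gini impurity for a candidate split, not code from the embodiment; in scikit-learn, for instance, `RandomForestClassifier` uses `criterion="gini"` by default.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_of_split(left: Sequence[int], right: Sequence[int]) -> float:
    """Size-weighted Gini impurity of a binary split; lower is better."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
```

Each tree in the forest would choose, at every node, the feature and cut point that minimize this weighted impurity.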
The apparatus embodiment provided here is used to implement the foregoing method embodiments; for the specific process and details, refer to the method embodiments, which are not repeated here.
The apparatus for identifying potential registered users provided by this embodiment acquires the user portrait data and user behavior data of the first preset time period, and analyzes not only static data such as the user portrait but also dynamic data such as user behavior, so it can comprehensively and objectively reflect the likelihood that a user becomes a registered user. The extracted feature data are input into a preset machine learning classification model to obtain the probability that the user converts into a registered user, which effectively improves the timeliness of prediction, allows a prediction for every user, and supports effective targeted marketing.
Fig. 6 is a schematic diagram of the physical structure of the electronic device provided by this embodiment. As shown in Fig. 6, the electronic device may include a processor 601, a communication interface 602, a memory 603 and a bus 604, where the processor 601, the communication interface 602 and the memory 603 communicate with one another through the bus 604. The communication interface 602 may be used for information transfer of the electronic device. The processor 601 may call logic instructions in the memory 603 to perform a method including: acquiring user portrait data and user behavior data of a first preset time period; performing feature extraction on the acquired user portrait data and user behavior data; inputting the extracted feature data into a preset machine learning classification model, obtaining the probability that the user converts into a registered user within a second preset time period, and outputting a result; wherein the classification model is obtained by training on sample data with determined registration conversion results.
In addition, the logic instructions in the memory 603 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-described method embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, this embodiment also provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided in the foregoing embodiments, for example, a method including: acquiring user portrait data and user behavior data of a first preset time period; performing feature extraction on the acquired user portrait data and user behavior data; inputting the extracted feature data into a preset machine learning classification model, obtaining the probability that the user converts into a registered user within a second preset time period, and outputting a result; wherein the classification model is obtained by training on sample data with determined registration conversion results.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of identifying potential registered users, comprising:
acquiring user portrait data and user behavior data of a first preset time period;
performing feature extraction on the acquired user portrait data and user behavior data;
inputting the extracted feature data into a preset machine learning classification model, obtaining the probability of converting the user into a registered user in a second preset time period, and outputting a result;
and the classification model is obtained after training according to the sample data of the determined registration conversion result.
2. The method of identifying potential registered users according to claim 1, wherein said user behavior data comprises at least one of: user order data, user click data, and user invited data.
3. The method of claim 2, wherein performing feature extraction on the acquired user portrait data and user behavior data comprises:
and determining a user behavior index value in a first preset time period according to user order data, user click data or user invited data, and taking the index value as a characteristic value of the user behavior data.
4. The method of claim 1, wherein before acquiring the user portrait data and user behavior data of the first preset time period, the method further comprises:
obtaining a plurality of user samples with behavior records from historical data, and performing feature extraction on the user portrait data and user behavior data of all the user samples;
taking the combination of sample data and feature data of each user who converted into a registered user as a positive sample, and the combination of sample data and feature data of each user who did not convert into a registered user as a negative sample;
and training the classification model with the resulting positive samples and negative samples.
5. The method of claim 4, wherein after obtaining the probability that the user converts into a registered user within the second preset time period, the method further comprises:
determining a user list capable of being converted into a registered user from all users according to the probability and a preset threshold;
correspondingly, the output result specifically includes:
outputting a user list which can be converted into a registered user;
the preset threshold is determined according to the proportion of the positive samples to the negative samples.
6. The method of identifying potential registered users according to claim 5, wherein said plurality of positive samples and said plurality of negative samples comprise training set samples and validation set samples, and wherein, after said training of said classification model, the method further comprises:
verifying the classification model with the validation set samples, and adjusting the preset threshold according to a verification result, wherein evaluation indexes for adjusting the threshold comprise recall or precision.
7. The method of identifying potential registered users of claim 5, further comprising:
every third preset time period, obtaining a plurality of user samples with behavior records from the latest historical data as updating samples, and performing feature extraction on user portrait data and user behavior data of all the updating samples;
taking the combination of the update sample data and the feature data which are converted into the registered user as a positive update sample, and taking the combination of the update sample data and the feature data which are not converted into the registered user as a negative update sample;
training and verifying the classification model by using the obtained multiple positive update samples and multiple negative update samples, and adjusting the preset threshold;
wherein the third preset time period is greater than or equal to the second preset time period.
8. An apparatus for identifying potential registered users, comprising:
the data acquisition module is used for acquiring user portrait data and user behavior data in a first preset time period;
the characteristic extraction module is used for extracting characteristics of the acquired user portrait data and the acquired user behavior data;
the processing module is used for inputting the extracted feature data into a preset machine learning classification model, obtaining the probability that the user is converted into a registered user in a second preset time period, and outputting a result;
and the classification model is obtained after training according to the sample data of the determined registration conversion result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of identifying potential registered users according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the steps of the method of identifying potential registered users according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910935469.7A CN110889716A (en) 2019-09-29 2019-09-29 Method and device for identifying potential registered user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910935469.7A CN110889716A (en) 2019-09-29 2019-09-29 Method and device for identifying potential registered user

Publications (1)

Publication Number Publication Date
CN110889716A true CN110889716A (en) 2020-03-17

Family

ID=69746051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910935469.7A Pending CN110889716A (en) 2019-09-29 2019-09-29 Method and device for identifying potential registered user

Country Status (1)

Country Link
CN (1) CN110889716A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704599A (en) * 2021-07-14 2021-11-26 大箴(杭州)科技有限公司 Marketing conversion user prediction method and device and computer equipment
CN114064440A (en) * 2022-01-18 2022-02-18 恒生电子股份有限公司 Training method of credibility analysis model, credibility analysis method and related device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107093084A (en) * 2016-08-01 2017-08-25 北京小度信息科技有限公司 Potential user predicts method for transformation and device
CN110222272A (en) * 2019-04-18 2019-09-10 广东工业大学 A kind of potential customers excavate and recommended method

Similar Documents

Publication Publication Date Title
CN110222272B (en) Potential customer mining and recommending method
JP6134444B2 (en) Method and system for recommending information
Coussement et al. Integrating the voice of customers through call center emails into a decision support system for churn prediction
US20190156426A1 (en) Systems and methods for collecting and processing alternative data sources for risk analysis and insurance
Aakash et al. Assessment of hotel performance and guest satisfaction through eWOM: big data for better insights
CN109711955B (en) Poor evaluation early warning method and system based on current order and blacklist base establishment method
KR20160065429A (en) Hybrid personalized product recommendation method
KR20200048183A (en) Method and apparatus for online product recommendation considering reliability of product
CN113256397B (en) Commodity recommendation method and system based on big data and computer-readable storage medium
US20150095111A1 (en) Method and system for using social media for predictive analytics in available-to-promise systems
CN114547475B (en) Resource recommendation method, device and system
KR102458510B1 (en) Real-time complementary marketing system
CN111429214B (en) Transaction data-based buyer and seller matching method and device
CN111400613A (en) Article recommendation method, device, medium and computer equipment
CN111104590A (en) Information recommendation method, device, medium and electronic equipment
CN114266443A (en) Data evaluation method and device, electronic equipment and storage medium
CN112633690A (en) Service personnel information distribution method, service personnel information distribution device, computer equipment and storage medium
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN110889716A (en) Method and device for identifying potential registered user
CN111429161A (en) Feature extraction method, feature extraction device, storage medium, and electronic apparatus
JP5603678B2 (en) Demand forecasting apparatus and demand forecasting method
CN111177581A (en) Multi-platform-based social e-commerce website commodity recommendation method and device
CN109118243B (en) Product sharing, useful evaluation identification and pushing method and server
Leventhal Predictive Analytics for Marketers: Using Data Mining for Business Advantage
Li Consumer behavior analysis model based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination