CN117455524A - User liveness assessment method, device, electronic equipment and storage medium - Google Patents

User liveness assessment method, device, electronic equipment and storage medium

Publication number
CN117455524A
Authority
CN
China
Prior art keywords
user
feature
retention
score
retention score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311233187.5A
Other languages
Chinese (zh)
Inventor
郑才华
杨弋鋆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd filed Critical Inspur Communication Information System Co Ltd
Priority to CN202311233187.5A priority Critical patent/CN117455524A/en
Publication of CN117455524A publication Critical patent/CN117455524A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing and provides a user liveness assessment method, a device, electronic equipment and a storage medium. The method comprises: acquiring basic features, de-timing features and sequence features of user data; obtaining an initial retention score and a binary classification value of the retention score based on the basic features, the de-timing features and the sequence features; and determining the retention score of a user based on the binary classification value and the initial retention score, and determining active users based on the retention score, where the retention score characterizes how active the user is in using the shopping software. By combining the initial retention score with the binary classification value to determine the retention score, and from it the active users, the method improves the accuracy of user liveness assessment.

Description

User liveness assessment method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a user liveness assessment method, a device, an electronic device, and a storage medium.
Background
As digitalization deepens across industries, capabilities for collecting, storing and processing data have improved significantly. In the process, partners have accumulated massive user data that is broad in coverage, persistent and highly precise. The financial industry widely uses such data to mine potential customers, develop new products and guide business development. In practice, a user's intention to take out a car loan is often evaluated by measuring the user's activity on a car-purchasing app, using expert experience rules, traditional statistical methods or machine learning models.
In the traditional statistical scheme, key factors influencing user activity are extracted by analyzing sequential behavior data from the user's use of the app, and prediction is performed with a regression model, an ARIMA model or similar methods. The accuracy of such traditional statistical methods tends to be low.
Widely used machine learning schemes rely too heavily on manual experience to deconstruct features, so some information is lost and the final performance of the model suffers.
Rules built on expert experience require statistical analysis of single indicators or observation of the joint distribution of several indicators, which limits the number of indicators: with too many, it is difficult to settle on fixed screening criteria; with too few, the accuracy of the results cannot be guaranteed. Rule models are therefore simple, unstable and inaccurate.
None of the current schemes has a prediction target that evaluates a user's activity on the app well.
Disclosure of Invention
The invention provides a user liveness assessment method, a device, electronic equipment and a storage medium to solve the problem that existing methods assess user liveness inaccurately. Basic features, de-timing features and sequence features are determined to obtain an initial retention score and a binary classification value of the retention score; the retention score of the user is then determined from the initial retention score and the binary classification value, and active users are determined from it, which improves the accuracy of user liveness assessment.
The invention provides a user liveness assessment method, which comprises the following steps:
acquiring basic features, de-timing features and sequence features of user data;
based on the basic features, the de-timing features and the sequence features, obtaining an initial retention score and a binary classification value of the retention score;
determining a retention score for a user based on the binary classification value and the initial retention score, and determining active users based on the retention score; the retention score characterizes how active the user is in using the shopping software.
In one embodiment, obtaining a binary classification value of the retention score based on the basic features, the de-timing features and the sequence features comprises:
inputting the basic features, the de-timing features and the sequence features into a first binary classification model to obtain a first prediction probability output by the first binary classification model; the first prediction probability characterizes the probability that the retention score is 0;
inputting the basic features, the de-timing features and the sequence features into a second binary classification model to obtain a second prediction probability output by the second binary classification model; the second prediction probability characterizes the probability that the retention score is N;
setting the binary classification value corresponding to the first prediction probability to 0 or not 0 based on the ranking result of the first prediction probabilities;
and setting the binary classification value corresponding to the second prediction probability to N or not N based on the ranking result of the second prediction probabilities.
In one embodiment, obtaining an initial retention score based on the basic features, the de-timing features and the sequence features comprises:
fusing the basic features and the de-timing features based on a Light Gradient Boosting Machine (LightGBM) model, a Categorical Boosting (CatBoost) model, an Extreme Gradient Boosting (XGBoost) model and a deep forest model to obtain fusion features;
processing the sequence features based on a gated recurrent unit (GRU) model to obtain a sequence feature result;
and fusing the fusion features and the sequence feature result to obtain the initial retention score.
In one embodiment, obtaining the de-timing features and the sequence features of the user data includes:
deriving from the user data to obtain derived data;
acquiring historical retention scores associated with the user data, and setting weights for the historical retention scores;
and acquiring the de-timing features and the sequence features based on the derived data, the historical retention scores and their weights.
In one embodiment, after obtaining the initial retention score based on the base feature, the de-timing feature, and the sequence feature, further comprising:
if the user does not access the shopping software for more than N days, setting the initial retention score of the user to 0;
and if the initial retention score of the user is smaller than a set threshold value and the number of times that the user accesses the shopping software is 0, setting the initial retention score of the user to 0.
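The two zeroing rules of this embodiment can be sketched as follows. This is a minimal illustration only: the function name, the parameter names and the default values of `max_idle_days` and `score_threshold` are assumptions, since the text fixes neither N nor the set threshold, nor the period over which the number of accesses is counted.

```python
def apply_inactivity_rules(initial_score, days_since_last_visit, visit_count,
                           max_idle_days=5, score_threshold=1.0):
    """Return the initial retention score after the two zeroing rules.

    max_idle_days and score_threshold are hypothetical defaults; the source
    only refers to "N days" and "a set threshold".
    """
    # Rule 1: the user has not accessed the software for more than N days.
    if days_since_last_visit > max_idle_days:
        return 0.0
    # Rule 2: the score is below the threshold and the user has no accesses.
    if initial_score < score_threshold and visit_count == 0:
        return 0.0
    # Otherwise the initial retention score is left unchanged.
    return initial_score
```

For example, under these assumed defaults a user with an initial score of 3.2 who last visited 10 days ago is reset to 0, while the same score with a visit 2 days ago is kept.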
In one embodiment, determining the retention score of the user based on the binary classification value and the initial retention score comprises:
setting the initial retention score as the retention score when the initial retention score matches the binary classification value;
and setting the binary classification value as the retention score when the initial retention score does not match the binary classification value.
In one embodiment, the user data includes basic data and access data for a user to access shopping software, and acquiring the basic data includes:
determining an impact factor that affects the user's shopping intent;
and acquiring the basic data based on the influence factors, wherein the basic data comprises user identity data, user characteristic data and user communication data.
The invention also provides a user liveness assessment device, which comprises:
the acquisition module is used for acquiring basic features, de-timing features and sequence features of the user data;
the first determining module is used for obtaining an initial retention score and a binary classification value of the retention score based on the basic features, the de-timing features and the sequence features;
the second determining module is used for determining the retention score of the user based on the binary classification value and the initial retention score, and determining active users based on the retention score; the retention score characterizes how active the user is in using the shopping software.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the user activity assessment method according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a user activity assessment method as described in any of the above.
According to the user liveness assessment method, device, electronic equipment and storage medium provided by the invention, basic features, de-timing features and sequence features of the user data are acquired; an initial retention score and a binary classification value of the retention score are obtained based on these features; the retention score of the user is determined based on the binary classification value and the initial retention score, and active users are determined based on the retention score, where the retention score characterizes how active the user is in using the shopping software. Combining the initial retention score with the binary classification value in this way improves the accuracy of user liveness assessment.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is the first schematic flowchart of the user liveness assessment method provided by the invention;
FIG. 2 is the second schematic flowchart of the user liveness assessment method provided by the invention;
FIG. 3 is a schematic diagram of the derived data provided by the invention;
FIG. 4 is a schematic structural diagram of the user liveness assessment device provided by the invention;
FIG. 5 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The user liveness assessment method, apparatus, electronic device and storage medium of the present invention are described below with reference to fig. 1 to 5.
Specifically, the present invention provides a user activity assessment method, and referring to fig. 1, fig. 1 is one of flow diagrams of the user activity assessment method provided by the present invention.
The user activity evaluation method provided by the embodiment of the invention comprises the following steps:
S100, acquiring basic features, de-timing features and sequence features of user data;
The user liveness assessment method provided by this embodiment is used to assess how active a user is in using a given piece of shopping software. User data is acquired from the shopping software and from third-party platforms.
The basic features characterize the basic situation of the user and include the user's basic identity features, portrait features and basic communication features.
The de-timing features are features from which time information has been removed, such as whether the shopping software was used, the number of uses and the duration of use.
The sequence features are features that retain time information, for example a whether-used sequence, a daily usage count sequence, a daily usage duration sequence and a historical retention score sequence.
Acquiring the user data includes acquiring the user's access data for the shopping software (e.g., a used-car website) and acquiring the user's basic data through the relevant partners. The user data is preprocessed and features are extracted from it to obtain the basic features, the de-timing features and the sequence features.
S200, obtaining an initial retention score and a binary classification value of the retention score based on the basic features, the de-timing features and the sequence features;
Several models are trained in advance to predict the retention score. The basic features, the de-timing features and the sequence features are input into the trained models to obtain several predictions of the retention score, and these predictions are fused to obtain the initial retention score.
Using sample data of the basic features, the de-timing features and the sequence features, several binary classification models that classify the retention score, such as a first binary classification model and a second binary classification model, are also trained in advance. The basic features, the de-timing features and the sequence features are input into these pre-trained models to obtain the binary classification value of the retention score, for example "the retention score is 0" or "the retention score is not 0".
S400, determining the retention score of the user based on the binary classification value and the initial retention score, and determining active users based on the retention score; the retention score characterizes how active the user is in using the shopping software.
The retention score can be used to evaluate the user's intention to purchase. For example, a retention score of 5 means that the user is expected to use the shopping software on all 5 of the next 5 days, which indicates a strong intention to purchase.
The initial retention score is compared with the binary classification value to check whether they match, and the retention score of the user is determined from the matching result. Users whose retention score is greater than or equal to a second set threshold are determined to be active users; for example, users with a retention score of 4 or more are set as active users.
In the user liveness assessment method provided by this embodiment, the initial retention score and the binary classification value of the retention score are obtained from the basic features, the de-timing features and the sequence features of the user data, and the two are combined to determine the retention score of the user and, from it, the active users. This improves the accuracy of user liveness assessment.
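The matching step can be sketched as below, under stated assumptions. The text does not define "match" precisely, so this sketch assumes that a user flagged "0" or "N" by a binary classification model receives that value (which equals the initial score when the two already agree) and that an unflagged user keeps the initial retention score. The function names, the precedence of the zero flag over the N flag and the default threshold of 4 (taken from the example above) are illustrative choices, not the patented procedure.

```python
def final_retention_score(initial_score, flagged_zero, flagged_n, n=5):
    """Combine the initial retention score with the binary classification flags.

    Assumption: a flagged value overrides the initial score; when the initial
    score already equals the flagged value, the result is the same either way.
    """
    if flagged_zero:          # assumed precedence when both flags are set
        return 0
    if flagged_n:
        return n
    return initial_score      # "not 0" and "not N": keep the initial score


def is_active(retention_score, active_threshold=4):
    """Active users have a retention score at or above the second set threshold."""
    return retention_score >= active_threshold
```

With these assumptions, a user with an initial score of 3.4 who is flagged as N = 5 ends with a retention score of 5 and is counted as active, while the same user without any flag keeps 3.4 and is not.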
Based on the above embodiment, obtaining the binary classification value of the retention score based on the basic features, the de-timing features and the sequence features includes:
S210, inputting the basic features, the de-timing features and the sequence features into a first binary classification model to obtain a first prediction probability output by the first binary classification model; the first prediction probability characterizes the probability that the retention score is 0;
S220, inputting the basic features, the de-timing features and the sequence features into a second binary classification model to obtain a second prediction probability output by the second binary classification model; the second prediction probability characterizes the probability that the retention score is N;
S230, setting the binary classification value corresponding to the first prediction probability to 0 or not 0 based on the ranking result of the first prediction probabilities;
S240, setting the binary classification value corresponding to the second prediction probability to N or not N based on the ranking result of the second prediction probabilities.
The binary classification models are trained in advance to output the binary classification values. They include a model that classifies whether the retention score is 0 (the first binary classification model) and a model that classifies whether the retention score is N (the second binary classification model). Each model includes a CNN layer and a DNN layer, where the CNN layer comprises two convolution operations and two max-pooling operations. This embodiment is described using N = 5, that is, one model for whether the retention score is 0 and one for whether the retention score is 5.
The basic features, the de-timing features and the sequence features are input into the model for whether the retention score is 0 to obtain the first prediction probability that the retention score is 0, and into the model for whether the retention score is 5 to obtain the second prediction probability that the retention score is 5. Each sequence feature is converted into an N × N matrix, for example a 5 × 5 matrix. The convolutional layers of the CNN extract local features and periodic features from the sequence features, and the max-pooling layers of the CNN extract maxima to obtain the final feature map. The DNN combines the feature map with the de-timing features and the sequence features and extracts features from the combined feature map to obtain a combination result. An activation function, which may be sigmoid, is applied to the combination result to obtain the first prediction probability or the second prediction probability.
In the model for whether the retention score is 0, the binary classification values are "0" and "not 0". The first prediction probabilities are ranked (e.g., from largest to smallest) to obtain a first ranking result. Based on the first ranking result, the top 3% of the first prediction probabilities are taken as first target prediction probabilities, and their binary classification value is set to 0. For example, if the top 3% of the first prediction probabilities in the first ranking result are 99%, 98%, 97% and 96.5%, the binary classification values for these four probabilities are all set to 0. The binary classification value of every other first prediction probability is set to "not 0".
In the model for whether the retention score is 5, the binary classification values are "5" and "not 5". The second prediction probabilities are ranked (e.g., from largest to smallest) to obtain a second ranking result. Based on the second ranking result, the top 2% of the second prediction probabilities are taken as second target prediction probabilities, and their binary classification value is set to 5. For example, if the top 2% of the second prediction probabilities in the second ranking result are 99.1%, 99%, 98.7% and 98.0%, the binary classification values for these four probabilities are all set to 5. The binary classification value of every other second prediction probability is set to "not 5".
In this embodiment, a binary classification model for whether the retention score is 0 and a binary classification model for whether the retention score is N are provided, and only the highest prediction probabilities are assigned the binary classification value 0 or N, which improves the accuracy of determining the binary classification value of the retention score.
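The rank-and-flag step of S230 and S240 can be illustrated with a short sketch. The function name `flag_top_percent`, the use of an integer percentage and rounding the cut-off down are assumptions made for the example; the text specifies only that the top 3% (or 2%) of ranked probabilities receive the value 0 (or N).

```python
def flag_top_percent(probabilities, percent):
    """Flag the users whose prediction probability ranks in the top `percent`.

    Returns one boolean per input probability, in the original order.
    The cut-off count is rounded down (an assumption, not stated in the text).
    """
    count = len(probabilities)
    keep = count * percent // 100
    # Indices of the probabilities ranked from largest to smallest.
    ranked = sorted(range(count), key=lambda i: probabilities[i], reverse=True)
    flags = [False] * count
    for index in ranked[:keep]:
        flags[index] = True
    return flags


# 100 users with hypothetical probabilities 0.00, 0.01, ..., 0.99.
zero_probabilities = [i / 100 for i in range(100)]
zero_flags = flag_top_percent(zero_probabilities, 3)   # first model: top 3%
n_flags = flag_top_percent(zero_probabilities, 2)      # second model: top 2%
```

Flagged users receive the binary classification value 0 (first model) or N (second model); all other users receive "not 0" or "not N".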
Based on the above embodiment, obtaining the initial retention score based on the basic features, the de-timing features and the sequence features includes:
S250, fusing the basic features and the de-timing features based on a Light Gradient Boosting Machine (LightGBM) model, a Categorical Boosting (CatBoost) model, an Extreme Gradient Boosting (XGBoost) model and a deep forest model to obtain fusion features;
S260, processing the sequence features based on a gated recurrent unit (GRU) model to obtain a sequence feature result;
S270, fusing the fusion features and the sequence feature result to obtain the initial retention score.
The Light Gradient Boosting Machine (LightGBM) model is a machine learning model trained with the LightGBM algorithm, which predicts the target variable by iteratively training multiple decision trees. The LightGBM model can be used for classification, regression, ranking and similar tasks.
The Categorical Boosting (CatBoost) model is a machine learning model trained on data with the CatBoost algorithm. The CatBoost model can be used for classification, regression, ranking and similar tasks.
The Extreme Gradient Boosting (XGBoost) model is a machine learning model trained on data with the XGBoost algorithm.
Deep forest is a decision-tree ensemble method that arranges several forests in a cascade of layers, with each layer passing its outputs on as additional inputs to the next. Like the gradient boosting models above, the deep forest model can be applied to classification and regression on tabular features such as the basic features and the de-timing features.
The gated recurrent unit (GRU) model is a variant of the recurrent neural network (RNN) for processing sequence data. By introducing a gating mechanism, the GRU mitigates the vanishing-gradient and exploding-gradient problems of the traditional RNN and improves the modeling of long-term dependencies. The GRU has fewer parameters than a long short-term memory (LSTM) network and generally performs better than a traditional RNN on long sequences.
As shown in FIG. 2, a first preset model is trained in advance on sample data of the basic features and the de-timing features to obtain the LightGBM model. A second preset model is trained in advance on sample data of the basic features and the de-timing features to obtain the CatBoost model. A third preset model is trained in advance on sample data of the basic features and the de-timing features to obtain the XGBoost model. A fourth preset model is trained in advance on sample data of the basic features and the de-timing features to obtain the deep forest model. A fifth preset model is trained in advance on sample data of the sequence features to obtain the GRU model.
The basic features and the de-timing features are input into the LightGBM model to obtain a first fusion feature, into the CatBoost model to obtain a second fusion feature, into the XGBoost model to obtain a third fusion feature, and into the deep forest model to obtain a fourth fusion feature. The sequence features are input into the GRU model to obtain the sequence feature result.
The first fusion feature, the second fusion feature, the third fusion feature, the fourth fusion feature and the sequence feature result are blended by averaging to obtain the initial retention score. Taking the blend as an equal-weight average, the initial retention score is calculated as:
$$\hat{y}_{\text{init}} = \frac{1}{5}\left(\hat{y}_{1} + \hat{y}_{2} + \hat{y}_{3} + \hat{y}_{4} + \hat{y}_{\text{seq}}\right)$$
where $\hat{y}_{1}$ is the first fusion feature, $\hat{y}_{2}$ is the second fusion feature, $\hat{y}_{3}$ is the third fusion feature, $\hat{y}_{4}$ is the fourth fusion feature, $\hat{y}_{\text{seq}}$ is the sequence feature result, and $\hat{y}_{\text{init}}$ is the initial retention score.
In this embodiment, the initial retention score is obtained based on the LightGBM model, the CatBoost model, the XGBoost model, the deep forest model and the GRU model, which improves the accuracy of determining the initial retention score.
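The blending of S270 can be illustrated with a minimal sketch that assumes an equal-weight average of the five model outputs. The function name, the dictionary keys and the example values are hypothetical, and the trained models themselves are represented only by their predicted scores.

```python
def blend_average(model_scores):
    """Equal-weight blend of per-model retention predictions for one user."""
    if not model_scores:
        raise ValueError("at least one model score is required")
    return sum(model_scores.values()) / len(model_scores)


# Hypothetical outputs of the five pre-trained models for one user.
scores = {
    "lightgbm": 3.0,      # first fusion feature
    "catboost": 4.0,      # second fusion feature
    "xgboost": 3.5,       # third fusion feature
    "deep_forest": 3.5,   # fourth fusion feature
    "gru": 4.0,           # sequence feature result
}
initial_retention_score = blend_average(scores)
```

An equal-weight average is the simplest form of blending; a weighted average tuned on validation data would be a drop-in alternative if some models prove more reliable than others.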
Based on the above embodiment, acquiring the de-timing feature and the sequence feature of the user data includes:
S110, deriving from the user data to obtain derived data;
S120, acquiring historical retention scores associated with the user data, and setting weights for the historical retention scores; and obtaining the de-timing features and the sequence features based on the derived data, the historical retention scores and their weights.
A user's habit of using the car-purchasing app may be continuous or sporadic; in either case it is closely related to the user's historical usage. Key features are therefore extracted from the access data in the user data, that is, the data on the user's access to the shopping software, to obtain derived data:
(1) Extract derived data such as whether the user uses the app on the target day and the number of uses and duration of use on that day. The target day is the date for which the retention score is to be predicted.
(2) Extract derived data such as the number of uses, number of days of use and duration of use in the five days before the target day.
(3) Extract derived data such as the historical total number of uses and number of days of use, and the total number of uses and number of days of use over the last three days, five days and one week.
(4) Extract derived data such as the number of days between the last use and the current time point.
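The derived data in items (1) to (4) can be computed from a per-day usage log roughly as follows. This is a sketch under assumptions: the input format (a mapping from date to number of uses), the feature names and the convention that each window ends on the target day are all illustrative, and usage duration is omitted for brevity.

```python
from datetime import date, timedelta


def derive_usage_features(daily_uses, target_day):
    """Build derived usage features from a {date: number of uses} mapping."""
    def window(days):
        # Assumed convention: the window ends on the target day, inclusive.
        return [target_day - timedelta(days=k) for k in range(days)]

    def total_uses(days):
        return sum(daily_uses.get(d, 0) for d in window(days))

    def active_days(days):
        return sum(1 for d in window(days) if daily_uses.get(d, 0) > 0)

    past_active = [d for d, n in daily_uses.items() if n > 0 and d <= target_day]
    return {
        "used_on_target_day": daily_uses.get(target_day, 0) > 0,
        "uses_on_target_day": daily_uses.get(target_day, 0),
        "total_uses_3d": total_uses(3),
        "total_uses_5d": total_uses(5),
        "total_uses_7d": total_uses(7),
        "active_days_7d": active_days(7),
        "historical_total_uses": sum(n for d, n in daily_uses.items() if d <= target_day),
        "historical_active_days": len(past_active),
        "days_since_last_use": (target_day - max(past_active)).days if past_active else None,
    }


usage = {date(2023, 9, 1): 2, date(2023, 9, 3): 1, date(2023, 9, 7): 4}
features = derive_usage_features(usage, date(2023, 9, 7))
earlier = derive_usage_features(usage, date(2023, 9, 5))
```

Here `features` describes a user who opened the app four times on the target day, while `earlier` describes the same user two days before, when the last use was two days in the past.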
As shown in Fig. 3, a historical retention score associated with the user data is acquired, and weights are set for the historical retention score. The median of the user's retention scores over the previous month and the average of the retention scores at the time points corresponding to the user's previous six periods are extracted, and weights are set for the median and the average to obtain a weighted median and a weighted average. Every 5 days is set as one period, so the time points corresponding to the previous six periods are target_date-5 (5 days before the target day), target_date-10 (10 days before the target day), target_date-15 (15 days before the target day), target_date-20 (20 days before the target day), target_date-25 (25 days before the target day) and target_date-30 (30 days before the target day). The weights of the average and the median over the 6 periods are set in sequence to 0.3, 0.2, 0.15, 0.1 and 0.05 according to the distance from the target day, and the weight-setting formula is as follows:
where weights denotes the weight and diff denotes the distance from the target day, with diff taking values in [1, 30].
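The period time points and the weighting step can be sketched as follows. This is only an illustration: the function names are hypothetical, the weights are supplied by the caller rather than taken from the weight-setting formula, and normalizing by the sum of the weights is an assumption of the sketch.

```python
from datetime import date, timedelta


def period_time_points(target_day, period_days=5, n_periods=6):
    """Return target_date-5, target_date-10, ..., target_date-30 by default."""
    return [target_day - timedelta(days=period_days * k) for k in range(1, n_periods + 1)]


def weighted_mean(values, weights):
    """Weighted mean of the retention scores at the period time points.

    Dividing by the sum of the weights is an assumption of this sketch.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total
```

Passing the weights explicitly keeps the sketch independent of how the weights are derived from diff.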
The de-timing features are obtained from the derived data, the historical retention scores and the weights of the same time period. To compensate for the information loss caused by de-timing, sequence features are constructed from the derived data, the historical retention scores and the weights of different time periods. The sequence features comprise a whether-used sequence, a daily use-count sequence, a daily use-duration sequence and a historical retention score sequence, and a GRU deep learning algorithm is used to model and predict the retention score directly.
The whether-used sequence, the daily use-count sequence and the daily use-duration sequence are obtained based on the derived data of the 25 days before the target day, the historical retention scores and the weights.
Consider the retention score of a day within the 5 days before target_date. For target_date-3 (3 days before the target day), for example, the five-day retention score would leak two days of information, from target_date+1 (1 day after the target day) to target_date+2 (2 days after the target day), which creates the unreasonable situation of using the future to predict the past. The historical retention score sequence is therefore obtained based on the historical retention scores and the weights from 30 days before the target day to 5 days before the target day.
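The two look-back windows can be sketched as below. The function names and the dictionary inputs are hypothetical, and filling missing days with 0 is an assumption of this sketch rather than something stated in the text.

```python
from datetime import date, timedelta


def usage_sequence(daily_values, target_day, lookback=25):
    """Per-day values for the `lookback` days before the target day, oldest first."""
    return [
        daily_values.get(target_day - timedelta(days=k), 0)
        for k in range(lookback, 0, -1)
    ]


def retention_history_sequence(retention_by_day, target_day, first_offset=30, last_offset=5):
    """Historical retention scores from target_date-30 through target_date-5.

    Days closer than `last_offset` to the target day are excluded because
    their five-day retention scores would depend on days after the target day.
    """
    return [
        retention_by_day.get(target_day - timedelta(days=k), 0.0)
        for k in range(first_offset, last_offset - 1, -1)
    ]
```

With the default offsets the usage sequences have 25 entries and the retention history sequence has 26 entries.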
Further, feature identification is performed for holidays and weekends, identifying the day of the week, the day of the month, and whether the day is a holiday.
According to the embodiment of the invention, the sequence features and the de-timing features are determined by acquiring the derived data and the historical retention scores with their weights, which improves the accuracy of determining the sequence features and the de-timing features.
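A minimal sketch of such calendar features is given below. The feature names are hypothetical, and the set of holiday dates must be supplied by the caller, since the text does not say where holiday information comes from.

```python
from datetime import date


def calendar_features(day, holidays=frozenset()):
    """Calendar features for one day; `holidays` is a caller-supplied set of dates."""
    weekday = day.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {
        "day_of_week": weekday,
        "day_of_month": day.day,
        "is_weekend": weekday >= 6,
        "is_holiday": day in holidays,
    }
```

For example, calendar_features(date(2023, 9, 23)) marks the day as a weekend because 23 September 2023 is a Saturday.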
Based on the above embodiment, after obtaining the initial retention score based on the basic feature, the de-timing feature, and the sequence feature, the method further includes:
S280, if the user has not accessed the shopping software for more than N days, setting the initial retention score of the user to 0;
S290, if the initial retention score of the user is smaller than the set threshold and the number of times the user has accessed the shopping software is 0, setting the initial retention score of the user to 0.
Some users do not use the APP for a long time and then use it at random on the target day; such behavior is highly uncertain and cannot be predicted from limited data. Moreover, a user who has not used the APP for a long time is extremely unlikely to suddenly start using it continuously. Therefore, the initial retention score of users who have not used the vehicle-purchasing APP for more than N days, for example 25 days, is directly corrected to 0.
Since a retention score of 0 is the label with the largest proportion in the data set, and users with no historical use are unlikely to start using the APP, the initial retention score of a user who has never used the vehicle-purchasing APP and whose initial retention score is smaller than the set threshold, for example 0.5, is likewise corrected to 0.
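The two correction rules can be sketched as a single post-processing function. The function and parameter names are hypothetical; the default values 25 and 0.5 are the example values given in the text; representing "never used" as days_since_last_access=None is an assumption of this sketch.

```python
def post_process_initial_score(
    initial_score,
    days_since_last_access,
    total_access_count,
    max_idle_days=25,
    score_threshold=0.5,
):
    """Apply the S280 and S290 corrections to one user's initial retention score.

    days_since_last_access is None for a user with no recorded access
    (an assumption of this sketch).
    """
    # S280: no access for more than N days.
    if days_since_last_access is not None and days_since_last_access > max_idle_days:
        return 0.0
    # S290: low initial score and no access at all.
    if initial_score < score_threshold and total_access_count == 0:
        return 0.0
    return initial_score
```

A user who was last active 30 days ago is therefore set to 0 regardless of the predicted score, while a never-active user is set to 0 only when the predicted score is below the threshold.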
According to the embodiment of the invention, the initial retention score of part of users is directly set to 0 by post-processing the initial retention score, so that the accuracy of determining the initial retention score is improved.
Based on the above embodiment, determining the retention score of the user based on the binary classification value and the initial retention score includes:
S410, when the initial retention score matches the binary classification value, setting the initial retention score as the retention score;
S420, when the initial retention score does not match the binary classification value, setting the binary classification value as the retention score.
The initial retention score is compared with its corresponding binary classification value. When the initial retention score falls within the range indicated by the binary classification value (match), the initial retention score is set as the retention score; when it does not (no match), the binary classification value is set as the retention score.
For example, if the initial retention score is 1.5 and the binary classification value is "not 0", the two match, and 1.5 is set as the retention score. If the initial retention score is 1.5 and the binary classification value is 0, the two do not match, and 0 is set as the retention score.
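The matching rule for the "0 or not 0" case can be sketched as follows. The function name and the boolean encoding of the classification value are assumptions; the case where the classification is "not 0" but the initial score is already 0 is not described in the text, so this sketch simply keeps the initial score there.

```python
def resolve_retention_score(initial_score, classified_as_zero):
    """Combine the initial retention score with the 0 / not-0 classification.

    classified_as_zero: True when the binary classification value is 0,
    False when it is "not 0" (hypothetical encoding).
    """
    if classified_as_zero and initial_score != 0:
        # No match: the classifier says 0, so the classification value wins.
        return 0.0
    # Match (or a case the text leaves open): keep the initial score.
    return initial_score
```

This reproduces the two examples in the text: an initial score of 1.5 stays 1.5 under "not 0" and becomes 0 under a classification of 0.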
According to the embodiment of the invention, the retention score is determined based on the matching result of the initial retention score and the binary classification value, which improves the accuracy of determining the retention score.
Based on the above embodiment, the user data includes basic data and access data for the user to access the shopping software, and the obtaining the basic data includes:
S130, determining an influence factor that influences the shopping intention of the user;
S140, acquiring basic data based on the influence factor, wherein the basic data comprises user identity data, user characteristic data and user communication data.
The shopping software comprises vehicle-purchasing software, house-purchasing software and the like; the embodiment of the invention is described using a vehicle-purchasing APP as an example. From the existing APP library, 10 types of software are screened, as shown in Table 1, and defined as vehicle-purchasing software. These applications are treated as a single bundle: if a user uses any one of them on a given day, the user is considered to have used the vehicle-purchasing software on that day. The access data is the key information in the user data. The access data of the user accessing the shopping software is obtained, for example feature information such as whether the user uses the vehicle-purchasing APP and how frequently the user uses it.
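The bundling rule can be sketched as a simple set check. The function name is hypothetical, and app identifiers are represented as strings purely for illustration.

```python
def used_bundle_on_day(app_ids_used_that_day, bundle_ids):
    """Return True if any app used on that day belongs to the bundle.

    app_ids_used_that_day: iterable of app identifiers used by one user on one day.
    bundle_ids: set of identifiers of the apps defined as vehicle-purchasing software.
    """
    return not bundle_ids.isdisjoint(app_ids_used_that_day)
```

Under this rule, a day on which the user opened only one of the bundled apps still counts as a day of use.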
Influence factors that influence the user's shopping intention are determined, such as user identity, user characteristics and basic user communication. Based on the influence factors and other partner platforms, basic data is obtained, including user identity data, user characteristic data and user communication data. The specific contents of the basic data are shown in Table 2.
Table 1 Vehicle-purchasing APP detail list
Large category | Subclass ID | App name
Second-hand car | 105021100002000000 | Easy-to-use second-hand car
Automobile | 105020605027000000 | Big car quotation
Automobile | 105020604987031000 | Automobile home
Automobile | 105020605075000000 | Vehicle-learning emperor
Automobile | 105020605088000000 | Mobile type loving card automobile
Automobile | 105020602276043000 | Aika automobile
Automobile | 105020605037078000 | Pacific automobile net
Table 2 user data list
According to the embodiment of the invention, the user data is determined by acquiring the basic data and the access data of the user, so that the content of the user data is enriched, and the activity of the user can be evaluated more comprehensively and accurately.
Further, the LightGBM model, the CatBoost model, the XGBoost model and the deep model make predictions on a held-out test set, and each model is scored. The score is calculated as follows:
where Score is the model score; F_i is the output result of the model, which comprises the first fusion feature, the second fusion feature, the third fusion feature, the fourth fusion feature and the sequence feature result; A_i is the true retention score; and N is the number of users.
The score of each model is obtained through this calculation: the LightGBM model scores 0.8019, the CatBoost model 0.8066, the XGBoost model 0.8005, and the deep model 0.8059. The average score of the four models is 0.8037.
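The reported average can be checked with a few lines of arithmetic. The dictionary below only restates the four scores given in the text; the variable names are illustrative.

```python
# Scores of the four models as reported in the text.
model_scores = {
    "LightGBM": 0.8019,
    "CatBoost": 0.8066,
    "XGBoost": 0.8005,
    "deep": 0.8059,
}

# Arithmetic mean of the four scores, rounded to four decimal places.
average_score = sum(model_scores.values()) / len(model_scores)
print(round(average_score, 4))
```

The unrounded mean is 0.803725, which agrees with the stated average of 0.8037.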
After the GRU model is added, the average score of the LightGBM model, the CatBoost model, the XGBoost model, the deep model and the GRU model is calculated to be 0.81. After the initial retention score is post-processed based on the binary classification value, the overall score of the model reaches 0.8365.
Adding the GRU model therefore raises the score of the overall model, and post-processing the initial retention score based on the binary classification value to obtain the retention score raises it markedly further.
According to the embodiment of the invention, each model is scored to measure how well it evaluates user liveness, so that the performance of the models is assessed accurately.
Fig. 4 is a schematic structural diagram of a user activity assessment device provided by the present invention, and referring to fig. 4, an embodiment of the present invention provides a user activity assessment device, including:
an acquisition module 401, configured to acquire a basic feature, a de-timing feature, and a sequence feature of user data;
a first determining module 402, configured to obtain an initial retention score and a classification value of the retention score based on the basic feature, the de-timing feature, and the sequence feature;
a second determining module 403, configured to determine a retention score of the user based on the two classification values and the initial retention score, and determine an active user based on the retention score; the retention score characterizes how active the user is using the shopping software.
The user liveness assessment device provided by the embodiment of the invention acquires the basic feature, the de-timing feature and the sequence feature of the user data; obtains an initial retention score and a classification value of the retention score based on the basic feature, the de-timing feature and the sequence feature; determines the retention score of the user based on the classification value and the initial retention score; and determines an active user based on the retention score, where the retention score characterizes the user's liveness in using the shopping software. According to the embodiment of the invention, the basic feature, the de-timing feature and the sequence feature are determined so as to obtain the initial retention score and the classification value of the retention score; the retention score of the user is then determined based on the initial retention score and the classification value, and the active user is determined accordingly, which improves the accuracy of evaluating user liveness.
In one embodiment, the first determining module 402 is configured to obtain the classification value of the retention score based on the basic feature, the de-timing feature and the sequence feature by: inputting the basic feature, the de-timing feature and the sequence feature into a first binary classification model to obtain a first prediction probability output by the first binary classification model, the first prediction probability characterizing the probability that the retention score is 0; inputting the basic feature, the de-timing feature and the sequence feature into a second binary classification model to obtain a second prediction probability output by the second binary classification model, the second prediction probability characterizing the probability that the retention score is N; setting the classification value corresponding to the first prediction probability to 0 or not 0 based on the ranking result of the first prediction probability; and setting the classification value corresponding to the second prediction probability to N or not N based on the ranking result of the second prediction probability.
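One way to turn ranked prediction probabilities into a classification value is sketched below. The cut-off rule (labeling the top-ranked fraction of users) and the parameter top_fraction are assumptions made for illustration; the text only states that the value is set based on the ranking result.

```python
def label_by_rank(probabilities, top_fraction):
    """Mark the users whose predicted probability ranks in the top fraction.

    Returns a list of booleans aligned with `probabilities`; True means the
    user receives the positive label (for the first model, "retention score is 0").
    """
    n_top = int(len(probabilities) * top_fraction)
    # User indices sorted from highest to lowest probability.
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    top = set(order[:n_top])
    return [i in top for i in range(len(probabilities))]
```

The same helper could be applied to the second model's probabilities to assign "N" or "not N", again under the assumed cut-off rule.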
In one embodiment, the first determining module 402 is configured to obtain the initial retention score based on the basic feature, the de-timing feature and the sequence feature by: fusing the basic feature and the de-timing feature based on a light gradient boosting machine (LightGBM) model, a categorical boosting (CatBoost) model, an extreme gradient boosting (XGBoost) model and a deep forest (deep) model to obtain fusion features; processing the sequence feature based on a gated recurrent unit (GRU) model to obtain a sequence feature result; and fusing the fusion features and the sequence feature result to obtain the initial retention score.
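The text does not specify how the model outputs are fused. Purely as an illustrative assumption, the sketch below averages the four tree-based model outputs and then blends the result with the GRU output using a hypothetical weight gru_weight; it is not presented as the method of the embodiment.

```python
def fuse_predictions(tree_model_scores, gru_score, gru_weight=0.5):
    """Blend per-user predictions into one initial retention score.

    tree_model_scores: predictions of the four tree-based models for one user.
    gru_score: prediction of the sequence (GRU) model for the same user.
    gru_weight: hypothetical blending weight in [0, 1].
    """
    if not tree_model_scores:
        raise ValueError("at least one tree-model prediction is required")
    if not 0.0 <= gru_weight <= 1.0:
        raise ValueError("gru_weight must lie in [0, 1]")
    # Simple average of the tree-based predictions (assumed fusion rule).
    fused_tree = sum(tree_model_scores) / len(tree_model_scores)
    return (1.0 - gru_weight) * fused_tree + gru_weight * gru_score
```

Other fusion rules, such as learned stacking weights, would fit the same interface.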
In one embodiment, the acquisition module 401 is configured to acquire the de-timing feature and the sequence feature of the user data by: deriving the user data to obtain derived data; acquiring historical retention scores associated with the user data, and setting weights for the historical retention scores; and acquiring the de-timing feature and the sequence feature based on the derived data, the historical retention scores and their weights.
In one embodiment, the first determination module 402 is further configured to: if the user does not access the shopping software for more than N days, setting the initial retention score of the user to 0; if the initial retention score of the user is smaller than the set threshold value and the number of times the user accesses the shopping software is 0, the initial retention score of the user is set to 0.
In one embodiment, the second determining module 403 is configured to: when the initial retention score matches the binary classification value, set the initial retention score as the retention score; and when the initial retention score does not match the binary classification value, set the binary classification value as the retention score.
In one embodiment, the user data includes basic data and access data for the user to access the shopping software, and the obtaining module 401 is configured to: acquiring basic data, including: determining an influence factor that influences the shopping intent of the user; and acquiring basic data based on the influence factors, wherein the basic data comprises user identity data, user characteristic data and user communication data.
Fig. 5 illustrates a physical schematic diagram of an electronic device, as shown in fig. 5, which may include: processor 510, communication interface 520, memory 530, and communication bus 540, wherein processor 510, communication interface 520, and memory 530 communicate with each other via communication bus 540. Processor 510 may invoke logic instructions in memory 530 to perform a user liveness assessment method comprising:
acquiring basic features, de-timing features and sequence features of user data; based on the basic features, the de-timing features and the sequence features, obtaining an initial retention score and a classification value of the retention score; determining the retention score of the user based on the classification value and the initial retention score, and determining active users based on the retention score; the retention score characterizes how active the user is in using the shopping software.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the user liveness assessment method provided above, the method comprising:
acquiring basic features, de-timing features and sequence features of user data; based on the basic features, the de-timing features and the sequence features, obtaining an initial retention score and a classification value of the retention score; determining the retention score of the user based on the classification value and the initial retention score, and determining active users based on the retention score; the retention score characterizes how active the user is in using the shopping software.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A user liveness assessment method, comprising:
acquiring basic features, de-timing features and sequence features of user data;
based on the basic feature, the de-timing feature and the sequence feature, obtaining an initial retention score and a classification value of the retention score;
determining a retention score for a user based on the classification value and the initial retention score, and determining an active user based on the retention score; the retention score characterizes the user's liveness in using the shopping software.
2. The user liveness assessment method of claim 1, wherein obtaining a classification value of the retention score based on the basic feature, the de-timing feature and the sequence feature comprises:
inputting the basic feature, the de-timing feature and the sequence feature into a first binary classification model to obtain a first prediction probability output by the first binary classification model; the first prediction probability characterizes the probability that the retention score is 0;
inputting the basic feature, the de-timing feature and the sequence feature into a second binary classification model to obtain a second prediction probability output by the second binary classification model; the second prediction probability characterizes the probability that the retention score is N;
setting a classification value corresponding to the first prediction probability to 0 or not 0 based on the ranking result of the first prediction probability;
and setting a classification value corresponding to the second prediction probability to N or not N based on the ranking result of the second prediction probability.
3. The user liveness assessment method of claim 1, wherein obtaining an initial retention score based on the basic feature, the de-timing feature and the sequence feature comprises:
fusing the basic feature and the de-timing feature based on a light gradient boosting machine (LightGBM) model, a categorical boosting (CatBoost) model, an extreme gradient boosting (XGBoost) model and a deep forest (deep) model to obtain fusion features;
processing the sequence feature based on a gated recurrent unit (GRU) model to obtain a sequence feature result;
and fusing the fusion features and the sequence feature result to obtain the initial retention score.
4. The user activity assessment method according to claim 1, wherein obtaining the de-timed feature and the sequence feature of the user data comprises:
deriving the user data to obtain derived data;
acquiring historical retention scores associated with the user data, and setting weights for the historical retention scores;
and acquiring the de-timing feature and the sequence feature based on the derived data, the historical retention scores and their weights.
5. The user activity assessment method according to claim 1, wherein after obtaining an initial retention score based on the base feature, the de-timing feature, and the sequence feature, further comprising:
if the user does not access the shopping software for more than N days, setting the initial retention score of the user to 0;
and if the initial retention score of the user is smaller than a set threshold value and the number of times that the user accesses the shopping software is 0, setting the initial retention score of the user to 0.
6. The user liveness assessment method of claim 1, wherein the determining a retention score for a user based on the classification value and the initial retention score comprises:
setting the initial retention score as the retention score when the initial retention score matches the classification value;
and setting the classification value as the retention score when the initial retention score does not match the classification value.
7. The user activity assessment method according to claim 1, wherein the user data includes basic data and access data of a user to shopping software, and acquiring the basic data includes:
determining an impact factor that affects the user's shopping intent;
and acquiring the basic data based on the influence factors, wherein the basic data comprises user identity data, user characteristic data and user communication data.
8. A user liveness assessment apparatus, comprising:
the acquisition module is used for acquiring basic features, de-timing features and sequence features of the user data;
the first determining module is used for acquiring an initial retention score and a classification value of the retention score based on the basic feature, the de-timing feature and the sequence feature;
the second determining module is used for determining the retention score of the user based on the classification value and the initial retention score and determining the active user based on the retention score; the retention score characterizes the user's liveness in using the shopping software.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the user activity assessment method according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the user activity assessment method according to any one of claims 1 to 7.
CN202311233187.5A 2023-09-22 2023-09-22 User liveness assessment method, device, electronic equipment and storage medium Pending CN117455524A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311233187.5A CN117455524A (en) 2023-09-22 2023-09-22 User liveness assessment method, device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117455524A true CN117455524A (en) 2024-01-26

Family

ID=89580646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311233187.5A Pending CN117455524A (en) 2023-09-22 2023-09-22 User liveness assessment method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117455524A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination