CN113706182A - User classification method and device - Google Patents

User classification method and device

Info

Publication number
CN113706182A
Authority
CN
China
Prior art keywords
user
value
data set
user data
behavior
Legal status
Pending
Application number
CN202010430462.2A
Other languages
Chinese (zh)
Inventor
李成林
廖耀华
雷章明
Current Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010430462.2A
Publication of CN113706182A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G06Q30/0245 Surveys
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements

Abstract

The invention discloses a user classification method and device, and relates to the field of computer technology. One embodiment of the method comprises: determining a value classification label of each user in a first user data set according to the behavior data of the users in the first user data set; training a user value classifier using the user characteristics and the value classification labels of the first user data set as training data; and inputting the user characteristics of a second user data set into the trained user value classifier to calculate the goodness of fit of those user characteristics to each value category, converting the goodness of fit into the probability of belonging to each value category, and determining the value classification information of the users in the second user data set according to the probabilities. This embodiment can mine deep abstract features of unregistered users, so that business activities can be targeted, their execution efficiency is improved, and their cost is reduced.

Description

User classification method and device
Technical Field
The invention relates to the technical field of computers, in particular to a user classification method and device.
Background
Currently, the value of unregistered users is mainly identified with the analytic hierarchy process: each level is assigned a weight, the weights are adjusted repeatedly according to experimental results, and the value of an unregistered user is finally obtained as a weighted average. Because the weights are tuned by experiment, the approach is rigid and cannot mine deep abstract features of unregistered users. As a result, the value classification information it produces cannot be used to target business activities at the different kinds of unregistered users, which reduces the execution efficiency of those activities.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
Deep abstract features of unregistered users cannot be mined, so business activities that rely on the resulting value classification information lack targeting, which reduces their execution efficiency and increases their cost.
Disclosure of Invention
In view of this, embodiments of the present invention provide a user classification method and apparatus that can mine deep abstract features of unregistered users, so that business activities can be targeted, their execution efficiency is improved, and their cost is reduced.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a user classification method.
A user classification method, comprising: determining a value classification label of each user in a first user data set according to behavior data of the users in the first user data set; training a user value classifier using the user characteristics of the first user data set and the value classification labels as training data; and inputting the user characteristics of a second user data set into the trained user value classifier to calculate the goodness of fit of those user characteristics to each value category, converting the goodness of fit into the probability of belonging to each value category, and determining the value classification information of the users in the second user data set according to the probabilities.
Optionally, determining the value classification labels of the users in the first user data set according to their behavior data includes: calculating at least one behavior index value for each user in the first user data set, and averaging the obtained values separately for each kind of behavior index to obtain a mean for each behavior index; and determining the value classification labels of the users in the first user data set with a preset user value model, according to the calculated behavior index values and the behavior index means.
Optionally, the at least one behavior index value comprises the time elapsed since the most recent behavior, the behavior frequency, and the behavior-related amount; determining the value classification labels with the preset user value model, according to the calculated behavior index values and the behavior index means, includes: comparing each user's time elapsed since the most recent behavior, behavior frequency, and behavior-related amount with the corresponding behavior index mean to determine the user's value evaluation level in each of three aspects (most recent behavior, behavior frequency, and behavior-related amount); determining the value category of each user in the first user data set according to the user's value evaluation levels in the three aspects and the correspondence, in the preset user value model, between value evaluation level combinations and value categories, where a value evaluation level combination is an ordered combination of the value evaluation levels in the three aspects; and encoding the value category of each user in the first user data set to obtain the user's value classification label.
Optionally, the user characteristics of a user data set, which is either the first user data set or the second user data set, are extracted as follows: encoding one or more of the user equipment information, the browser information, and the region, occupation, and residential-community price of the users in the user data set, and taking the resulting encoded vector as the user characteristics of the user data set.
Optionally, the user value classifier is implemented as a neural network with two or more layers.
According to another aspect of the embodiments of the present invention, there is provided a user classification apparatus.
A user classification apparatus, comprising: a label determination module for determining a value classification label of each user in a first user data set according to behavior data of the users in the first user data set; a training module for training a user value classifier using the user characteristics of the first user data set and the value classification labels as training data; and a classification determination module for inputting the user characteristics of a second user data set into the trained user value classifier to calculate the goodness of fit of those user characteristics to each value category, converting the goodness of fit into the probability of belonging to each value category, and determining the value classification information of the users in the second user data set according to the probabilities.
Optionally, the label determination module is further configured to: calculate at least one behavior index value for each user in the first user data set, and average the obtained values separately for each kind of behavior index to obtain a mean for each behavior index; and determine the value classification labels of the users in the first user data set with a preset user value model, according to the calculated behavior index values and the behavior index means.
Optionally, the at least one behavior index value comprises the time elapsed since the most recent behavior, the behavior frequency, and the behavior-related amount; the label determination module includes a value classification label determination sub-module configured to: compare each user's time elapsed since the most recent behavior, behavior frequency, and behavior-related amount with the corresponding behavior index mean to determine the user's value evaluation level in each of three aspects (most recent behavior, behavior frequency, and behavior-related amount); determine the value category of each user in the first user data set according to the user's value evaluation levels in the three aspects and the correspondence, in the preset user value model, between value evaluation level combinations and value categories, where a value evaluation level combination is an ordered combination of the value evaluation levels in the three aspects; and encode the value category of each user in the first user data set to obtain the user's value classification label.
Optionally, the apparatus further comprises a user feature extraction module configured to extract the user characteristics of a user data set, which is either the first user data set or the second user data set, as follows: encoding one or more of the user equipment information, the browser information, and the region, occupation, and residential-community price of the users in the user data set, and taking the resulting encoded vector as the user characteristics of the user data set.
Optionally, the user value classifier is implemented as a neural network with two or more layers.
According to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device, comprising: one or more processors; a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the user classification method provided by embodiments of the present invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the user classification method provided by an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits. A value classification label is determined for each user in the first user data set according to the behavior data of the users in that set; a user value classifier is trained using the user characteristics and the value classification labels of the first user data set as training data; and the user characteristics of the second user data set are input into the trained user value classifier to calculate their goodness of fit to each value category, the goodness of fit is converted into the probability of belonging to each value category, and the value classification information of the users in the second user data set is determined according to the probabilities. Deep abstract features of unregistered users can thus be mined, so that business activities can be targeted, their execution efficiency is improved, and their cost is reduced.
Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a user classification method according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a user classification flow according to one embodiment of the invention;
FIG. 3 is a schematic diagram of a structure of a user value classifier according to one embodiment of the invention;
FIG. 4 is a schematic diagram of the main modules of a user classification apparatus according to one embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main steps of a user classification method according to an embodiment of the present invention.
As shown in fig. 1, the user classification method according to an embodiment of the present invention mainly includes the following steps S101 to S103.
Step S101: determining the value classification labels of the users in the first user data set according to the behavior data of the users in the first user data set.
Step S102: training a user value classifier using the user characteristics and the value classification labels of the first user data set as training data.
Step S103: inputting the user characteristics of the second user data set into the trained user value classifier to calculate the goodness of fit of those user characteristics to each value category, converting the calculated goodness of fit into the probability of belonging to each value category, and determining the value classification information of the users in the second user data set according to the probabilities.
In one embodiment, the first user data set is, for example, a set of registered users and the second user data set is, for example, a set of unregistered users.
The behavior data comprises various behavior indexes, such as the time elapsed since the most recent behavior, the behavior frequency, and the behavior-related amount. For example, when the behavior data is consumption behavior data, it comprises consumption behavior indexes such as the time elapsed since the most recent consumption, the consumption frequency, and the consumption amount.
The value classification label of a user indicates the user's value category. It may be an n-bit value category code, where n is the number of value categories, and the code may take the form of a one-hot encoding, a label encoding, or the like.
In one embodiment, determining the value classification labels of the users in the first user data set according to their behavior data specifically includes: calculating at least one behavior index value for each user in the first user data set, and averaging the obtained values separately for each kind of behavior index to obtain a mean for each behavior index; and determining the value classification labels of the users in the first user data set with a preset user value model, according to the calculated behavior index values and the behavior index means.
In one embodiment, the at least one behavior index value includes the time elapsed since the most recent behavior, the behavior frequency, and the behavior-related amount. Taking consumption behavior as an example, the time elapsed since the most recent behavior is the time elapsed since the most recent consumption, the behavior frequency is the consumption frequency, and the behavior-related amount is the consumption amount. The consumption frequency is the number of times the user consumed within the most recent preset time period, and the consumption amount is the user's average consumption amount in that period, equal to the user's total consumption amount in the period divided by the number of orders the user placed in the period.
On an e-commerce platform, the behavior index means may comprise the mean consumption amount, the mean time elapsed since the most recent consumption, and the mean consumption frequency. The mean consumption amount may be the platform's average order value over the most recent preset time period, equal to the platform's total consumption amount in that period divided by the total number of orders placed by all paying users in that period. The preset time period may be defined as required. The mean time elapsed since the most recent consumption is obtained by averaging that duration over all users of the e-commerce platform. The mean consumption frequency is obtained by averaging the consumption frequency of all users of the e-commerce platform over the most recent preset time period.
Determining the value classification labels of the users in the first user data set with the preset user value model, according to the calculated behavior index values and the behavior index means, may specifically include: comparing each user's time elapsed since the most recent behavior, behavior frequency, and behavior-related amount with the corresponding behavior index mean to determine the user's value evaluation level in each of three aspects (most recent behavior, behavior frequency, and behavior-related amount); determining the value category of each user in the first user data set according to the user's value evaluation levels in the three aspects and the correspondence, in the preset user value model, between value evaluation level combinations and value categories, where a value evaluation level combination is an ordered combination of the value evaluation levels in the three aspects; and encoding the value category of each user in the first user data set to obtain the user's value classification label.
In one embodiment, the user characteristics of the second user data set are input into the trained user value classifier to calculate their goodness of fit to each value category, the goodness of fit is converted into the probability of belonging to each value category, and the value classification information of the users in the second user data set is determined according to the probabilities. The value classification information determined for a user is an n-bit value category code, where n is the number of value categories; with one-hot encoding, for example, the position of the bit set to "1" indicates the user's value category.
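The conversion from goodness of fit to probability can be sketched as follows. The source does not name the conversion function; a softmax over the n scores is assumed here as one common choice, and the example scores and the names `softmax` and `to_value_code` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw goodness-of-fit scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_value_code(probabilities):
    """One-hot code with a 1 at the most probable value category."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return [1 if i == best else 0 for i in range(len(probabilities))]


# Assumed scores for n = 8 value categories (illustrative values only).
scores = [0.2, 2.5, 0.1, -1.0, 0.0, 1.1, -0.3, 0.4]
probabilities = softmax(scores)
value_code = to_value_code(probabilities)
```

With these assumed scores the second category has the highest probability, so the resulting code sets the second bit.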
In one embodiment, the user characteristics of a user data set, which is either the first user data set or the second user data set, are extracted as follows: encoding one or more of the user equipment information, the browser information, and the region, occupation, and residential-community price of the users in the user data set, and taking the resulting encoded vector as the user characteristics of the user data set.
The user value classifier of the embodiment of the invention can be implemented as a neural network with two or more layers.
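As a rough illustration of what a two-layer network computes, the sketch below runs one forward pass in plain Python. The layer sizes, the ReLU activation, and all weights are illustrative assumptions rather than details from the source, and the function names are hypothetical. The outputs play the role of the per-category goodness-of-fit scores.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(features, w1, b1, w2, b2):
    """Hidden layer with ReLU, then an output layer of raw scores."""
    hidden = relu(dense(features, w1, b1))
    return dense(hidden, w2, b2)


# Toy dimensions: 3 input bits, 2 hidden units, 2 value categories.
features = [1, 0, 1]
w1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [[1.0, 2.0], [3.0, -1.0]]
b2 = [0.5, 0.0]

category_scores = forward(features, w1, b1, w2, b2)
```

In practice the weights would be learned from the labeled first user data set instead of being written by hand.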
Fig. 2 is a schematic diagram of a user classification flow according to an embodiment of the present invention.
As shown in fig. 2, the user classification flow of the embodiment of the present invention is described using the value classification of unregistered users of an e-commerce platform as an example. Classifying the value of an unregistered user means determining which value category that user belongs to.
The registered users of the e-commerce platform and the users reached by an advertisement platform usually coincide only in a small portion (e.g., 20-30%). In other words, most of the users reached by the advertisement platform are not registered on the e-commerce platform and are unregistered users from its point of view. The problem to be solved by the embodiments of the present invention is how to determine the value of these unregistered users.
According to the embodiment of the invention, the correlation between the user characteristics of registered users and their RFM value categories (R is the most recent consumption, F is the consumption frequency, and M is the consumption amount) is learned through training, and the RFM value category corresponding to the user characteristics of an unregistered user is then inferred.
The RFM model is the most typical user value model. R (recency) is the time elapsed between the user's most recent consumption and the current time; the shorter, the better. F (frequency) is the number of times the user consumed within a recent period, i.e. the preset time period, whose length is defined as required, for example the last half year. M (monetary) is the consumption amount, which represents the user's value contribution; specifically, it is the user's average consumption amount in the recent period, equal to the user's total consumption amount in the period divided by the number of orders the user placed in the period. The period is the preset time period mentioned above, for example the last half year.
R, F, and M are three consumption behavior indexes that can be calculated for every registered user of the e-commerce platform. The mean of each index, namely the R mean, F mean, and M mean, is obtained from the three index values of all registered users.
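A minimal sketch of the R, F, M calculation, assuming order records of the form (user, date, amount). The sample orders, the 183-day window standing in for "the last half year", and the function names are hypothetical. The M mean follows the platform average order value described in the text (total amount divided by total orders), while the R and F means are plain averages over users. For simplicity, users with no orders inside the window are skipped.

```python
from datetime import date

# Hypothetical order records: (user_id, order_date, amount).
orders = [
    ("u1", date(2020, 5, 1), 300.0),
    ("u1", date(2020, 4, 1), 100.0),
    ("u2", date(2020, 1, 15), 50.0),
]
today = date(2020, 5, 20)
WINDOW_DAYS = 183  # assumed length of "the last half year"


def rfm_per_user(orders, today, window_days=WINDOW_DAYS):
    """Return {user_id: (R, F, M)} from the orders inside the window."""
    rows_by_user = {}
    for user_id, day, amount in orders:
        age = (today - day).days
        if 0 <= age <= window_days:
            rows_by_user.setdefault(user_id, []).append((age, amount))
    rfm = {}
    for user_id, rows in rows_by_user.items():
        r = min(age for age, _ in rows)            # days since last order
        f = len(rows)                              # number of orders
        m = sum(amount for _, amount in rows) / f  # average amount per order
        rfm[user_id] = (r, f, m)
    return rfm


def rfm_means(rfm):
    """R and F means over users; M mean as total amount / total orders."""
    users = len(rfm)
    r_mean = sum(r for r, _, _ in rfm.values()) / users
    f_mean = sum(f for _, f, _ in rfm.values()) / users
    total_orders = sum(f for _, f, _ in rfm.values())
    total_amount = sum(f * m for _, f, m in rfm.values())
    return r_mean, f_mean, total_amount / total_orders


rfm = rfm_per_user(orders, today)
means = rfm_means(rfm)
```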
Each of the three aspects R, F, and M in the RFM model of the embodiment of the present invention may be assigned one of two value evaluation levels, high or low, which gives 2 × 2 × 2 = 8 value evaluation level combinations. Each combination corresponds to one value category, so there are 8 value categories in total: important value users, important development users, important maintenance users, important saving users, general value users, general development users, general maintenance users, and general saving users. The correspondence between value evaluation level combinations and value categories in the RFM model is shown in Table 1, where the R, F, and M columns in each of rows 2 to 9 form one value evaluation level combination. For example, in the row for "important value user", the combination is: high; high; high.
TABLE 1
Value category                R (most recent consumption)  F (consumption frequency)  M (consumption amount)
Important value user          High                         High                       High
Important development user    High                         Low                        High
Important maintenance user    Low                          High                       High
Important saving user         Low                          Low                        High
General value user            High                         High                       Low
General development user      High                         Low                        Low
General maintenance user      Low                          High                       Low
General saving user           Low                          Low                        Low
If a user's R value is smaller than the R mean, the value evaluation level for R is high; otherwise it is low. If the user's F value is greater than the F mean, the level for F is high; otherwise it is low. If the user's M value is greater than the M mean, the level for M is high; otherwise it is low. For example, suppose that the time elapsed since a user's most recent consumption is shorter than the R mean, the user's consumption frequency in the last half year is smaller than the F mean, and the user's average consumption amount in the last half year is greater than the M mean (namely the platform's average order value in the last half year, equal to the platform's total consumption amount in that period divided by the total number of orders placed by all paying users in that period). According to Table 1, the user's value evaluation levels for R, F, and M are then high, low, and high respectively, so the user is an important development user.
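The comparison rules and the Table 1 lookup can be sketched as below. The category names follow Table 1 with normalized wording; the function name and the numeric inputs are hypothetical.

```python
# Table 1 as a lookup: (R level, F level, M level) -> value category.
VALUE_CATEGORIES = {
    ("high", "high", "high"): "important value user",
    ("high", "low", "high"): "important development user",
    ("low", "high", "high"): "important maintenance user",
    ("low", "low", "high"): "important saving user",
    ("high", "high", "low"): "general value user",
    ("high", "low", "low"): "general development user",
    ("low", "high", "low"): "general maintenance user",
    ("low", "low", "low"): "general saving user",
}


def value_category(r, f, m, r_mean, f_mean, m_mean):
    """Map one user's R, F, M values to a value category."""
    r_level = "high" if r < r_mean else "low"  # more recent is better
    f_level = "high" if f > f_mean else "low"
    m_level = "high" if m > m_mean else "low"
    return VALUE_CATEGORIES[(r_level, f_level, m_level)]


# Recent purchase, low frequency, high average amount (made-up numbers).
example = value_category(r=10, f=2, m=300.0, r_mean=30, f_mean=5, m_mean=200.0)
```

The example reproduces the case in the text: levels high, low, high give an important development user.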
An 8-bit one-hot encoding, [01000000], can represent this user's value category code: the "1" indicates that the user's value category is important development user, and each "0" indicates that the user does not belong to one of the remaining 7 value categories. Each registered user corresponds to one value category of the RFM model and therefore has a value category code. The set of all registered users is the first user data set, and the value category code of each registered user is that user's value classification label. Unregistered users have no value classification label, because they have no consumption behavior data on the e-commerce platform, such as (but not limited to) the three consumption behavior indexes above.
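A short sketch of the two encodings mentioned in the text, one-hot and label encoding. The bit order is assumed to follow the row order of Table 1, which is consistent with the [01000000] example; the function names are hypothetical.

```python
# Bit positions assumed to follow the row order of Table 1.
CATEGORY_ORDER = [
    "important value user",
    "important development user",
    "important maintenance user",
    "important saving user",
    "general value user",
    "general development user",
    "general maintenance user",
    "general saving user",
]


def one_hot(category):
    """8-bit code with a single 1 at the category's position."""
    code = [0] * len(CATEGORY_ORDER)
    code[CATEGORY_ORDER.index(category)] = 1
    return code


def label_code(category):
    """Label encoding: the category's integer index."""
    return CATEGORY_ORDER.index(category)


development_code = one_hot("important development user")
```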
Both registered and unregistered users have user characteristics, so the user characteristics must be defined in a way that can describe both. In the embodiment of the invention, the user characteristics of unregistered users can be mined from exposure data. For unregistered users, the e-commerce advertisement delivery system can collect basic user information through exposure data. The basic information may include the mobile device number, the User-Agent (UA) information, and the user's latitude and longitude. The User-Agent is a special string header that lets the server identify the client's operating system and version, CPU type, browser and version, browser rendering engine, browser language, browser plug-ins, and so on. The embodiment of the invention defines the user characteristics as 52 bits, derived by analyzing the mobile phone model and the region. Specifically, they can be defined according to the mobile phone brand, the mobile phone price, the user's occupation, the region where the user is located, the price of the residential community where the user is located, and so on, each of which is explained below.
Analyzing the model of the mobile phone:
The mobile phone brand takes 12 bits (covering the most commonly used mobile phone brands on the market) and uses one-hot coding (a code in which exactly one bit is 1 and all other bits are 0). The user's mobile phone model can be extracted from the User-Agent information. For example, in the User-Agent string Mozilla/5.0 (Linux; Android 8.1; EML-AL00 Build/HEML-AL00; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.143 Crosswalk/24.53.595.0 XWEB/358 MMWEBSDK/23 Mobile Safari/537.36 MicroMessenger/6.7.2.1340(0x2607023A) type/4G Language/zh_CN, the HEML-AL00 field identifies the H-brand P20, so both the mobile phone model and its brand (the H brand) can be obtained.
The mobile phone price takes 11 bits ([0,500), [500,1000), [1000,1500), [1500,2000), [2000,2500), [2500,3000), [3000,3500), [3500,4000), [4000,4500), [4500,5000), ≥5000) and also uses one-hot coding. Once the mobile phone model is known, its price can be estimated; if the model is sold in several memory versions, the median of their prices can be taken as the mobile phone price.
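As an illustration of how the model might be extracted from a User-Agent string, the following sketch uses a regular expression. It assumes the common Android convention in which the model token directly precedes "Build/"; the lookup table MODEL_TO_BRAND and the function names are hypothetical, and the User-Agent is a shortened form of the example above.

```python
import re
from typing import Optional

# Assumption: the device model is the token between a ';' and ' Build/'.
_MODEL_RE = re.compile(r";\s*([^;()]+?)\s+Build/")

# Hypothetical lookup table; a real system would maintain a far larger one.
MODEL_TO_BRAND = {"EML-AL00": "H"}


def extract_model(user_agent: str) -> Optional[str]:
    """Return the device model found in the User-Agent, or None if absent."""
    match = _MODEL_RE.search(user_agent)
    return match.group(1) if match else None


def brand_of(user_agent: str) -> Optional[str]:
    """Map the extracted model to a brand, or None if the model is unknown."""
    model = extract_model(user_agent)
    return MODEL_TO_BRAND.get(model) if model is not None else None


ua = (
    "Mozilla/5.0 (Linux; Android 8.1; EML-AL00 Build/HEML-AL00; wv) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 "
    "Chrome/53.0.2785.143 Mobile Safari/537.36"
)
print(extract_model(ua), brand_of(ua))
```

User-Agent strings that do not follow this convention yield None and would need separate handling.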
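The price bucketing can be sketched as follows. The bucket width of 500 and the 11 buckets come from the text; the function names and the sample version prices are illustrative assumptions.

```python
from statistics import median
from typing import Sequence

PRICE_BUCKET_WIDTH = 500   # each bucket is a half-open interval [k*500, (k+1)*500)
PRICE_BUCKET_COUNT = 11    # the last bucket collects every price >= 5000


def bucket_index(value: float, width: float, count: int) -> int:
    """Index of the bucket containing `value`; the last bucket is open-ended."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return min(int(value // width), count - 1)


def encode_phone_price(price: float) -> str:
    """11-bit one-hot code of the price bucket."""
    idx = bucket_index(price, PRICE_BUCKET_WIDTH, PRICE_BUCKET_COUNT)
    return "".join("1" if i == idx else "0" for i in range(PRICE_BUCKET_COUNT))


def estimate_price(version_prices: Sequence[float]) -> float:
    """Median price over the memory versions of one phone model."""
    return median(version_prices)


# Hypothetical prices of three memory versions of the same model.
print(encode_phone_price(estimate_price([2999, 3499, 3999])))
```

The same bucket_index helper would apply to the cell price by using a width of 5000.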
Analyzing the region:
The user's occupation takes 9 bits (financial practitioner, medical staff, officer/institution, white-collar/general staff, worker/service staff, teacher, farmer, student, unidentified), again using one-hot coding. For registered users, the occupation is easy to obtain from the user portrait model. For unregistered users, it can be mined from the exposure data, which records the latitude and longitude of each user request. The user's location (residential area, school, hospital, government agency, work park, and so on; data analysis shows the error to be within 100 meters) can be calculated from the latitude and longitude. For example, if a certain user frequently appears in a hospital during the daytime, the user is with high probability a medical worker.
The user's region takes 9 bits (north China, east China, northeast China, south China, southwest China, northwest China, Beijing and Shanghai) and also uses one-hot coding.
The price of the cell where the user is located takes 11 bits (unit: per square meter; [0,5000), [5000,10000), [10000,15000), [15000,20000), [20000,25000), [25000,30000), [30000,35000), [35000,40000), [40000,45000), [45000,50000), ≥50000) and also uses one-hot coding. The cell where the user is located is first determined from the latitude and longitude, and the corresponding cell price can then be looked up.
According to the above method, suppose for example that a certain user's mobile phone brand, mobile phone price, occupation, region, and cell price are (P-brand mobile phone, 3000, financial practitioner, Beijing, ≥50000). The user feature is then defined as the following 52-bit code: [0100000000000000001000010000000000000001000000000001].
The user features and value classification labels of the first user data set (i.e. the set formed by registered users) are used as training data to train a user value classifier, as shown in fig. 2. That is, the parameters W and b of the user value classifier are obtained through training, and then, according to the user features of unregistered users (i.e. users in the second user data set), the value classification information (RFM value classification) of the unregistered users can be inferred using the trained user value classifier.
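The complete 52-bit encoding can be sketched as a concatenation of the five one-hot segments. The segment sizes (12, 11, 9, 9, 11) come from the text; the vocabularies below are assumptions made for illustration: the brand names other than "H" and "P", the order of all entries, and the "central China" entry (added here only so that the region segment has 9 positions) are not given in the text.

```python
from typing import List

# Hypothetical vocabularies; only their sizes are fixed by the text.
BRANDS = ["H", "P"] + [f"brand_{i}" for i in range(2, 12)]            # 12 entries
OCCUPATIONS = ["financial practitioner", "medical staff", "officer/institution",
               "white-collar/general staff", "worker/service staff",
               "teacher", "farmer", "student", "unidentified"]          # 9 entries
REGIONS = ["north China", "east China", "northeast China", "central China",
           "south China", "southwest China", "northwest China",
           "Beijing", "Shanghai"]                                       # 9 entries


def one_hot(index: int, size: int) -> List[int]:
    return [1 if i == index else 0 for i in range(size)]


def bucket(value: float, width: float, count: int) -> int:
    """Half-open buckets of equal width; the last bucket is open-ended."""
    return min(int(value // width), count - 1)


def encode_user(brand: str, phone_price: float, occupation: str,
                region: str, cell_price: float) -> str:
    """Concatenate the five one-hot segments into one 52-bit string."""
    bits = (one_hot(BRANDS.index(brand), 12)
            + one_hot(bucket(phone_price, 500, 11), 11)
            + one_hot(OCCUPATIONS.index(occupation), 9)
            + one_hot(REGIONS.index(region), 9)
            + one_hot(bucket(cell_price, 5000, 11), 11))
    return "".join(str(bit) for bit in bits)


# 60000 is an arbitrary stand-in for a cell price in the ">= 50000" bucket.
code = encode_user("P", 3000, "financial practitioner", "Beijing", 60000)
print(len(code), code)
```

Under these assumed orderings, the example tuple from the text produces a 52-bit string with ones at positions 1, 18, 23, 39 and 51 (counting from 0), which is where the code given in the text has its ones.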
The training process of the user value classifier is described in detail below.
Every registered user corresponds to one value category of the RFM model and has a value classification label, whereas unregistered users have no value classification label. To obtain the evidence that a given user feature belongs to a certain value category, the components of the user feature can be summed in a weighted manner. Thus, for a given user feature x, the evidence E_i that it belongs to value category i can be expressed as:
$$E_i = \sum_j W_{i,j}\, x_j + b_i$$
where W_i denotes the weights (with components W_{i,j}), b_i denotes the offset of value category i, and j denotes the index over the components of the given user feature x, used for the summation over the user feature. W_i and b_i can be determined by training the user value classifier. E_i may also be referred to as the goodness of fit of the user feature x to value category i.
This evidence can finally be converted into probabilities y with the softmax (normalization) function:
y = softmax(relu(E_i))
Here relu, the rectified linear unit (also called the linear rectification function), is an activation function that applies a nonlinear transformation to the output of the linear function defined above, and softmax normalizes the result into the required format, which for the embodiment of the present invention is a probability distribution over the 8 value categories.
Thus, given a user characteristic, the goodness of fit of the user characteristic to each value category may be calculated and converted by the softmax function described above into a probability value indicating the probability that the user belongs to the respective value category.
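The computation of the goodness of fit and its conversion into probabilities can be illustrated with a small sketch. The weights, offsets and feature vector below are made-up toy values (3 features, 2 categories) chosen only to keep the arithmetic readable; W is indexed as W[i][j], with i the value category and j the feature component.

```python
import math
from typing import List, Sequence


def evidence(x: Sequence[float], W: Sequence[Sequence[float]],
             b: Sequence[float]) -> List[float]:
    """Weighted sum of the feature components plus offset, one value per category."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(W_i, x)) + b_i
            for W_i, b_i in zip(W, b)]


def relu(values: Sequence[float]) -> List[float]:
    """Rectified linear unit: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def softmax(values: Sequence[float]) -> List[float]:
    """Normalize values into probabilities that sum to 1."""
    largest = max(values)                      # subtracted for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values, not taken from the text.
x = [1, 0, 1]
W = [[0.5, -1.0, 0.25],
     [-0.5, 2.0, 0.0]]
b = [0.0, 0.1]

E = evidence(x, W, b)
y = softmax(relu(E))
print(E, y)
```

The category with the larger evidence receives the larger probability, and the probabilities always sum to 1.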
To improve training efficiency, batch training may be adopted, with 100000 user feature vectors input at a time. The input can therefore be defined as a two-dimensional matrix of shape [100000, 52]; the output for each user is 8-bit one-hot value classification information, so the output is a two-dimensional matrix of shape [100000, 8]. In the embodiment of the invention, a two-layer neural network is constructed. The user features are 52-bit vectors, so the input layer comprises 52 neurons; the network has one hidden layer comprising 93 neurons (namely 2 multiplied by the number of neurons in the input layer +1); and the output is 8-bit value classification information, so the output layer comprises 8 neurons.
After the training of the user value classifier is finished, the training parameters W and b are obtained. In the two-layer neural network of the embodiment of the invention, each layer has its own training parameters, and each layer calculates E_i in the way described above. W of the first layer is a [52, 93] matrix and b of the first layer is a 93-element one-dimensional vector; W of the second layer is a [93, 8] matrix and b of the second layer is an 8-element one-dimensional vector. Together these parameters express the correlation between the user features and the RFM value classification.
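The parameter shapes and one training step can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the text does not specify the loss function, the optimizer, the initialization or the learning rate, so the cross-entropy loss, full-batch gradient descent, uniform initialization and lr=0.1 used here are assumptions, as are all function names. The layer sizes (52, 93, 8) and the relu hidden layer with softmax output follow the text.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def init_params(n_in: int = 52, n_hidden: int = 93, n_out: int = 8,
                seed: int = 0) -> Dict[str, list]:
    """Small random weights and zero offsets; W is stored as [inputs, outputs]."""
    rng = random.Random(seed)

    def matrix(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"W1": matrix(n_in, n_hidden), "b1": [0.0] * n_hidden,
            "W2": matrix(n_hidden, n_out), "b2": [0.0] * n_out}


def affine(x: Sequence[float], W, b) -> List[float]:
    """E_i = sum_j x[j] * W[j][i] + b[i]."""
    return [sum(x[j] * W[j][i] for j in range(len(x))) + b[i] for i in range(len(b))]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)
    exps = [math.exp(e - m) for e in v]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x, p) -> Tuple[List[float], List[float]]:
    h = [max(0.0, e) for e in affine(x, p["W1"], p["b1"])]  # relu hidden layer
    y = softmax(affine(h, p["W2"], p["b2"]))                 # softmax output layer
    return h, y


def train_step(xs, ts, p, lr: float = 0.1) -> float:
    """One full-batch gradient-descent step; returns the loss before the update."""
    n = len(xs)
    gW1 = [[0.0] * len(p["b1"]) for _ in p["W1"]]
    gW2 = [[0.0] * len(p["b2"]) for _ in p["W2"]]
    gb1, gb2 = [0.0] * len(p["b1"]), [0.0] * len(p["b2"])
    loss = 0.0
    for x, t in zip(xs, ts):
        h, y = forward(x, p)
        loss -= sum(t_i * math.log(y_i + 1e-12) for t_i, y_i in zip(t, y)) / n
        d2 = [(y_i - t_i) / n for y_i, t_i in zip(y, t)]     # gradient at output evidence
        d1 = [sum(p["W2"][k][i] * d2[i] for i in range(len(d2))) if h[k] > 0 else 0.0
              for k in range(len(h))]                         # gradient gated by relu
        for k in range(len(h)):
            for i in range(len(d2)):
                gW2[k][i] += h[k] * d2[i]
        for j in range(len(x)):
            if x[j]:                                          # one-hot inputs are mostly 0
                for k in range(len(d1)):
                    gW1[j][k] += x[j] * d1[k]
        gb2 = [g + d for g, d in zip(gb2, d2)]
        gb1 = [g + d for g, d in zip(gb1, d1)]
    for W, g in ((p["W1"], gW1), (p["W2"], gW2)):
        for r in range(len(W)):
            for c in range(len(W[r])):
                W[r][c] -= lr * g[r][c]
    p["b1"] = [b - lr * g for b, g in zip(p["b1"], gb1)]
    p["b2"] = [b - lr * g for b, g in zip(p["b2"], gb2)]
    return loss
```

Calling train_step repeatedly on batches of 52-bit user features and 8-bit labels would adjust W and b of both layers; in practice a numerical library would replace the explicit loops, especially for batches of 100000 users.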
FIG. 3 is a schematic diagram of a structure of a user value classifier according to one embodiment of the invention. As shown in fig. 3, the user value classifier according to the embodiment of the present invention is a two-layer neural network, where relu is an activation function of a hidden layer, softmax is an activation function of an output layer, the input user characteristics include user characteristics obtained through mobile phone model analysis and user characteristics obtained through regional analysis, so as to obtain input 52-bit encoded user characteristics, corresponding to an input layer one-hot (52) in the diagram, relu (93) corresponds to the hidden layer, which includes 93 neurons, and softmax (8) corresponds to the output layer, which includes 8 neurons.
Since unregistered users also have user features, these can be input into the trained user value classifier to obtain the probability that each unregistered user belongs to each value category. The RFM value classification of each unregistered user is then determined from these probabilities and expressed as 8-bit one-hot value classification information (an 8-bit one-hot code in which the bit equal to 1 indicates the value category of the unregistered user).
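One possible way to turn the probabilities into 8-bit one-hot value classification information is to select the most probable category. The text only states that the classification is determined according to the probability, so the argmax rule below is an assumption, and the probability values are made up for illustration.

```python
from typing import Sequence


def to_one_hot_class(probs: Sequence[float]) -> str:
    """One-hot string with a 1 at the position of the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return "".join("1" if i == best else "0" for i in range(len(probs)))


# Hypothetical classifier output for one unregistered user.
probs = [0.05, 0.40, 0.10, 0.05, 0.20, 0.10, 0.05, 0.05]
print(to_one_hot_class(probs))
```

If two categories tie, this sketch keeps the one with the lower index; other tie-breaking rules are equally possible.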
The one-hot encoding method in the embodiment of the present invention may be replaced by other encoding methods such as a label encoding method.
According to the above user classification process, the deep abstract features of unregistered users can be mined, so that when advertisements are delivered (in particular in user-acquisition campaigns, that is, campaigns that turn unregistered users into registered users), users can be selected in a targeted manner. This improves advertisement delivery efficiency, raises the user-acquisition rate, and reduces the cost of business activities such as user acquisition.
Fig. 4 is a schematic diagram of main blocks of a user classification apparatus according to an embodiment of the present invention.
As shown in fig. 4, the user classifying device 400 according to an embodiment of the present invention mainly includes: a label determination module 401, a training module 402, a classification determination module 403.
A tag determining module 401, configured to determine a value classification tag of a user in the first user data set according to behavior data of the user in the first user data set.
A training module 402, configured to train a user value classifier using the user features and the value classification labels of the first user data set as training data.
The classification determining module 403 is configured to input the user features of the second user data set into the trained user value classifier, to calculate a goodness of fit of the user features of the second user data set to each value classification, and to convert the calculated goodness of fit into a probability corresponding to a certain value class, so as to determine the value classification information of the user in the second user data set according to the probability.
The tag determination module 401 may be specifically configured to: calculating at least one behavior index value of each user in the first user data set, and respectively calculating a mean value according to the types based on all the obtained behavior index values to obtain a mean value of each behavior index value; and determining the value classification label of the user in the first user data set by using a preset user value model according to the calculated behavior index value and the behavior index value mean value.
The at least one behavior index value comprises a duration of the last behavior time from the current time, a behavior frequency and a behavior related amount.
The tag determination module may include a value classification tag determination sub-module to: compare the duration from the most recent behavior time to the current time, the behavior frequency, and the behavior related amount of each user in the first user data set with the corresponding behavior index value means, to determine the value evaluation level of each user in the first user data set in the three aspects of most recent behavior, behavior frequency, and behavior related amount; determine the value category of each user in the first user data set according to the value evaluation levels of that user in the three aspects and the correspondence between value evaluation level combinations and value categories in a preset user value model, wherein a value evaluation level combination is an ordered combination of the value evaluation levels in the three aspects; and encode the value category of each user in the first user data set to obtain the value classification label of the user in the first user data set.
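The label determination can be sketched as follows. The comparison against the mean of each behavior index follows the text; everything else is an assumption for illustration: the two-level evaluation (1 = high, 0 = low), the convention that a shorter duration since the last behavior counts as high, the handling of values equal to the mean, and in particular the mapping CATEGORY_INDEX, which stands in for the preset user value model.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical mapping from the ordered level combination
# (recency, frequency, amount) to one of the 8 value category indexes.
CATEGORY_INDEX = {
    (1, 1, 1): 0, (1, 0, 1): 1, (0, 1, 1): 2, (0, 0, 1): 3,
    (1, 1, 0): 4, (1, 0, 0): 5, (0, 1, 0): 6, (0, 0, 0): 7,
}


def rfm_labels(users: Dict[str, Tuple[float, float, float]]) -> Dict[str, str]:
    """users maps a user id to (days since last behavior, frequency, amount)."""
    r_mean = mean(u[0] for u in users.values())
    f_mean = mean(u[1] for u in users.values())
    m_mean = mean(u[2] for u in users.values())
    labels = {}
    for uid, (r, f, m) in users.items():
        # Assumption: more recent than average, or at least average, counts as high.
        levels = (int(r < r_mean), int(f >= f_mean), int(m >= m_mean))
        idx = CATEGORY_INDEX[levels]
        labels[uid] = "".join("1" if i == idx else "0" for i in range(8))
    return labels


# Made-up behavior data for four registered users.
users = {
    "u1": (2, 10, 900.0),
    "u2": (3, 1, 800.0),
    "u3": (40, 9, 100.0),
    "u4": (60, 1, 50.0),
}
print(rfm_labels(users))
```

The resulting 8-bit strings play the role of the value classification labels used as training targets for the user value classifier.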
The user classifying device 400 may further include a user feature extracting module for extracting a user feature of one user data set, the user data set being a first user data set or a second user data set, by: and coding one or more information of user equipment information, browser information, the area, occupation and cell price of the user in the user data set, and taking the obtained coding vector as the user characteristic of the user data set.
The user value classifier of the embodiment of the invention can be realized based on two or more layers of neural networks.
In addition, the detailed implementation of the user classifying device in the embodiment of the present invention has already been described in the above user classifying method, so the description is not repeated here.
Fig. 5 illustrates an exemplary system architecture 500 to which the user classification method or the user classification apparatus of the embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 501, 502, 503. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the user classification method provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the user classification apparatus is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a terminal device or server of an embodiment of the present application. The terminal device or the server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a label determination module, a training module, and a classification determination module. Where the names of these modules do not in some cases constitute a limitation on the module itself, for example, the tag determination module may also be described as "a module for determining a value classification tag for a user in a first user data set based on behavioral data of the user in the first user data set".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: determining a value classification label of a user in a first user data set according to behavior data of the user in the first user data set; training a user value classifier by taking the user characteristics of the first user data set and the value classification label as training data; inputting the user characteristics of a second user data set into the trained user value classifier to calculate the goodness of fit of the user characteristics of the second user data set to each value classification, converting the goodness of fit into the probability corresponding to a certain value class, and determining the value classification information of the users in the second user data set according to the probability.
According to the technical scheme of the embodiment of the invention, the value classification labels of the users in the first user data set are determined according to the behavior data of the users in the first user data set; training a user value classifier by taking the user characteristics and the value classification labels of the first user data set as training data; and inputting the user characteristics of the second user data set into the trained user value classifier to calculate the goodness of fit of the user characteristics of the second user data set to each value classification, converting the calculated goodness of fit into the probability corresponding to a certain value class, and determining the value classification information of the users in the second user data set according to the probability. Deep abstract features of unregistered users can be mined, so that business activities have pertinence, the execution efficiency of the business activities is improved, and the activity cost is reduced.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method for classifying a user, comprising:
determining a value classification label of a user in a first user data set according to behavior data of the user in the first user data set;
training a user value classifier by taking the user characteristics of the first user data set and the value classification label as training data;
inputting the user characteristics of a second user data set into the trained user value classifier to calculate the goodness of fit of the user characteristics of the second user data set to each value classification, converting the goodness of fit into the probability corresponding to a certain value class, and determining the value classification information of the users in the second user data set according to the probability.
2. The method of claim 1, wherein determining a value category label for a user in a first user data set based on behavioral data of the user in the first user data set comprises:
calculating at least one behavior index value of each user in the first user data set, and calculating a mean value according to categories based on all the obtained behavior index values to obtain a mean value of each behavior index value;
and determining a value classification label of the user in the first user data set by using a preset user value model according to the calculated behavior index value and the behavior index value mean value.
3. The method of claim 2, wherein the at least one behavioral indicator value includes a duration of a most recent time of the behavior from a current time, a frequency of the behavior, and a behavior-related amount;
the determining, according to the calculated behavior index value and the behavior index value mean, a value classification label of the user in the first user data set by using a preset user value model includes:
comparing the duration from the most recent behavior time to the current time, the behavior frequency, and the behavior related amount of each user in the first user data set with the corresponding behavior index value means respectively, to determine the value evaluation level of each user in the first user data set in the three aspects of most recent behavior, behavior frequency, and behavior related amount;
determining the value category of each user in the first user data set according to the value evaluation level of each user in the first user data set in the three aspects and the corresponding relationship between the value evaluation level combination and the value category in the preset user value model, wherein the value evaluation level combination is an ordered combination of the value evaluation levels in the three aspects;
and coding the value category of each user in the first user data set to obtain the value classification label of the user in the first user data set.
4. The method according to claim 1, characterized in that the user characteristics of one user data set, being the first user data set or the second user data set, are extracted by:
and coding one or more information of user equipment information, browser information, the area, occupation and cell price of the user in the user data set, and taking the obtained coding vector as the user characteristic of the user data set.
5. The method of claim 1, wherein the user value classifier is implemented based on two or more layers of neural networks.
6. A user classifying apparatus, comprising:
the tag determination module is used for determining value classification tags of users in a first user data set according to behavior data of the users in the first user data set;
the training module is used for training a user value classifier by taking the user characteristics of the first user data set and the value classification label as training data;
and the classification determining module is used for inputting the user characteristics of the second user data set into the trained user value classifier so as to calculate the goodness of fit of the user characteristics of the second user data set to each value classification, converting the goodness of fit into the probability corresponding to a certain value class, and determining the value classification information of the users in the second user data set according to the probability.
7. The apparatus of claim 6, wherein the tag determination module is further configured to:
calculating at least one behavior index value of each user in the first user data set, and calculating a mean value according to categories based on all the obtained behavior index values to obtain a mean value of each behavior index value;
and determining a value classification label of the user in the first user data set by using a preset user value model according to the calculated behavior index value and the behavior index value mean value.
8. The apparatus according to claim 7, wherein the at least one behavior index value comprises a duration of a most recent behavior time from a current time, a frequency of the behavior, and a behavior-related amount;
the tag determination module includes a value classification tag determination sub-module to:
comparing the duration from the most recent behavior time to the current time, the behavior frequency, and the behavior related amount of each user in the first user data set with the corresponding behavior index value means respectively, to determine the value evaluation level of each user in the first user data set in the three aspects of most recent behavior, behavior frequency, and behavior related amount;
determining the value category of each user in the first user data set according to the value evaluation level of each user in the first user data set in the three aspects and the corresponding relationship between the value evaluation level combination and the value category in the preset user value model, wherein the value evaluation level combination is an ordered combination of the value evaluation levels in the three aspects;
and coding the value category of each user in the first user data set to obtain the value classification label of the user in the first user data set.
9. The apparatus of claim 6, further comprising a user feature extraction module configured to extract a user feature of a user data set, wherein the user data set is the first user data set or the second user data set, by:
and coding one or more information of user equipment information, browser information, the area, occupation and cell price of the user in the user data set, and taking the obtained coding vector as the user characteristic of the user data set.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430462.2A CN113706182A (en) 2020-05-20 2020-05-20 User classification method and device

Publications (1)

Publication Number Publication Date
CN113706182A true CN113706182A (en) 2021-11-26

Family

ID=78645617

Country Status (1)

Country Link
CN (1) CN113706182A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596097A (en) * 2022-05-10 2022-06-07 富算科技(上海)有限公司 User identification method, device, electronic equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491985A (en) * 2017-08-01 2017-12-19 携程旅游网络技术(上海)有限公司 The user's methods of marking and device of electric business platform, electronic equipment, storage medium
CN107729469A (en) * 2017-10-12 2018-02-23 北京小度信息科技有限公司 Usage mining method, apparatus, electronic equipment and computer-readable recording medium
CN109325640A (en) * 2018-12-07 2019-02-12 中山大学 User's Value Prediction Methods, device, storage medium and equipment
US20190114677A1 (en) * 2017-10-13 2019-04-18 Yahoo Holdings, Inc. Systems and Methods for User Propensity Classification and Online Auction Design
CN110689355A (en) * 2019-09-03 2020-01-14 浙江数链科技有限公司 Client classification method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Luzhong (张路中): "88 Effective Marketing Analysis Tools" (有效的88种营销分析工具), 30 June 2010, Enterprise Management Publishing House (企业管理出版社), pages 175-177 *

Similar Documents

Publication Publication Date Title
CN112148987B (en) Message pushing method based on target object activity and related equipment
CN106997549A (en) The method for pushing and system of a kind of advertising message
WO2017190610A1 (en) Target user orientation method and device, and computer storage medium
CN112118551B (en) Equipment risk identification method and related equipment
CN113254542B (en) Data visualization processing method and device and electronic equipment
CN106535129A (en) Method and apparatus for counting mobile devices, and calculation device
CN111966730A (en) Risk prediction method and device based on permanent premises and electronic equipment
CN116701584A (en) Intelligent question-answering method and device based on electricity user portrait and electronic equipment
CN113780329A (en) Method, apparatus, server and medium for identifying data anomalies
CN112016793A (en) Target user group-based resource allocation method and device and electronic equipment
CN113706182A (en) User classification method and device
CN115204881A (en) Data processing method, device, equipment and storage medium
CN117391866A (en) Data processing method, device, equipment and storage medium thereof
CN110119784B (en) Order recommendation method and device
CN116450723A (en) Data extraction method, device, computer equipment and storage medium
CN112348661B (en) Service policy distribution method and device based on user behavior track and electronic equipment
CN114663149A (en) Product delivery method based on privacy protection and related equipment thereof
CN114022184A (en) Data management method and device, electronic equipment and storage medium
CN114066603A (en) Post-loan risk early warning method and device, electronic equipment and computer readable medium
CN114154052A (en) Information recommendation method and device, computer equipment and storage medium
CN113704407A (en) Complaint amount analysis method, device, equipment and storage medium based on category analysis
CN113220947A (en) Method and device for encoding event characteristics
CN109150934B (en) Information pushing method and device
CN113392203B (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and computer readable storage medium
CN114757541B (en) Performance analysis method, device, equipment and medium based on training behavior data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination