CN118115237A

CN118115237A - User tag prediction model construction method and user tag prediction method

Info

Publication number: CN118115237A
Application number: CN202410235026.8A
Authority: CN
Inventors: 王亚辉; 梁洲硕
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2024-03-01
Filing date: 2024-03-01
Publication date: 2024-05-31

Abstract

The invention discloses a user tag prediction model construction method and a user tag prediction method. The construction method comprises: acquiring feature data of a research object, the feature data comprising behavior data and attribute information of the research object; standardizing the feature data; and constructing a prediction model of the research object based on the standardized feature data. According to the invention, a naive Bayes algorithm is used to construct attribute models of the research object, such as a gender model and a city level model, and behavior models, such as a liveness model, a consumption capability model and a value model. The prediction model can therefore output labels such as the gender, city level, liveness, consumption capability and value of the research object. This achieves accurate prediction of the various label information of the research object, completes that label information, and provides a scientific basis for an accurate, in-depth understanding of the research object.

Description

User tag prediction model construction method and user tag prediction method

Technical Field

The invention relates to the technical field of computers, in particular to a user tag prediction model construction method and a user tag prediction method.

Background

With the rise of short videos, users' demand for shopping on e-commerce platforms keeps growing. Logging in, browsing, clicking and consulting all generate user data, and in-depth analysis of these data is the original purpose of constructing user portraits. Only by effectively analyzing each behavior indicator of the users, constructing different user attribute and behavior models, and generating different kinds of user labels can a user portrait help an enterprise accurately position its target users, grasp their preferences, improve the competitiveness of its products and services, increase its commercial and social value, and set its direction of development. Enriching, concretizing and systematizing user labels allows enterprises to understand customers intuitively and deeply, discover user demands and make targeted adjustments.

Existing methods mostly determine consumer portrait labels from basic consumer information and simple data statistics, and their prediction of missing user portrait information is not accurate enough. This weakens enterprise advertising and marketing campaigns and wastes manpower and material resources.

Disclosure of Invention

The invention addresses the problem of accurately predicting user portrait information.

In order to solve the above problems, in a first aspect, the present invention provides a method for constructing a user tag prediction model, including:

acquiring feature data of a research object, wherein the feature data comprises behavior data and attribute information of the research object;

standardizing the feature data;

constructing a prediction model of the research object based on the standardized feature data, wherein the prediction model comprises an attribute model and a behavior model, the attribute model is used for outputting an attribute tag of the research object, and the behavior model is used for outputting a behavior tag of the research object;

wherein the attribute model includes a gender model and a city level model, and the behavior model includes a liveness model, a consumption capability model and a value model.

Optionally, in the method for constructing the user tag prediction model provided by the invention, the output labels of the gender model include a first gender, a second gender and an unknown gender; the output labels of the city level model include first-tier cities, second-tier cities, third-to-fourth-tier cities and fifth-to-sixth-tier cities; the output labels of the liveness model include registered only, active, dormant and churned; the output labels of the consumption capability model include high, medium and low; and the output labels of the value model include important maintenance, important development, important value, important saving, general importance, general object, general saving and no value.

Optionally, in the method for constructing the user tag prediction model provided by the invention, constructing the prediction model of the research object based on the standardized feature data includes:

determining an entropy value for each item of standardized feature data;

determining, according to the entropy values, the sample feature data used for training from the feature data;

training on the sample feature data with a naive Bayes algorithm to determine the posterior probability corresponding to the sample data, the posterior probability being used to determine the output label of the prediction model.

Optionally, in the method for constructing the user tag prediction model, the sample feature data corresponding to the gender model include the number of orders purchased in male-oriented categories, the number of orders purchased in female-oriented categories, the number of views of male-oriented categories, the number of views of female-oriented categories, the browsing duration on male-oriented categories and/or the browsing duration on female-oriented categories.

Optionally, in the method for constructing the user tag prediction model provided by the invention, the sample characteristic data corresponding to the city level model includes a receiving address, a login IP address, a coupon record and/or an order record.

Optionally, in the method for constructing the user tag prediction model, the sample feature data corresponding to the consumption capability model include a minimum consumption amount, a maximum consumption amount, an accumulated consumption amount, a number of purchases, voucher usage records, the number of purchases within a preset time period and/or objects of interest within the preset time period.

Optionally, in the method for constructing the user tag prediction model provided by the invention, the sample characteristic data corresponding to the liveness model includes a consumption record of a first preset time period and a consumption record of a second preset time period.

Optionally, in the method for constructing a user tag prediction model provided by the present invention, when the prediction model is the user value model, determining the output tag of the prediction model according to the posterior probability includes:

Based on the RFM model, determining an output label of the prediction model according to the posterior probability.

Optionally, in the method for constructing the user tag prediction model provided by the invention, standardizing the feature data includes:

taking the logarithm of the feature data;

sequentially applying min-max (deviation) standardization and z-score standardization to the logarithmic values of the feature data.

In a second aspect, the present invention provides a user tag prediction method, the method comprising:

standardizing the feature data of a research object;

inputting the standardized feature data into a pre-constructed prediction model and outputting a label of the research object, wherein the label represents an attribute characteristic or a behavior characteristic of the research object, the prediction model comprises an attribute model and a behavior model, the attribute model is used to determine attribute labels of the research object, and the behavior model is used to determine behavior labels of the research object.

In the user tag prediction model construction method and the user tag prediction method, feature data of the research object are acquired and standardized, and the standardized feature data are processed with a naive Bayes algorithm to construct attribute models of the research object (such as a gender model and a city level model) and behavior models (such as a liveness model, a consumption capability model and a value model). The attribute models output attribute labels such as gender and city level, and the behavior models output labels such as liveness, consumption capability and value. This achieves accurate prediction of the various label information of the research object, completes that information, provides a scientific basis for an accurate, in-depth understanding of the research object, and ultimately enables an accurate portrait of it.

Drawings

FIG. 1 is a flow chart of a method for constructing a user tag model according to some embodiments of the present invention;

FIG. 2 is a flow chart of a method for constructing a user tag model according to still other embodiments of the present invention;

FIG. 3 is a flowchart illustrating a user tag prediction method according to some embodiments of the present invention;

FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.

It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.

It can be understood that, in the embodiments of the invention, user portrait information is completed by means of various user portrait labels; that is, by constructing prediction models for the labels of a user portrait, missing user labels are predicted with a naive Bayes algorithm.

In order to better understand the construction of the user tag prediction model and the user tag prediction provided by the embodiment of the invention, the details are described below through the attached drawings.

The method may be performed by a computer device and, as shown in FIG. 1, specifically includes:

S110, acquiring feature data of the research object, wherein the feature data comprises behavior data and attribute information of the research object.

S120, standardizing the feature data.

S130, constructing a prediction model of the research object based on the standardized feature data, wherein the prediction model comprises an attribute model and a behavior model, the attribute model being used to output attribute tags of the research object and the behavior model being used to output behavior tags of the research object; the attribute model includes a gender model and a city level model, and the behavior model includes a liveness model, a consumption capability model and a value model.

Specifically, in the embodiments of the invention, to locate the user accurately and construct prediction models describing the user's tags, the feature data of the user, i.e., the research object, may first be acquired.

The feature data may include attribute information, behavior information, etc. of the user.

For example, the characteristic data may include login related information, consumption record information, and the like.

Further, after the feature data are acquired, they may be standardized to improve the output accuracy and training efficiency of the prediction model, bringing all feature data onto a consistent, dimensionless scale.

Finally, after standardization, the standardized feature data can be trained with a naive Bayes algorithm to construct the user tag prediction model.

Specifically, an attribute model and a behavior model of the user tag can be constructed. The attribute model may be used to output attribute tags for users, i.e., study objects, and the behavior model is used to output behavior tags for users.

In practice, the attribute model includes a gender model, a city level model, and the like, and the behavior model includes a liveness model, a consumption capability model, a value model, and the like.

For example, the output labels of the gender model may include a first gender, a second gender and an unknown gender; the output labels of the city level model may include first-tier, second-tier, third-to-fourth-tier and fifth-to-sixth-tier cities; the output labels of the liveness model may include registered only, active, dormant and churned; the output labels of the consumption capability model may include high, medium and low; and the output labels of the value model may include important maintenance, important development, important value, important saving, general importance, general object, general saving and no value.

It will be appreciated that the definition of the tag described above may be determined according to practical situations, and embodiments of the present invention are not limited thereto.

It can be appreciated that, with the method for constructing the user tag prediction model provided by the embodiments of the invention, the feature data of the research object are acquired, standardized, and then processed with a naive Bayes algorithm to construct attribute models (such as a gender model and a city level model) and behavior models (such as a liveness model, a consumption capability model and a value model). The attribute models output attribute tags such as gender and city level, and the behavior models output tags such as liveness, consumption capability and value. This ultimately achieves accurate prediction of the various tag information of the research object, completes that information, provides a scientific basis for an accurate, in-depth understanding of the research object, and enables an accurate portrait of it.

Optionally, in some embodiments of the invention, standardizing the acquired feature data in S120 specifically includes the following steps:

S01, taking the logarithm of the feature data.

S02, applying min-max (dispersion) normalization and then z-score standardization to the logarithmic values of the feature data.

Specifically, in the embodiments of the invention, after the acquired source data (the feature data of the user) have been cleaned, converted and deduplicated, the feature data generally need to be standardized to facilitate subsequent processing.

It will be appreciated that this operation addresses the large differences in data span caused by differing units or orders of magnitude. Without standardization, data with different units or magnitudes would carry unequal weights across intervals, which hinders statistical analysis and increases the computational burden of the algorithm.

In practice, the logarithm of the feature data is taken first:

x'_{ij} = \log x_{ij}

where x_{ij} is the value of the j-th indicator of the i-th user in the original data, i.e., the feature data value. The data are then subjected to dispersion (min-max) normalization:

x''_{ij} = \frac{x'_{ij} - \min_{i} x'_{ij}}{\max_{i} x'_{ij} - \min_{i} x'_{ij}}

where x'_{ij} is the logarithm of the j-th indicator of the i-th user, and the minimum and maximum are taken over the values x'_{j} of the j-th indicator across all users.

At this point, the new data sequence x''_{ij} \in [0, 1] is dimensionless.

Further, the data sequence may be standardized to zero mean and unit variance (z-score standardization):

x'''_{ij} = \frac{x''_{ij} - \bar{x}''_{j}}{s_{j}}

where \bar{x}''_{j} and s_{j} are the mean and standard deviation of the j-th indicator over all users; the resulting sequence x'''_{ij} has a mean of 0 and a variance of 1.
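A minimal Python sketch of the three standardization steps (logarithm, min-max normalization, z-score standardization), applied column by column. It assumes non-negative raw values and uses `math.log1p` (ln(1 + x)) in place of a plain logarithm so that zero counts stay finite; that substitution, the function names and the sample values are illustrative assumptions, not part of the method.

```python
import math
import statistics


def standardize_column(values):
    """Log-transform, min-max scale, then z-score one feature column j.

    Assumption: log1p (ln(1 + x)) replaces a plain logarithm so that zero
    counts stay finite; raw values must be >= 0.
    """
    logged = [math.log1p(v) for v in values]          # S01: logarithm
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # Constant column: no spread, so variance 1 is impossible; return zeros.
        return [0.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in logged]    # min-max -> [0, 1]
    mean = statistics.fmean(scaled)
    std = statistics.pstdev(scaled)                    # population std
    return [(v - mean) / std for v in scaled]          # mean 0, variance 1


def standardize(matrix):
    """Apply standardize_column to every column of an n x m matrix x_ij."""
    columns = list(zip(*matrix))
    std_cols = [standardize_column(col) for col in columns]
    return [list(row) for row in zip(*std_cols)]


if __name__ == "__main__":
    data = [[0, 120.0], [3, 80.5], [10, 15.0], [1, 300.0]]  # made-up values
    for row in standardize(data):
        print([round(v, 3) for v in row])
```

Because min-max scaling is an increasing affine map, the final z-scores equal those obtained by z-scoring the logarithmic values directly; the min-max step matters mainly if the [0, 1] values are used on their own, for example as non-negative input to the entropy method.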

Optionally, as shown in FIG. 2, in some embodiments of the invention, training on the standardized feature data with a naive Bayes algorithm to construct the prediction model in S130 may specifically include the following steps:

S131, determining an entropy value for each item of standardized feature data.

S132, determining, according to the entropy values, the sample feature data used for training.

S133, training on the sample feature data with a naive Bayes algorithm to determine the posterior probability corresponding to the sample data, the posterior probability being used to determine the output label of the prediction model.

Specifically, in the embodiments of the invention, if the volume of acquired feature data is large, the feature data may be screened with the entropy method to improve processing efficiency and to identify the key data that influence the user tag; that is, the key index dimensions are selected from all index dimensions of the feature data and used as sample feature data.

In addition, with a large data volume, the high number of index dimensions may affect the training result of the naive Bayes algorithm. Therefore, after data preprocessing, the entropy method can be applied to the index dimensions to screen the data and obtain the sample feature data for model construction.

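In practice, the entropy value of each feature is determined first. The proportion of x_{ij} within feature j is calculated as:

p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}}

and the entropy value e_j of feature j is then calculated as:

e_{j} = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}

where n is the number of users and p_{ij} \ln p_{ij} is taken as 0 when p_{ij} = 0.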

Further, once the entropy of each feature has been determined: the larger the entropy, the less information the feature carries; the smaller the entropy, the more information it carries. Therefore, when training with the naive Bayes algorithm, indicators with relatively small entropy values are selected as the index dimensions for prediction (for example, for predicting user gender); that is, the feature data with smaller entropy values are used as sample feature data in the subsequent model construction.
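The entropy screening can be sketched as follows. The sketch assumes the input matrix is non-negative (for example, the min-max-scaled values, since z-scores can be negative and would make p_ij meaningless), treats an all-zero column as carrying no information, and keeps a fixed number `keep` of lowest-entropy features; these choices and the helper names are assumptions.

```python
import math


def feature_entropies(matrix):
    """Entropy e_j of each column j of a non-negative n x m matrix.

    p_ij = x_ij / sum_i x_ij,  e_j = -(1 / ln n) * sum_i p_ij * ln p_ij,
    using the convention 0 * ln 0 = 0.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two users (rows)")
    k = 1.0 / math.log(n)
    entropies = []
    for col in zip(*matrix):
        total = sum(col)
        if total == 0:
            # Assumption: an all-zero column carries no information (max entropy).
            entropies.append(1.0)
            continue
        e = 0.0
        for x in col:
            p = x / total
            if p > 0:
                e -= p * math.log(p)
        entropies.append(k * e)
    return entropies


def select_low_entropy(matrix, names, keep):
    """Return the `keep` feature names with the smallest entropy (most information)."""
    ranked = sorted(zip(feature_entropies(matrix), names))
    return [name for _, name in ranked[:keep]]
```

A column whose values are spread evenly over all users gets e_j = 1, while a column concentrated in a single user gets e_j = 0 and is kept first.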

Finally, all determined sample feature data are divided in an 8:2 ratio: eighty percent of the samples are used for training to generate the classifier, and the remaining twenty percent are used as gender prediction data. The prediction model is constructed with the naive Bayes algorithm, i.e., the posterior probability corresponding to the sample data is determined by the naive Bayes algorithm.

The posterior probability may be used to determine the output label of the prediction model. Ranges of the posterior probability can be preset, for example empirically, in one-to-one correspondence with the output label values, so that once the prediction model is constructed and the posterior probability is determined, the corresponding label information can be output.
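The 8:2 division can be sketched as a shuffled split. Shuffling with a fixed seed is an assumption made here for reproducibility; the text does not say how the samples are ordered or drawn, and `train_test_split` is a hypothetical helper name.

```python
import random


def train_test_split(samples, train_ratio=0.8, seed=0):
    """Shuffle samples and split them into a training part and a held-out part."""
    rng = random.Random(seed)          # fixed seed: same split on every run
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, held_out = train_test_split(range(10))
print(len(train), len(held_out))
```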
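One way to realize a preset, one-to-one mapping from posterior-probability ranges to labels is a sorted list of cut points searched with `bisect`. The cut points and label names in the example are hypothetical placeholders; the text leaves them to be set empirically.

```python
import bisect


def label_from_posterior(posterior, cut_points, labels):
    """Map a posterior probability to a label using preset interval bounds.

    cut_points must be sorted ascending and len(labels) == len(cut_points) + 1.
    A posterior equal to a cut point falls into the upper interval.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect.bisect_right(cut_points, posterior)]


# Hypothetical ranges: [0, 0.4) -> "low", [0.4, 0.6) -> "mid", [0.6, 1] -> "high"
print(label_from_posterior(0.75, [0.4, 0.6], ["low", "mid", "high"]))
```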

In order to better understand the construction of each behavior model and attribute model, the following describes each model in detail by taking an application scenario of an e-commerce website as an example.

Optionally, in some embodiments, regarding the gender model: on an e-commerce website, shopping needs, interests and shopping frequency differ greatly between genders, so knowing a user's gender is critical, and the gender label cannot be ignored when constructing a user portrait. However, users may enter incorrect gender information or none at all at registration, or may use a shared family account, so the website has difficulty determining the gender of some users accurately. The best current approach is therefore to predict a user's gender from his or her behavior on the website.

We construct the gender model based on the user's browsing and purchasing behavior. Because men's and women's needs differ, men generally tend to search for goods such as electronics, menswear, men's shoes, shavers and belts, while women generally search for goods such as skin care products, high-heeled shoes and womenswear. Using the user's search and click records, we predict the user's gender with a naive Bayes classification algorithm improved by the EM (expectation-maximization) algorithm. The gender classes are male, female and unknown. To understand the user better, we also add further index dimensions, which help to study the user's characteristics further and construct a more complete user portrait.

That is, the output label information may include a first gender (e.g., female), a second gender (e.g., male), a third gender (e.g., unknown), etc. Correspondingly, the determined index dimensions, i.e., the sample feature data, may include the number of orders purchased in male-oriented categories, the number of orders purchased in female-oriented categories, the number of views of male-oriented categories, the number of views of female-oriented categories, the browsing duration on male-oriented categories and/or the browsing duration on female-oriented categories.

Further, to limit the influence of relationships among the source indicators on the gender prediction, the data are first standardized, and the entropy method is then applied to the index dimensions.

It can be understood that, for the different features x, after training with the algorithm, the probability of feature x_i occurring for males is defined as P(x_i | y_1), the probability for females as P(x_i | y_2), and the prior probability as P(y_j). To avoid zero probabilities for feature values that do not occur in the data, the estimates can be smoothed (Laplace smoothing):

P(y_{j}) = \frac{N_{y_j} + 1}{N + K}, \qquad P(x_{i} \mid y_{j}) = \frac{N_{y_j, x_i} + 1}{N_{y_j} + S_{i}}

In the above formulas, j = 1, 2, 3 and K = 3 is the number of gender classes; N_{y_j} in the numerator is the number of samples of class y_j, and N in the denominator is the total number of samples.

Likewise, i = 1, 2, …, n; N_{y_j, x_i} is the number of samples of class y_j whose i-th feature takes the value x_i, and S_i is the number of possible values of the i-th feature.

After the prior probabilities P(y_j) are obtained from the above formulas, the posterior probability is calculated with the naive Bayes algorithm as follows:

P_{1}(y_{j} \mid x) = \frac{P(y_{j}) \prod_{i=1}^{n} P(x_{i} \mid y_{j})}{P(x)}

It can be understood that, given the posterior probability P_1(y_1 | x) of gender y_1 and the posterior probability P_1(y_2 | x) of gender y_2, the ratio of the two probabilities is, in theory:

R = \frac{P_{1}(y_{1} \mid x)}{P_{1}(y_{2} \mid x)} = \frac{P(y_{1}) \prod_{i=1}^{n} P(x_{i} \mid y_{1})}{P(y_{2}) \prod_{i=1}^{n} P(x_{i} \mid y_{2})}

When R > 1, the probability of male is greater than that of female, and the model predicts the user's gender as male; when R < 1, the probability of male is less than that of female, and the model predicts the user's gender as female.
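The following sketch trains a categorical naive Bayes classifier with add-one (Laplace) smoothing of the form P(y_j) = (N_yj + 1) / (N + K), P(x_i | y_j) = (N_yj,xi + 1) / (N_yj + S_i), and computes the ratio R. It is plain naive Bayes: the EM-based improvement mentioned for the gender model is not reproduced. Features are assumed to be already discretized (for example, binned order counts), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y):
    """Fit a categorical naive Bayes model with add-one (Laplace) smoothing.

    X: list of equal-length tuples of discrete feature values.
    y: list of class labels, e.g. "male", "female", "unknown".
    """
    classes = sorted(set(y))
    n, k = len(y), len(classes)
    class_count = Counter(y)
    # Smoothed prior P(y_j) = (N_yj + 1) / (N + K)
    prior = {c: (class_count[c] + 1) / (n + k) for c in classes}
    n_features = len(X[0])
    # S_i: number of distinct values observed for feature i
    n_values = [len({row[i] for row in X}) for i in range(n_features)]
    counts = defaultdict(int)  # (class, feature index, value) -> N_yj,xi
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            counts[(c, i, v)] += 1

    def likelihood(c, i, v):
        # Smoothed P(x_i | y_j) = (N_yj,xi + 1) / (N_yj + S_i)
        return (counts[(c, i, v)] + 1) / (class_count[c] + n_values[i])

    return classes, prior, likelihood


def posterior(model, x):
    """P(y_j | x) normalized over all classes; computed in log space for stability."""
    classes, prior, likelihood = model
    log_joint = {
        c: math.log(prior[c]) + sum(math.log(likelihood(c, i, v)) for i, v in enumerate(x))
        for c in classes
    }
    top = max(log_joint.values())
    weights = {c: math.exp(s - top) for c, s in log_joint.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def gender_ratio(model, x, male="male", female="female"):
    """R = P(male | x) / P(female | x); R > 1 predicts male, R < 1 female."""
    p = posterior(model, x)
    return p[male] / p[female]


# Toy data: (male-category order bin, female-category order bin)
model = train_nb([("hi", "lo"), ("hi", "lo"), ("lo", "hi"), ("lo", "hi")],
                 ["male", "male", "female", "female"])
print(gender_ratio(model, ("hi", "lo")))
```

Adding "unknown" as a third label in `y` makes the model three-class as described, while R still compares only the male and female posteriors.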

Optionally, in some embodiments of the invention, regarding the city level model: in the e-commerce scenario, the user's city is also an important dimension, because the shopping needs of users in different cities vary greatly. For example, southern cities may need fewer down jackets than northern cities, while coastal cities may buy seafood more often. Since a network IP address generally corresponds to a city, the user's city can usually be obtained from the IP address and the shipping address.

For users lacking city information, a naive Bayes algorithm can also be used to predict the city. This model again uses naive Bayes to predict the user's city from the user's behavior and other information. The city-level classes cover different cities; to understand the user more fully, we add further index dimensions, which help to better capture the user's geographic location and shopping behavior and thus build a more complete user portrait.

That is, the output labels of the city level model may include first-tier, second-tier, third-to-fourth-tier and fifth-to-sixth-tier cities, and the corresponding index dimensions, i.e., the sample feature data, may include the shipping address, the login IP address, coupon usage records and/or order placement records.

Specifically, as with the gender model in the above embodiment, the data are first standardized, the index dimensions are then evaluated with the entropy method, and finally the posterior probability is calculated with the algorithm.

The posterior probability is given by:

P_{2}(y_{j} \mid x) = \frac{P(y_{j}) \prod_{i=1}^{n} P(x_{i} \mid y_{j})}{\sum_{k=1}^{4} P(y_{k}) \prod_{i=1}^{n} P(x_{i} \mid y_{k})}

where j = 1, 2, 3, 4 and P_2(y_j | x) is the probability that the user's city belongs to tier j; for example, P_2(y_1 | x) is the probability that the user's city is a first-tier city, and P_2(y_2 | x) the probability that it is a second-tier city. After the probabilities for the different tiers have been calculated, the largest is taken, and the corresponding tier is the user's city tier predicted by the algorithm for this data sample.
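The final step of the city level model (normalize over the four tiers, then take the largest posterior) can be sketched as below. The input is assumed to be the unnormalized joint scores P(y_j)∏P(x_i | y_j) produced by a trained naive Bayes classifier; the tier names and score values are illustrative.

```python
def predict_city_tier(joint_scores):
    """joint_scores: {tier: P(y_j) * prod_i P(x_i | y_j)} for the four tiers.

    Dividing by the evidence (the sum over tiers) gives P2(y_j | x); the tier
    with the largest posterior is returned together with all posteriors.
    """
    evidence = sum(joint_scores.values())
    post = {tier: s / evidence for tier, s in joint_scores.items()}
    best = max(post, key=post.get)
    return best, post


tier, probs = predict_city_tier({
    "first-tier": 0.02,
    "second-tier": 0.05,
    "third-to-fourth-tier": 0.01,
    "fifth-to-sixth-tier": 0.02,
})
print(tier)
```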

Optionally, in some embodiments of the invention, the behavior models are constructed as follows, starting with the user liveness model. In the e-commerce scenario, the frequency with which a user shops on the e-commerce website, the browsing time and the number of clicks are collectively referred to as user liveness; the level of user liveness indicates the user's dependence on and acceptance of the e-commerce platform. When constructing the user liveness model, user states are divided into four types: registered but not purchased, active, churned and dormant. Registration without purchase typically arises when users log in through third-party software such as WeChat or QQ, or leave the website or app after registering without making a purchase. User liveness is observed mainly through how often the user purchases goods within a first preset time period, such as the last 60 days, which can be divided into high, medium and low frequency, expressed by the formula:

In the above formula, F denotes the frequency with which the user purchases goods on the website, D₆₀ denotes the number of days with purchases in the last 60 days, and I₆₀ denotes the maximum interval, in days, between two purchases in the last two months.

The larger F is, the more often the user has purchased goods in the last two months and the higher the user's liveness; conversely, the smaller F is, the less often the user has purchased goods in the last two months and the lower the user's liveness. Different companies may define high, medium and low frequency differently; for example, F between 0 and 0.5 may be defined as low frequency, F between 0.5 and 0.8 as medium frequency, and F above 0.8 as high frequency. User liveness is divided into four classes: users who have registered but never purchased are "registered only"; users who have purchased within the last two months are "active"; users who have purchased within the last 90 days but not within the last two months are "dormant"; and users who have purchased before but not within the last 90 days are "churned".
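The example frequency thresholds and the four-state rule for user liveness can be written as two small functions. F itself is taken as an input rather than computed from D₆₀ and I₆₀. Treating 0.5 ≤ F ≤ 0.8 as medium frequency (the boundaries are not pinned down in the text) and "last two months" as 60 days are assumptions, as are the function names.

```python
def frequency_band(f):
    """Band the purchase-frequency score F using the example thresholds."""
    if f < 0.5:
        return "low"
    if f <= 0.8:
        return "medium"
    return "high"


def liveness_state(days_since_last_purchase):
    """Four-state liveness label.

    days_since_last_purchase: None if the user has never purchased.
    Assumption: "last two months" is treated as 60 days.
    """
    if days_since_last_purchase is None:
        return "registered only"
    if days_since_last_purchase <= 60:
        return "active"
    if days_since_last_purchase <= 90:
        return "dormant"
    return "churned"


print(frequency_band(0.65), liveness_state(75))
```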

That is, the output labels of the liveness model may include registered only, active, dormant and churned; the index dimensions, i.e., the sample feature data, may include a consumption record for a first preset time period and a consumption record for a second preset time period.

For example, the purchase records for the last 60 days and for the last 90 days.

Optionally, in some embodiments of the invention, regarding the consumption capability model: a user's consumption capability can typically be estimated by analyzing data on the prices and frequency of the consumer's purchases. First, goods are divided into categories according to information such as the users' consumption habits and purchase counts, so as to better understand users' purchasing capability and characteristics. Consumers are then labeled to capture each individual's purchasing capability and interests.

To understand users' purchasing power and interests more accurately, the most recent consumption data are relied upon, and analysis of consumer groups yields the goods each user may be interested in and able to afford. Although various data types have been collected to meet the needs of different scenarios, the description of a consumer may still contain errors. Therefore, on the basis of the existing data, feature dimensions such as "minimum consumption amount", "maximum consumption amount" and "number of purchases" can be refined to further analyze the user's actual purchasing power. These feature dimensions help to assess the consumer's consumption capability and purchase preferences more accurately.

That is, the output labels of the consumption capability model may include high, medium and low.

Correspondingly, the index dimensions may include the minimum consumption amount, the maximum consumption amount, the accumulated number of purchases (excluding rejected orders), the accumulated consumption amount (excluding rejected orders), the accumulated voucher amount used, the accumulated number of vouchers used, the number of recent purchases, the number of shopping-cart items in the last 30 days, and so on.

Specifically, on the basis of the index dimension, the capability level of the purchasing dynamics of the consumers is divided by using a naive Bayesian algorithm, and the consumer capabilities of the consumers are respectively reflected by 'low', 'medium', 'high'. The calculation formula is as follows:

Here j ranges from 1 to 3, and P3(y_j|x) denotes the probability that the user's consumption capability on the e-commerce website is low, medium or high, respectively. P3(y_j|x) is calculated for j = 1, 2 and 3, the maximum value is taken, and the class y_j corresponding to that maximum is the user's consumption capability.
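The decision rule described above (compute the posterior for each level and keep the largest) can be sketched as a small categorical naive Bayes classifier in Python. This is an illustrative sketch, not the patented implementation: it assumes each index dimension has already been discretized into levels, uses Laplace (add-alpha) smoothing so that an unseen value does not zero out a class, and works in log space to avoid numerical underflow; none of these choices is stated in the source.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Minimal categorical naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: List[Sequence[Hashable]], y: List[str]) -> "CategoricalNaiveBayes":
        self.classes_ = sorted(set(y))
        self.class_counts_ = Counter(y)
        self.n_samples_ = len(y)
        n_features = len(X[0])
        # value_counts_[c][i][v]: how often feature i takes value v within class c
        self.value_counts_ = {c: [Counter() for _ in range(n_features)] for c in self.classes_}
        # distinct values observed per feature, used in the smoothing denominator
        self.feature_values_ = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for i, v in enumerate(row):
                self.value_counts_[label][i][v] += 1
                self.feature_values_[i].add(v)
        return self

    def _log_joint(self, row: Sequence[Hashable]) -> Dict[str, float]:
        """log P(y_c) + sum_i log P(x_i | y_c) for every class c."""
        scores = {}
        for c in self.classes_:
            score = math.log(self.class_counts_[c] / self.n_samples_)
            for i, v in enumerate(row):
                num = self.value_counts_[c][i][v] + self.alpha
                den = self.class_counts_[c] + self.alpha * len(self.feature_values_[i])
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict_proba(self, row: Sequence[Hashable]) -> Dict[str, float]:
        """Posterior P(y_c | x), normalized with the log-sum-exp trick."""
        scores = self._log_joint(row)
        top = max(scores.values())
        total = sum(math.exp(s - top) for s in scores.values())
        return {c: math.exp(s - top) / total for c, s in scores.items()}

    def predict(self, row: Sequence[Hashable]) -> str:
        """Return the class with the largest posterior (the arg max over j)."""
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Toy data: (total-amount level, purchase-count level) -> capability label
X = [("lo", "lo"), ("lo", "lo"), ("lo", "hi"), ("hi", "lo"), ("hi", "hi"), ("hi", "hi")]
y = ["low", "low", "medium", "medium", "high", "high"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("hi", "hi")))  # high
```

In this toy example the posterior for "high" is 9/14; a production system might prefer an existing library implementation such as scikit-learn's `CategoricalNB`.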

Optionally, in some embodiments of the invention, for user value model construction:

It will be appreciated that the purpose of this model is to evaluate the importance of users from a more intuitive perspective, measuring the consumption value that different user groups bring to the enterprise's products, so that corresponding strategies can be implemented to avoid losing important customers.

In some embodiments, an RFM model may be used to calculate user value, divide users into value tiers of different degrees, and record the consumer information for each tier.

The RFM model is built on three elements:

R (Recency): the interval between the user's most recent purchase and the current time. The smaller this value, the more recently the user purchased; the larger it is, the longer ago the last purchase was.

F (Frequency): the number of purchases the consumer made during the sample data collection period. The more purchases, the greater the F value, and vice versa.

M (Monetary): the user's total consumption amount during the sample data collection period. The higher the consumption amount, the greater the M value.

Each of these three indicators can be split in half, i.e., assigned one of two possible values (e.g. high or low), which yields 2 × 2 × 2 = 8 states in total. This state division helps to better understand the consumption behavior and value of different users, so that corresponding strategies can be formulated to meet their needs, improve user satisfaction and retain important customers.
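The split into eight states can be sketched as follows in Python. This is a hedged illustration: the source says only that each indicator is "processed in half", so the median threshold and the tie handling here are assumptions, and because the table mapping states to the eight value labels is not reproduced in this text, the sketch stops at a state code such as `R+F-M+` rather than guessing a label.

```python
from itertools import product
from statistics import median
from typing import Dict, Tuple

# All 2 x 2 x 2 = 8 possible RFM states ("+" = favourable half, "-" = other half).
ALL_STATES = ["".join(p) for p in product(("R+", "R-"), ("F+", "F-"), ("M+", "M-"))]


def rfm_states(users: Dict[str, Tuple[float, float, float]]) -> Dict[str, str]:
    """Map user_id -> (recency_days, frequency, monetary) to one of 8 RFM states.

    Assumption: each indicator is split at its median across users; ties fall
    into the "-" half.
    """
    r_cut = median(r for r, _, _ in users.values())
    f_cut = median(f for _, f, _ in users.values())
    m_cut = median(m for _, _, m in users.values())
    states = {}
    for uid, (r, f, m) in users.items():
        r_flag = "R+" if r < r_cut else "R-"  # smaller recency = bought more recently
        f_flag = "F+" if f > f_cut else "F-"  # more purchases in the sample period
        m_flag = "M+" if m > m_cut else "M-"  # higher total spend in the sample period
        states[uid] = r_flag + f_flag + m_flag
    return states
```

For example, with four users whose recencies are 3, 40, 10 and 60 days, the recency threshold is the median, 25 days, so only the users at 3 and 10 days receive `R+`.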

The correspondence between the index dimensions and the output labels is shown in the following table:

The posterior probability is solved using naive Bayes:

Here j ranges from 1 to 8, and P4(y_j|x) denotes the probability of each user value state. The probabilities are calculated with the naive Bayes formula, each result giving the probability that the user is in state y_j, and the state with the maximum probability is finally taken as the representation of the user's value.

It will be appreciated that the output labels and the index dimensions in the foregoing embodiments may be adjusted according to practical situations, which is not limited in this embodiment of the present invention.

In another aspect, the invention also provides a method for determining user tag information which, as shown in fig. 3, comprises the following steps:

S210, acquiring feature data of the research object, wherein the feature data comprises behavior data and attribute information of the research object.

S220, performing standardization processing on the feature data.

S230, inputting the standardized feature data into a pre-constructed prediction model and outputting a label of the research object, wherein the label represents an attribute feature or a behavior feature of the research object; the prediction model comprises an attribute model, used for determining the attribute labels of the research object, and a behavior model, used for determining the behavior labels of the research object; the attribute models include a gender model and a city level model, and the behavior models include a liveness model, a consumption capability model and a value model.
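Steps S210 to S230 amount to a small pipeline. The Python sketch below shows one possible structure, assuming each trained model is exposed as a callable that maps a standardized feature dictionary to a label; `predict_tags` and this callable interface are hypothetical, and S210 (data acquisition) is left to the caller.

```python
from typing import Callable, Dict, Mapping

Features = Dict[str, float]
Model = Callable[[Features], str]  # hypothetical interface: features -> label


def predict_tags(
    raw: Mapping[str, float],
    standardize: Callable[[Mapping[str, float]], Features],
    attribute_models: Mapping[str, Model],
    behavior_models: Mapping[str, Model],
) -> Dict[str, Dict[str, str]]:
    """Steps S220-S230: standardize the raw features, then run every model."""
    features = standardize(raw)  # S220: standardization
    return {  # S230: each model outputs one label
        "attribute": {name: m(features) for name, m in attribute_models.items()},
        "behavior": {name: m(features) for name, m in behavior_models.items()},
    }
```

In use, `attribute_models` would hold the trained gender and city level models and `behavior_models` the liveness, consumption capability and value models, for example the `predict` methods of naive Bayes classifiers wrapped to accept a feature dictionary.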

Specifically, in this embodiment, once the prediction model has been constructed as in the foregoing embodiments, it can be used to process the acquired feature data of each user and predict each user's labels, thereby providing a basis for user profiling.

It can be understood that the pre-constructed prediction model is the one constructed in the foregoing embodiments, and its construction is not repeated here.

It may further be understood that, with the research object label prediction model construction method and the research object label prediction method provided by the embodiments of the present invention, the user's feature data is acquired and standardized, and the standardized feature data is processed with a naive Bayes algorithm to construct attribute models of the user, such as a gender model and a city level model, and behavior models, such as a liveness model, a consumption capability model and a value model. The constructed attribute models then output attribute labels such as the user's gender and city level, and the constructed behavior models output labels such as the user's liveness, consumption capability and value. This achieves accurate prediction of the user's various label information, completes the user's label information, provides a scientific basis for an accurate and in-depth understanding of the user, and ultimately enables an accurate portrait of the user.

In another aspect, an embodiment of the invention further provides a device for constructing a user tag prediction model, the device comprising:

an acquisition module, a processing module and a construction module, wherein the acquisition module is used for acquiring feature data of a research object, the feature data comprising behavior data and attribute information of the research object;

the processing module is used for carrying out standardized processing on the characteristic data;

the construction module is used for constructing a prediction model of the research object with a naive Bayes algorithm based on the standardized feature data, wherein the prediction model comprises an attribute model, used for outputting attribute tags of the research object, and a behavior model, used for outputting behavior tags of the research object; the attribute models include a gender model and a city level model, and the behavior models include a liveness model, a consumption capability model and a value model.

Optionally, in the user tag prediction model building device provided by the embodiment of the present invention, the output labels of the gender model include a first gender, a second gender and an unknown gender; the output labels of the city level model include first-tier city, second-tier city, third-to-fourth-tier city and fifth-to-sixth-tier city; the output labels of the liveness model include registered only, active, dormant and churned; the output labels of the consumption capability model include high, medium and low; and the output labels of the value model include important maintenance, important development, important value, important saving, general importance, general object, general saving and no value.

Optionally, the device for constructing the user tag prediction model provided by the embodiment of the present invention, the construction module is specifically configured to:

Optionally, in the user tag prediction model building device provided by the embodiment of the present invention, the sample feature data corresponding to the gender model includes the number of purchases in male-characteristic categories, the number of purchases in female-characteristic categories, the number of views of male-characteristic categories, the number of views of female-characteristic categories, the browsing duration in male-characteristic categories, and/or the browsing duration in female-characteristic categories.

Optionally, in the user tag prediction model building device provided by the embodiment of the present invention, the sample feature data corresponding to the city level model includes a shipping address, a login IP address, a coupon usage record and/or an order record.

Optionally, in the user tag prediction model building device provided by the embodiment of the present invention, the sample feature data corresponding to the consumption capability model includes a minimum consumption record value, a maximum consumption record value, a cumulative consumption record value, a number of purchases, voucher usage record information, a number of purchases within a preset time period, and/or objects of interest within the preset time period.

Optionally, in the user tag prediction model building device provided by the embodiment of the present invention, sample feature data corresponding to the liveness model includes a consumption record of a first preset time period and a consumption record of a second preset time period.

Optionally, when the prediction model is the user value model, determining the output label of the prediction model according to the posterior probability includes:

Optionally, the user tag prediction model building device provided by the embodiment of the present invention, the processing module is specifically configured to:

Taking the logarithm of the feature data;
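The logarithmic standardization can be sketched in Python as below. The source says only that a logarithm of the feature data is taken; the use of `math.log1p`, the rejection of negative values and the function name `log_standardize` are assumptions made for illustration.

```python
import math
from typing import Dict, Mapping


def log_standardize(features: Mapping[str, float]) -> Dict[str, float]:
    """Log-transform every feature value.

    Assumption: log1p(x) = ln(1 + x) is used instead of ln(x) so that zero
    counts map to 0.0 instead of -inf; negative inputs are rejected.
    """
    out = {}
    for name, value in features.items():
        if value < 0:
            raise ValueError(f"feature {name!r} is negative: {value}")
        out[name] = math.log1p(value)
    return out
```

The transform compresses heavy-tailed quantities such as spending: amounts of 10 and 10,000 become roughly 2.4 and 9.2, which keeps a few large spenders from dominating the feature scale.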

In another aspect, an embodiment of the present invention further provides a user tag prediction apparatus, where the apparatus includes:

the prediction module is used for inputting the feature data after the standardized processing into a pre-constructed prediction model, outputting a label of the research object, wherein the label represents an attribute feature or a behavior feature of the research object, the prediction model comprises an attribute model and a behavior model, the attribute model is used for determining the attribute label of the research object, and the behavior model is used for determining the behavior label of the research object;

In another aspect, an embodiment of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the user tag prediction model construction method and the user tag prediction method.

Referring now to fig. 4, fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

As shown in fig. 4, the electronic device 300 includes a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The CPU 301, ROM 302, and RAM 303 are connected to each other through a bus 304, to which an input/output (I/O) interface 305 is also connected. In some embodiments, the following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output section 307 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 308 including a hard disk or the like; and a communication section 309 including a network interface card such as a LAN card or a modem. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 310 as needed, so that a computer program read from it can be installed into the storage section 308 as needed. In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a machine-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 309, and/or installed from the removable medium 311. When the computer program is executed by the Central Processing Unit (CPU) 301, the above-described functions defined in the electronic device of the present invention are performed.

The computer readable medium shown in the present invention may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

The units or modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor, which may, for example, be described as a processor comprising an acquisition module, a processing module and a construction module. The names of these units or modules do not, in some cases, limit the units or modules themselves; for example, the construction module may also be described as "a module for constructing, based on the standardized feature data, a prediction model of the study object using a naive Bayes algorithm, wherein the prediction model includes an attribute model for outputting an attribute tag of the study object and a behavior model for outputting a behavior tag of the study object; the attribute models include a gender model and a city level model, and the behavior models include a liveness model, a consumption capability model and a value model".

As another aspect, the present invention also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiment or may exist separately without being incorporated into the electronic device. The computer readable storage medium stores one or more computer programs which, when executed by one or more processors, perform the user tag prediction model construction method described in the present invention:

carrying out standardization processing on the characteristic data;

Based on the feature data after the standardized processing, constructing a prediction model of the research object by using a naive Bayesian algorithm, wherein the prediction model comprises an attribute model and a behavior model, the attribute model is used for outputting an attribute tag of the research object, and the behavior model is used for outputting a behavior tag of the research object;

Or performing a user tag prediction method:

carrying out standardization processing on the characteristic data;

In summary, with the research object label prediction model construction method and the research object label prediction method provided by the embodiments of the invention, the feature data of the research object is acquired and standardized, and the standardized feature data is processed with a naive Bayes algorithm to construct attribute models of the research object, such as a gender model and a city level model, and behavior models, such as a liveness model, a consumption capability model and a value model. The constructed attribute models output attribute labels such as the gender and city level of the research object, and the constructed behavior models output labels such as its liveness, consumption capability and value. This achieves accurate prediction of the various label information of the research object, completes that label information, provides a scientific basis for an accurate and in-depth understanding of the research object, and ultimately enables an accurate portrait of the research object.

The above description is only illustrative of the preferred embodiments of the present invention and of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure of the present invention is not limited to technical solutions formed by the specific combination of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present invention.

Claims

1. A method for constructing a user tag prediction model, the method comprising:

carrying out standardization processing on the characteristic data;

2. The method for constructing a user tag prediction model according to claim 1, wherein the output labels of the gender model include a first gender, a second gender and an unknown gender; the output labels of the city level model include first-tier city, second-tier city, third-to-fourth-tier city and fifth-to-sixth-tier city; the output labels of the liveness model include registered only, active, dormant and churned; the output labels of the consumption capability model include high, medium and low; and the output labels of the value model include important maintenance, important development, important value, important saving, general importance, general object, general saving and no value.

3. The method for constructing a user tag prediction model according to claim 2, wherein constructing a prediction model of the study object with a naive Bayes algorithm based on the standardized feature data includes:

4. The method for constructing a user tag prediction model according to claim 3, wherein the sample feature data corresponding to the gender model includes a purchase number of male feature categories, a purchase number of female feature categories, a number of times of browsing male feature categories, a number of times of browsing female feature categories, a duration of browsing male feature categories, and/or a duration of browsing female feature categories.

5. The method for constructing a user tag prediction model according to claim 3, wherein the sample feature data corresponding to the city level model includes a shipping address, a login IP address, a coupon usage record, and/or an order record.

6. The method for constructing a user tag prediction model according to claim 3, wherein the sample feature data corresponding to the consumption capability model includes a minimum consumption record value, a maximum consumption record value, a cumulative consumption record value, a number of purchases, voucher usage record information, a number of purchases within a preset time period, and/or objects of interest within the preset time period.

7. The method for constructing a user tag prediction model according to claim 3, wherein the sample feature data corresponding to the liveness model includes a consumption record of a first preset time period and a consumption record of a second preset time period.

8. The method of claim 3, wherein determining the output label of the predictive model based on the posterior probability when the predictive model is the user value model comprises:

9. The method of claim 2, wherein the normalizing the feature data comprises:

Taking the logarithm of the feature data;

10. A method of user tag prediction, the method comprising:

carrying out standardization processing on the characteristic data;