Summary of the invention
In view of the above problems, the present application provides a method and a device for locating recommendation users, so that recommendation users can be located more accurately.
To solve the above technical problems, the present application adopts the following technical solution:
A method for locating recommendation users, comprising:
collecting user transaction data and user video viewing data in a video system;
extracting, from the collected user transaction data and user video viewing data, a specified feature data set of training users and a specified feature data set of test users;
training on each specified feature in the specified feature data set of the training users according to a training algorithm to obtain a weight value of each specified feature;
determining payment probability prediction data for each test user according to the specified feature data set of the test users and the weight value of each specified feature obtained by training;
locating recommendation users according to the payment probability prediction data of each test user.
Wherein, collecting the user transaction data and the user video viewing data in the video system comprises:
obtaining the log of a membership transaction table to obtain the user transaction data of the video system;
obtaining a user video viewing log to obtain the user video viewing data of the video system.
Wherein, the log of the membership transaction table and the user video viewing log include the logs corresponding to the personal computer end and the wireless end.
Wherein, determining the payment probability prediction data for each test user according to the specified feature data set of the test users and the weight value of each specified feature obtained by training comprises:
sorting the obtained weight values of the specified features;
determining, according to the sorting result, the key specified features within a specified ranking range;
determining the payment probability prediction data for each test user according to the weights of the key specified features and the specified feature data set of the test users.
Wherein, locating recommendation users according to the payment probability prediction data of each test user comprises:
determining, from the payment probability prediction data of each test user within a positioning time period, a first threshold curve for recommendation users according to recommendation accuracy;
determining, from the payment probability prediction data of each test user within the positioning time period, a second threshold curve for recommendation users according to recommendation efficiency;
determining a threshold for recommendation users according to the first threshold curve and the second threshold curve;
locating recommendation users according to the determined threshold for recommendation users.
Wherein, the specified feature data set of the training users comprises positive sample data and negative sample data; the positive sample data is the specified feature data set of users who made a paid purchase at a specified time point, and the negative sample data is the specified feature data set of users who did not make a paid purchase at the specified time point.
Wherein, the specified time point is the day on which the positive samples and the negative samples are collected.
Wherein, the quantity of negative sample data is three times the quantity of positive sample data.
Wherein, the users who made a paid purchase are users who purchased a membership.
Wherein, the training algorithm is an L2-regularized logistic regression training algorithm.
Wherein, the specified feature is one or more of the following features:
movie channel, TV series channel, automobile channel, comedy channel, animation channel, XATV-6, fashion channel, parent-child channel, game channel, original channel, advertisement channel, music channel, information channel, sports channel, life channel, travel channel, science and technology channel, education channel, entertainment channel, documentary channel, other channels, Android device, iPhone device, iPad device, iPod device, other devices, member, non-member, paid video, free video, full viewing and trial viewing.
The present application also provides a device for locating recommendation users, comprising:
an acquisition module, configured to collect user transaction data and user video viewing data in a video system;
an extraction module, configured to extract, from the collected user transaction data and user video viewing data, a specified feature data set of training users and a specified feature data set of test users;
a training module, configured to train on each specified feature in the specified feature data set of the training users according to a training algorithm to obtain a weight value of each specified feature;
a determination module, configured to determine payment probability prediction data for each test user according to the specified feature data set of the test users and the weight value of each specified feature obtained by training;
a locating module, configured to locate recommendation users according to the payment probability prediction data of each test user.
Wherein, the acquisition module comprises:
a first obtaining submodule, configured to obtain the log of a membership transaction table to obtain the user transaction data of the video system;
a second obtaining submodule, configured to obtain a user video viewing log to obtain the user video viewing data of the video system.
Wherein, the log of the membership transaction table and the user video viewing log include the logs corresponding to the personal computer end and the wireless end.
Wherein, the determination module comprises:
a sorting submodule, configured to sort the obtained weight values of the specified features;
a key specified feature determination submodule, configured to determine, according to the sorting result, the key specified features within a specified ranking range;
a probability prediction data determination submodule, configured to determine the payment probability prediction data for each test user according to the weights of the key specified features and the specified feature data set of the test users.
Wherein, the locating module comprises:
a first threshold curve determination submodule, configured to determine, from the payment probability prediction data of each test user within a positioning time period, a first threshold curve for recommendation users according to recommendation accuracy;
a second threshold curve determination submodule, configured to determine, from the payment probability prediction data of each test user within the positioning time period, a second threshold curve for recommendation users according to recommendation efficiency;
a threshold determination submodule, configured to determine a threshold for recommendation users according to the first threshold curve and the second threshold curve;
a recommendation user locating submodule, configured to locate recommendation users according to the determined threshold for recommendation users.
Wherein, the specified feature data set of the training users comprises positive sample data and negative sample data; the positive sample data is the specified feature data set of users who made a paid purchase at a specified time point, and the negative sample data is the specified feature data set of users who did not make a paid purchase at the specified time point.
Wherein, the specified time point is the day on which the positive samples and the negative samples are collected.
Wherein, the quantity of negative sample data is three times the quantity of positive sample data.
Wherein, the users who made a paid purchase are users who purchased a membership.
Wherein, the training algorithm is an L2-regularized logistic regression training algorithm.
Wherein, the specified feature is one or more of the following features:
movie channel, TV series channel, automobile channel, comedy channel, animation channel, XATV-6, fashion channel, parent-child channel, game channel, original channel, advertisement channel, music channel, information channel, sports channel, life channel, travel channel, science and technology channel, education channel, entertainment channel, documentary channel, other channels, Android device, iPhone device, iPad device, iPod device, other devices, member, non-member, paid video, free video, full viewing and trial viewing.
According to the method and device for locating recommendation users of the present application, user transaction data and user video viewing data in a video system are collected; a specified feature data set of training users and a specified feature data set of test users are extracted from the collected data; each specified feature in the specified feature data set of the training users is trained according to a training algorithm to obtain its weight value; payment probability prediction data for each test user is determined according to the specified feature data set of the test users and the trained weight values; and recommendation users are located according to the payment probability prediction data of each test user. Here, a specified feature data set is the set of data describing the frequency of a user's usage behavior on the specified features. Training on the specified feature data set of the training users yields an accurate weight value for each specified feature, and applying those weight values to the test set users makes it possible to pick out the users with a tendency to pay more accurately, so that recommendation users can be located more accurately.
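For the optional case in which the training algorithm is L2-regularized logistic regression, the relation between the trained weight values and the payment probability can be written in the standard form below. The symbols are assumptions introduced here for illustration and do not appear in the application: $x_{ij}$ is the usage frequency of user $i$ on specified feature $j$, $y_i \in \{0, 1\}$ marks a negative or positive sample, $w_j$ is the weight value of feature $j$, $b$ is a bias term, and $\lambda$ is the regularization strength.

```latex
\[
p_i = \sigma\!\Big(b + \sum_{j=1}^{d} w_j x_{ij}\Big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\]
\[
\min_{w,\,b}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Big] + \frac{\lambda}{2}\sum_{j=1}^{d} w_j^{2}
\]
```

The first line gives the predicted payment probability of a user; the second is the training objective, in which the $\lambda$ term penalizes large weight values.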
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention fall within the scope of protection of the present invention.
Refer to Fig. 1, which is a flowchart of a specific embodiment of the method for locating recommendation users of the present invention. In this embodiment, the method mainly comprises the following steps:
Step S101: collect user transaction data and user video viewing data in a video system;
In a specific implementation, the user transaction data and user video viewing data of the video system can be collected in various ways. As a preferred embodiment, the following manner may be adopted, for example:
obtaining the log of the membership transaction table to obtain the user transaction data of the video system; and obtaining the user video viewing log to obtain the user video viewing data of the video system.
It should be noted that the log of the membership transaction table and the user video viewing log may include the logs corresponding to the personal computer end and the wireless end, so that the data obtained supports a more accurate analysis of users' viewing behavior.
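The collection in step S101 can be sketched as follows. This is a minimal illustration only: the `LogRecord` fields, the `"pc"` and `"wireless"` source labels, and the function name are hypothetical, since the application does not specify a log format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LogRecord:
    """One log line; every field name here is an assumption for illustration."""
    user_id: str
    source: str   # "pc" or "wireless"
    kind: str     # "transaction" (membership transaction table) or "viewing"
    feature: str  # e.g. "movie_channel" or "member"
    day: int      # day index within the collection window


def collect_video_system_data(
    pc_logs: Iterable[LogRecord], wireless_logs: Iterable[LogRecord]
) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Merge the PC-end and wireless-end logs, then split them by kind."""
    merged = list(pc_logs) + list(wireless_logs)
    transactions = [r for r in merged if r.kind == "transaction"]
    viewing = [r for r in merged if r.kind == "viewing"]
    return transactions, viewing
```

Merging both ends before splitting keeps a user's behavior on a personal computer and on a mobile device under the same user ID.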
Step S102: extract, from the collected user transaction data and user video viewing data, a specified feature data set of training users and a specified feature data set of test users;
For users who make a paid purchase (for example, users who purchase a membership), some features of their usage behavior on the video website are generally strongly associated with the paid purchase. The specified features in this embodiment are the features associated with users' paid purchasing behavior, and the specified feature data set of a user is the set of data describing the frequency of that user's usage behavior on the specified features. For example, paying users may watch the movie channel, so the movie channel is one specified feature, and the usage behavior frequency on that feature is the number of times the user watches the movie channel within the statistical period. The extracted specified feature data set therefore needs to contain the specified features and the frequency data of the user's usage behavior on each of them within the statistical period. For example, if the statistical period is 60 days, the feature data for the movie channel records, under the movie channel category, the number of times the user watched the movie channel in those 60 days. As a specific embodiment, the specified features associated with users' paid purchasing behavior may include one or more of the following:
movie channel, TV series channel, automobile channel, comedy channel, animation channel, XATV-6, fashion channel, parent-child channel, game channel, original channel, advertisement channel, music channel, information channel, sports channel, life channel, travel channel, science and technology channel, education channel, entertainment channel, documentary channel, other channels, Android device, iPhone device, iPad device, iPod device, other devices, member, non-member, paid video, free video, full viewing and trial viewing.
In practice, as the number of viewing channels, mobile devices in use, and associated identities and services grows, more specified features may be added; the above list is only an example and the specified features are not limited to it.
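The frequency extraction described above can be sketched as follows, assuming events arrive as (user ID, feature, day) tuples and days are numbered with integers. The feature identifiers and the function name are hypothetical; only the 60-day statistical period comes from the example above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical identifiers for a few of the specified features listed above.
FEATURES = ["movie_channel", "series_channel", "android_device", "paid_video"]


def extract_feature_sets(
    events: Iterable[Tuple[str, str, int]],
    features: Sequence[str] = FEATURES,
    window_days: int = 60,
    last_day: int = 60,
) -> Dict[str, List[int]]:
    """Count, per user, how often each specified feature occurs in the window.

    Returns {user_id: [count for each feature, in the order of `features`]}.
    """
    first_day = last_day - window_days + 1
    wanted = set(features)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user_id, feature, day in events:
        # Keep only events inside the statistical period and on known features.
        if first_day <= day <= last_day and feature in wanted:
            counts[user_id][feature] += 1
    return {user: [c[f] for f in features] for user, c in counts.items()}
```

Each returned list is one user's specified feature data set in vector form, so the same feature order can be reused for training and for prediction.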
In addition, the users' specified feature data sets extracted in this step are divided into the specified feature data set of training users and the specified feature data set of test users. The specified feature data set of the training users comprises positive sample data and negative sample data. As a specific embodiment, the positive sample data is the specified feature data set of users who made a paid purchase at the specified time point, and the negative sample data is the specified feature data set of users who did not make a paid purchase at the specified time point. For example, to predict the probability that a user purchases a membership on the next day, the specified time point is the current day, which ensures that the training data and the probability prediction data have the same timeliness. In a specific implementation, the ratio between positive and negative samples can be adjusted according to actual conditions; for example, the quantity of negative sample data is three times the quantity of positive sample data. In one example, the positive samples are the users who purchased a membership on the current day, numbering 15,000. Because the number of users who do not purchase a membership on a given day far exceeds the number who do, the negative samples should outnumber the positive samples, and 50,000 users can be chosen as negative samples, so that the number of negative samples is about 3 times the number of positive samples.
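A minimal sketch of assembling the training set at roughly a 3:1 negative-to-positive ratio is given below. Random sampling of the negatives, the fixed seed, and all names are assumptions for illustration; the application does not state how the negative samples are chosen.

```python
import random
from typing import Dict, List, Set, Tuple


def build_training_set(
    feature_sets: Dict[str, List[int]],
    buyers: Set[str],
    neg_ratio: int = 3,
    seed: int = 0,
) -> Tuple[List[List[int]], List[int]]:
    """Label buyers at the specified time point as 1 and sampled non-buyers as 0."""
    positives = [u for u in feature_sets if u in buyers]
    negatives = [u for u in feature_sets if u not in buyers]
    # Assumption: negatives are drawn uniformly at random, capped by availability.
    rng = random.Random(seed)
    k = min(len(negatives), neg_ratio * len(positives))
    sampled = rng.sample(negatives, k)
    X = [feature_sets[u] for u in positives + sampled]
    y = [1] * len(positives) + [0] * len(sampled)
    return X, y
```

With 15,000 positive samples and `neg_ratio=3`, this sketch would draw 45,000 negatives, close to the 50,000 used in the example above.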
Step S103: train on each specified feature in the specified feature data set of the training users according to a training algorithm to obtain the weight value of each specified feature;
Various existing training algorithms may be used in the present application, which is not limited here. As an example only, the training algorithm may be an L2-regularized logistic regression training algorithm (also called the L2 regularization logistic algorithm), which is widely used in statistics. In this embodiment, training on the specified feature data set of the training users extracted in step S102 yields the weight value of each specified feature. For example, the positive sample data of the above 15,000 users and the negative sample data of the 50,000 users are used as input data, and the weight value of each specified feature is obtained after training with the training algorithm, for example the L2-regularized logistic regression training algorithm. For example, if the total weight score is 100, the weight of the movie channel among the above specified features may be 8 and the weight of the TV series channel may be 10; the weights of the other specified features are obtained in the same way and are not enumerated here.
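The training in step S103 can be illustrated with a small, dependency-free sketch of L2-regularized logistic regression fitted by batch gradient descent. The optimizer, the hyperparameters `lam`, `lr` and `epochs`, and the function names are assumptions; the application names only the algorithm family, and a production system would more likely use an existing library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Return (weights, bias) minimizing mean log loss + (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The L2 penalty adds lam * w[j] to the gradient of each weight.
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b
```

In practice the feature frequencies would usually be scaled before training so that a single fixed learning rate behaves well; that preprocessing is omitted here.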
Step S104: determine the payment probability prediction data for each test user according to the specified feature data set of the test users and the weight value of each specified feature obtained by training;
In a specific implementation, the weight values of the specified features obtained in step S103 differ in magnitude, and specified features with small weight values may be left out of the prediction. That is, this step may use the weight values of all specified features for prediction, or only the weights of the specified features with larger weights. One such approach is:
sorting the obtained weight values of the specified features;
determining, according to the sorting result, the key specified features within a specified ranking range, for example taking the specified features ranked in the top ten as the key specified features;
determining the payment probability prediction data for each test user according to the weights of the key specified features and the specified feature data set of the test users.
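The three sub-steps above can be sketched as follows. Ranking by the absolute value of the weight and applying the logistic function to the weighted sum are assumptions that fit a logistic regression model; the application only states that the weight values are sorted and the top-ranked features are used. All names are hypothetical.

```python
import math
from typing import List, Sequence


def select_key_features(weights: Sequence[float], top_k: int = 10) -> List[int]:
    """Return the indices of the top_k features, ranked by weight magnitude."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return order[:top_k]


def predict_payment_probability(
    x: Sequence[float], weights: Sequence[float], bias: float, key_idx: Sequence[int]
) -> float:
    """Payment probability of one test user, using only the key features."""
    z = bias + sum(weights[j] * x[j] for j in key_idx)
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Restricting the sum to `key_idx` simply drops the contribution of low-weight features; a stricter variant would retrain the model on the key features only.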
Step S105: locate recommendation users according to the payment probability prediction data of each test user.
In a specific implementation, recommendation users can be located according to the magnitude of the payment probability prediction data of each test user; the goal, however, is to achieve a high conversion rate at a low coverage rate. To this end, define accuracy = number of correct predictions / number of actual purchasers, that is, the ratio of the number of correctly predicted paying users to the number of users who actually paid that day; and efficiency = number of correct predictions / number of predicted purchasers, that is, the ratio of the number of correctly predicted paying users to the number of users predicted to purchase a membership. In this embodiment, recommendation users may be located according to the payment probability prediction data of each test user in the following manner:
determining, from the payment probability prediction data of each test user within a positioning time period, a first threshold curve for recommendation users according to recommendation accuracy;
determining, from the payment probability prediction data of each test user within the positioning time period, a second threshold curve for recommendation users according to recommendation efficiency;
determining a threshold for recommendation users according to the first threshold curve and the second threshold curve;
locating recommendation users according to the determined threshold for recommendation users.
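The threshold selection above can be sketched as follows. Accuracy and efficiency as defined in this embodiment correspond to what is commonly called recall and precision. The rule for combining the two curves is not given in the application; choosing the threshold at which the curves are closest is an assumption made here for illustration, as are all function names.

```python
from typing import List, Sequence, Tuple


def threshold_curves(
    probs: Sequence[float], paid: Sequence[int], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """For each threshold, return (threshold, accuracy, efficiency)."""
    actual = sum(paid)
    curve = []
    for t in thresholds:
        predicted = [p >= t for p in probs]
        n_pred = sum(predicted)
        correct = sum(1 for hit, label in zip(predicted, paid) if hit and label)
        accuracy = correct / actual if actual else 0.0    # correct / actual purchasers
        efficiency = correct / n_pred if n_pred else 0.0  # correct / predicted purchasers
        curve.append((t, accuracy, efficiency))
    return curve


def choose_threshold(curve: Sequence[Tuple[float, float, float]]) -> float:
    """Assumed rule: pick the threshold where the two curves are closest."""
    return min(curve, key=lambda row: abs(row[1] - row[2]))[0]


def locate_recommendation_users(
    user_ids: Sequence[str], probs: Sequence[float], threshold: float
) -> List[str]:
    """Users whose predicted payment probability reaches the threshold."""
    return [u for u, p in zip(user_ids, probs) if p >= threshold]
```

Raising the threshold generally trades accuracy for efficiency, so a different operating point can be chosen when a smaller, higher-conversion audience is preferred.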
According to the above embodiment, videos can be delivered to a smaller group of users, which improves efficiency. The prediction effect can further be checked through actual delivery tests and, by compiling daily statistics, further verified or adjusted, which is not described in detail here.
Refer to Fig. 2, which is a schematic diagram of the composition of a specific embodiment of the device for locating recommendation users according to the present invention. The device mainly comprises:
Acquisition module 1. In this embodiment, acquisition module 1 is mainly configured to collect user transaction data and user video viewing data in a video system. In a specific implementation, the user transaction data and user video viewing data of the video system can be collected in various ways. As a specific embodiment, with reference to Fig. 3, the acquisition module may comprise:
first obtaining submodule 11, configured to obtain the log of the membership transaction table to obtain the user transaction data of the video system;
second obtaining submodule 12, configured to obtain the user video viewing log to obtain the user video viewing data of the video system.
As described above, the log of the membership transaction table and the user video viewing log include the logs corresponding to the personal computer end and the wireless end, so that the data obtained supports a more accurate analysis of users' viewing behavior.
Extraction module 2. In this embodiment, extraction module 2 is mainly configured to extract, from the collected user transaction data and user video viewing data, a specified feature data set of training users and a specified feature data set of test users. In a specific implementation, the specified feature in this embodiment may be one or more of the following features:
movie channel, TV series channel, automobile channel, comedy channel, animation channel, XATV-6, fashion channel, parent-child channel, game channel, original channel, advertisement channel, music channel, information channel, sports channel, life channel, travel channel, science and technology channel, education channel, entertainment channel, documentary channel, other channels, Android device, iPhone device, iPad device, iPod device, other devices, member, non-member, paid video, free video, full viewing and trial viewing.
In addition, it should be noted that extraction is performed mainly by user ID and by specified feature category, gathering the user behavior frequency data corresponding to each specified feature. For example, for the user ID user100, data on the various viewing behaviors of user100, such as the frequency of watching the movie channel and the frequency of watching the TV series channel, can be collected from the user video viewing log, and the corresponding specified feature data sets are then formed for the training users and the test users respectively.
In addition, to enable training, the specified feature data set of the training users may comprise positive sample data and negative sample data. The positive sample data may be the specified feature data set of users who made a paid purchase at the specified time point, and the negative sample data is the specified feature data set of users who did not make a paid purchase at the specified time point. The quantity of negative sample data is generally greater than the quantity of positive sample data; for example, it may be three times the quantity of positive sample data or some other ratio, which is not specifically limited here. The specified time point is likewise not specifically limited in this embodiment; for example, it may be the current day. The users who made a paid purchase may be, for example, users who purchased a membership, or may be other cases in practice; these are only examples and are not specific limitations.
Training module 3. In this embodiment, training module 3 is mainly configured to train on each specified feature in the specified feature data set of the training users according to a training algorithm to obtain the weight value of each specified feature. As described above, various existing training algorithms may be used in the present application, which is not limited here; as an example only, the training algorithm may be an L2-regularized logistic regression training algorithm.
Determination module 4. In this embodiment, determination module 4 is mainly configured to determine the payment probability prediction data for each test user according to the specified feature data set of the test users and the weight value of each specified feature obtained by training. In a specific implementation, as a specific embodiment and with reference to Fig. 4, the determination module may comprise:
sorting submodule 41, configured to sort the obtained weight values of the specified features;
key specified feature determination submodule 42, configured to determine, according to the sorting result, the key specified features within a specified ranking range;
probability prediction data determination submodule 43, configured to determine the payment probability prediction data for each test user according to the weights of the key specified features and the specified feature data set of the test users.
Locating module 5. In this embodiment, locating module 5 is mainly configured to locate recommendation users according to the payment probability prediction data of each test user. In a specific implementation, as a specific embodiment and with reference to Fig. 5, the locating module may comprise:
first threshold curve determination submodule 51, configured to determine, from the payment probability prediction data of each test user within a positioning time period, a first threshold curve for recommendation users according to recommendation accuracy;
second threshold curve determination submodule 52, configured to determine, from the payment probability prediction data of each test user within the positioning time period, a second threshold curve for recommendation users according to recommendation efficiency;
threshold determination submodule 53, configured to determine a threshold for recommendation users according to the first threshold curve and the second threshold curve;
recommendation user locating submodule 54, configured to locate recommendation users according to the determined threshold for recommendation users.
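The composition of the five modules can be sketched as a simple pipeline in which each module is a pluggable callable. This shows one possible software arrangement only; the class name, field names and call signatures are hypothetical and are not prescribed by the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class RecommendationUserLocator:
    """Wires modules 1 to 5 together; each field stands in for one module."""
    acquire: Callable[[], Any]                        # acquisition module 1
    extract: Callable[[Any], Tuple[Any, Any]]         # extraction module 2
    train: Callable[[Any], Any]                       # training module 3
    determine: Callable[[Any, Any], Dict[str, float]] # determination module 4
    locate: Callable[[Dict[str, float]], List[str]]   # locating module 5

    def run(self) -> List[str]:
        raw = self.acquire()
        train_set, test_set = self.extract(raw)
        weights = self.train(train_set)
        probabilities = self.determine(test_set, weights)
        return self.locate(probabilities)
```

Keeping each module behind a plain callable allows, for example, a different training algorithm to be substituted without touching the other modules.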
The description provided above sets out numerous specific details. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims.