CN109242520A - A potential user group localization method and device - Google Patents
A potential user group localization method and device
- Publication number
- CN109242520A (CN201710556934.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- content
- potential
- scoring
- weight
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of data mining technology, and in particular to a potential user group localization method and device, intended to improve the accuracy of existing potential user group localization methods. In the method, a pre-established potential user group location model is used to obtain the user content score of at least one user corresponding to target content to be positioned; the user content scores whose values meet a preset threshold are then selected from the obtained user content scores, and the users corresponding to the selected scores are determined as the potential user group. The potential user group location model is established through repeated training on behavior data that describe different users performing different behavior types on different content. In this way, by combining the behavior data of different users on different content, the correlation between each user and each content item is scored, and the scores are used to assist in positioning the potential user group, which improves positioning accuracy and further improves user experience.
Description
Technical field
The present invention relates to the field of data mining technology, and in particular to a potential user group localization method and device.
Background
Potential user group positioning is the basis of product operation and promotion. In the prior art, the following two schemes are commonly used to position a potential user group.
Scheme one: manual positioning. First, the basic attribute features that the potential user group may have are assumed in advance. Then, by database retrieval, users that exactly match the assumed attribute features are searched for, and all users that satisfy the matching requirement are selected to form the potential user group.
However, in scheme one, the basic attribute features of users cannot fully represent the hobbies and interests of the group. For example, in real estate sales, the sales targets are positioned as successful people between 25 and 45 years old; if some of these people have already purchased property, their interest is likely to be low. The accuracy of this scheme is therefore not high.
Scheme two: tag-based positioning. A tag library is established in advance, and the content library and the user library are each annotated with tags from the tag library. Taking one content item as an example, all users whose tags are identical to the tags of that content item are selected to form the potential user group of that content item.
However, in scheme two, since each user's interests are not fixed, the accuracy of the positioning result may suffer when the tags set for user interests are not updated in time. For example, a user liked watching American TV series half a year ago, and the system tagged this user as an "American series fan". If the user's interests change and the tag is not updated in time, the potential user group finally determined for the content "American series" still contains this user, and the positioning result is clearly inaccurate.
In view of this, a new potential user group localization method needs to be designed to overcome the above drawbacks.
Summary of the invention
Embodiments of the present invention provide a potential user group localization method and device, to improve the accuracy of existing potential user group localization methods.
The specific technical solutions provided by the embodiments of the present invention are as follows.
In a first aspect, a method for establishing a potential user group location model comprises:
obtaining several pieces of user content behavior data, wherein the user content behavior data comprise behavior data of different users performing different behavior types on different content;
performing a quantization operation on the user content behavior data using a preset quantization rule, to obtain a corresponding user content rating matrix;
dividing the user content rating matrix into a training sample set and a test sample set according to a preset ratio;
performing model training with the training sample set based on a preset collaborative filtering algorithm, until the evaluation index obtained by evaluating the model with the test sample set meets a specified condition, and then determining the potential user group location model.
Optionally, performing the quantization operation on the user content behavior data using the preset quantization rule, to obtain the corresponding user content rating matrix, comprises:
based on the user content behavior data, performing the following operations for each content item corresponding to each user:
determining the value of each behavior type of the current content item on each preset weight factor, wherein the preset weight factors include but are not limited to any one or a combination of: a behavior count weight, a behavior type weight, and a time decay weight;
for each behavior type, determining the corresponding user content score from the value of the behavior count weight, the value of the behavior type weight, and the value of the time decay weight, using a preset user content scoring rule;
summing the user content scores obtained for the individual behavior types, to obtain the user content score of the current content item;
normalizing the user content scores obtained for each content item of each user, to form the user content rating matrix.
Optionally, the preset user content scoring rule is expressed as:
user content score = behavior count weight * behavior type weight * time decay weight;
and the time decay weight is expressed as:
time decay weight = exp(-a * (t1 - t0)), where t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset decay amplitude.
Optionally, performing model training with the training sample set based on the preset collaborative filtering algorithm comprises:
performing matrix decomposition on the training sample set based on the preset collaborative filtering algorithm, to obtain a user feature vector matrix and a content feature vector matrix;
determining a target user content rating matrix based on the user feature vector matrix and the content feature vector matrix.
Optionally, evaluating the model with the test sample set comprises:
selecting at least one test user from the test sample set, and determining the user content score corresponding to the at least one test user as the actual user content score;
determining, from the target user content rating matrix, the user content score corresponding to the at least one test user as the test user content score;
calculating the standard error between the actual user content score and the corresponding test user content score as the evaluation index.
In a second aspect, a potential user group localization method uses the potential user group location model obtained by the method of any item of the first aspect, and comprises:
determining, using the potential user group location model, the user content score of at least one user corresponding to target content to be positioned;
selecting, from the at least one user content score, the user content scores whose values meet a preset threshold, and determining the users corresponding to the selected user content scores as the potential user group.
Optionally, the method further comprises:
when it is determined that the number of target users in the potential user group does not reach an expected threshold, extracting at least one candidate user from a designated user pool;
calculating, using a preset user similarity algorithm, the similarity between each target user in the potential user group and each of the at least one candidate user;
sorting the at least one candidate user by similarity value, selecting the top-ranked candidate users that satisfy the expected threshold, and merging them with the initial potential user group to form a new target user group.
Optionally, calculating, based on the preset user similarity algorithm, the similarity between any target user in the potential user group and any candidate user of the at least one candidate user comprises:
determining the weight value of each tag carried by the target user, and determining the weight value of each tag carried by the candidate user, wherein the tag types of the target user and of the candidate user are the same;
calculating the similarity between the target user and the candidate user using the following formula:
(formula not reproduced in this text)
where i denotes the number of a predefined tag; A denotes the target user; B denotes the candidate user; weight_Ai denotes the weight value of the i-th tag of the target user; weight_Bi denotes the weight value of the i-th tag of the candidate user; and the two remaining symbols of the formula denote, respectively, the mean of the weight values of the tags carried by the target user and the mean of the weight values of the tags carried by the candidate user.
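The quantities defined above (the per-tag weights of the two users and the means of those weights) are the ingredients of a Pearson correlation coefficient. Assuming that this is the intended similarity measure, which the text here does not confirm, and writing n for the number of predefined tags (a symbol introduced only for this sketch), the formula would read:

```latex
\mathrm{sim}(A,B) =
\frac{\sum_{i=1}^{n} \left(weight_{Ai} - \overline{weight_{A}}\right)\left(weight_{Bi} - \overline{weight_{B}}\right)}
     {\sqrt{\sum_{i=1}^{n} \left(weight_{Ai} - \overline{weight_{A}}\right)^{2}}\;
      \sqrt{\sum_{i=1}^{n} \left(weight_{Bi} - \overline{weight_{B}}\right)^{2}}}
```

Under this reading the similarity lies between -1 and 1, and a value near 1 means that the two users weight the same tags in the same way.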
In a third aspect, a device for establishing a potential user group location model comprises:
an acquiring unit, configured to obtain several pieces of user content behavior data, wherein the user content behavior data comprise behavior data of different users performing different behavior types on different content;
a quantization unit, configured to perform a quantization operation on the user content behavior data using a preset quantization rule, to obtain a corresponding user content rating matrix;
a division unit, configured to divide the user content rating matrix into a training sample set and a test sample set according to a preset ratio;
a training unit, configured to perform model training with the training sample set based on a preset collaborative filtering algorithm, until the evaluation index obtained by evaluating the model with the test sample set meets a specified condition, and then determine the potential user group location model.
In a fourth aspect, a potential user group positioning device uses the potential user group location model obtained by the method of any item of the first aspect, and comprises:
a determination unit, configured to determine, using the potential user group location model, the user content score of at least one user corresponding to target content to be positioned;
a screening unit, configured to select, from the at least one user content score, the user content scores whose values meet a preset threshold, and determine the users corresponding to the selected user content scores as the potential user group.
In a fifth aspect, an electronic device comprises: one or more processors; and
one or more computer-readable media storing a program for establishing a potential user group location model, wherein the program, when executed by the one or more processors, implements the steps of the method of any item of the first aspect.
In a sixth aspect, one or more computer-readable media store a program for establishing a potential user group location model, wherein the program, when executed by one or more processors, causes a communication device to perform the method of any item of the first aspect.
In a seventh aspect, an electronic device comprises: one or more processors; and
one or more computer-readable media storing a program for potential user group positioning, wherein the program, when executed by the one or more processors, implements the steps of the method of any item of the second aspect.
In an eighth aspect, one or more computer-readable media store a program for potential user group positioning, wherein the program, when executed by one or more processors, causes a communication device to perform the method of any item of the second aspect.
In the embodiments of the present invention, a pre-established potential user group location model is used to obtain the user content score of at least one user corresponding to target content to be positioned; the user content scores whose values meet a preset threshold are then selected from the obtained user content scores, and the users corresponding to the selected scores are determined as the potential user group. The potential user group location model is established through repeated training on behavior data that describe different users performing different behavior types on different content. In this way, by combining the behavior data of different users on different content, the correlation between each user and each content item is scored, and the scores are used to assist in positioning the potential user group, which improves positioning accuracy and further improves user experience.
Brief description of the drawings
Fig. 1 is a flow chart of the potential user group location model training process in an embodiment of the present invention;
Fig. 2 is a schematic diagram of how the time decay weight changes in an embodiment of the present invention;
Fig. 3 is a flow chart of the first potential user group localization method in an embodiment of the present invention;
Fig. 4 is a flow chart of the second potential user group localization method in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the second potential user group positioning in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the potential user group positioning process in an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of the device for establishing the potential user group location model in an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of the target user positioning device in an embodiment of the present invention.
Detailed description of the embodiments
To improve the accuracy of existing potential user group localization methods, the embodiments of the present invention provide a redesigned potential user group localization method. In this method, an established potential user group location model is used to determine the user content score of at least one user corresponding to target content to be positioned; the user content scores whose values meet a preset threshold are then selected from the at least one user content score, and the users corresponding to the selected scores are determined as the potential user group.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The solution of the present invention is described in detail below through specific embodiments; the present invention is of course not limited to the following embodiments.
Embodiment one
Referring to Fig. 1, in this embodiment of the present invention, before potential user group positioning is performed, an initial potential user group location model is first established and trained; the potential user group location model is finalized only when the training result meets a specified condition. The model training process is as follows:
Step 100: Obtain several pieces of user content behavior data, wherein the user content behavior data comprise behavior data of different users performing different behavior types on different content.
Specifically, this embodiment analyzes which behavior type a user performed on which content item and when, in order to determine which content the user is interested in and thereby position the potential user group for the target content to be positioned. A user behavior log usually records the behavior data of users performing different behavior types on different content, so in this embodiment of the present invention the user content behavior data are extracted from the user behavior log.
For example, assume that the content is an e-book, that the predefined behavior types that can be performed on an e-book fall into four classes, "online reading", "viewing the introduction", "collecting", and "downloading", and that the statistical period is one month; see Table 1.
Table 1
Step 110: Perform a quantization operation on the obtained user content behavior data using a preset quantization rule, to obtain a corresponding user content rating matrix.
Specifically, because the obtained user content behavior data are not numerical values, a quantization operation is performed on them using the preset quantization rule to obtain the corresponding user content rating matrix.
First, based on the obtained user content behavior data, the following operation is performed for each content item corresponding to each user: the value of each behavior type of the current content item on each preset weight factor is determined, where the preset weight factors include but are not limited to any one or a combination of: a behavior count weight, a behavior type weight, and a time decay weight.
In this embodiment of the present invention, the behavior count weight represents the number of times a user performed one behavior type on a content item.
The behavior type weight is a coefficient set in advance for each behavior type. For example, assume that the behavior types set for "e-book" content fall into four classes, "online reading", "viewing the introduction", "collecting", and "downloading", and that the behavior type weights of a content item sum to 10. In general, the interest indicated by "downloading" is higher than that indicated by "collecting", which is higher than that indicated by "online reading", which in turn is higher than that indicated by "viewing the introduction". The behavior type weight of "online reading" can therefore be set to "2", that of "viewing the introduction" to "1", that of "collecting" to "3", and that of "downloading" to "4".
The time decay weight represents the influence of historical behavior data on the user's current interest: the older the behavior data, the less likely they are to reflect the user's current interest. In this embodiment, the time decay weight is calculated with the following formula: time decay weight = exp(-a * (t1 - t0)), where t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset decay amplitude, an empirical value known in advance.
For example, assume that a is 0.6 and that the time decay weight decays once every 10 days. As shown in Fig. 2, the time decay weight becomes smaller and smaller as the number of days increases.
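The decay formula can be sketched as follows. This is a minimal illustration rather than the implementation of this embodiment: the function name `time_decay_weight` is hypothetical, and the unit of `t1 - t0` is an assumption, since the text does not fix it. Measuring elapsed time in 10-day periods with a = 0.6 gives roughly 0.55 after one period and 0.30 after two periods.

```python
import math

def time_decay_weight(elapsed, a=0.6):
    """Time decay weight exp(-a * (t1 - t0)).

    `elapsed` is t1 - t0 in whatever unit `a` was tuned for; the unit is
    an assumption here, since the text does not specify it.
    """
    if elapsed < 0:
        raise ValueError("behavior time t0 must not be later than t1")
    return math.exp(-a * elapsed)

# Elapsed time expressed in 10-day periods (assumed unit).
for periods in (0, 1, 2, 3):
    print(periods, round(time_decay_weight(periods), 2))
```

A larger `a` makes older behavior fade faster, so `a` controls how strongly the model favors recent interest.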
Second, after the value of each behavior type of each content item of each user on each preset weight factor has been determined, the following operation is performed for each behavior type: using the preset user content scoring rule, the corresponding user content score is determined from the value of the behavior count weight, the value of the behavior type weight, and the value of the time decay weight, where the user content scoring rule is expressed as: user content score = behavior count weight * behavior type weight * time decay weight.
Then, after the user content score of each behavior type of each content item of each user has been determined, the following operation is performed for each content item of each user: the user content scores obtained for the individual behavior types are summed to obtain the user content score of the current content item.
Finally, after the user content score of each content item of each user has been determined, the user content scores obtained for each content item of each user are normalized, and the user content rating matrix is formed with the normalized user content scores of each user as its elements.
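The per-behavior scoring and the summation over behavior types can be sketched as follows. The function name `content_score` and the tuple layout are hypothetical; the behavior type weights are the e-book example values given earlier in this embodiment, and the counts and decay values are illustrative inputs.

```python
# Behavior type weights from the e-book example (they sum to 10).
BEHAVIOR_TYPE_WEIGHT = {
    "online_reading": 2,
    "view_introduction": 1,
    "collect": 3,
    "download": 4,
}

def content_score(behaviors):
    """Sum count * type weight * time decay over all behavior types.

    `behaviors` is a list of (behavior_type, count, time_decay) tuples
    recorded for one user on one content item.
    """
    return sum(
        count * BEHAVIOR_TYPE_WEIGHT[btype] * decay
        for btype, count, decay in behaviors
    )

# Two online readings and one introduction view, each with decay 0.3.
score = content_score([("online_reading", 2, 0.3), ("view_introduction", 1, 0.3)])
print(round(score, 2))
```

Keeping the type weights in one table makes it easy to retune them without touching the scoring logic.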
Further, the normalization of the user content scores is performed per user. For example, suppose user "UID1" corresponds to content "CID1", "CID2", and "CID3", and the user content scores of user "UID1" are "a" for content "CID1", "b" for content "CID2", and "c" for content "CID3". After the normalization operation, the user content score of each of the three content items is the normalized value of a, b, and c respectively.
Further, assuming that the users are UID1 to UIDm and the content items are CID1 to CIDn, the user content rating matrix finally obtained is as shown in Table 2.
Table 2
| | CID1 | CID2 | CID3 | …… | CIDn |
| UID1 | | | | | |
| UID2 | | | | | |
| UID3 | | | | | |
| …… | | | | | |
| UIDm | | | | | |
For example, continuing with the example in Table 1, assume that the current time point is 20161230. First, user "13452304628" and user "13628526456" are identified, where user "13452304628" is associated with content "88641" and content "88642", and user "13628526456" is associated with content "88642".
Based on the user content scoring rule, user content score = behavior count weight * behavior type weight * time decay weight, the user content scores of user "13452304628" for content "88641" and content "88642" are calculated as follows:
Score(cid=88641) = 2*2*0.3 + 1*1*0.3 = 1.5. For content "88641" of user "13452304628", the behavior type weight of "online reading" is "2" (using the example values above), the corresponding behavior count weight is "2", and the behavior occurred at "20161215", 15 days before the current time point; from the time decay weight graph (also called the time decay coefficient graph) given in Fig. 2, the time decay weight is "0.3". The behavior type weight of "viewing the introduction" is "1" (using the example values above), the corresponding behavior count weight is "1", and the behavior occurred at "20161215", again 15 days before the current time point, so the time decay weight read from Fig. 2 is likewise "0.3".
Score(cid=88642) = 3*1*0.5 = 1.5. For content "88642" of user "13452304628", the behavior type weight of "viewing the introduction" is "1", and the behavior occurred at "20161220", 10 days before the current time point; from the time decay weight graph given in Fig. 2, the time decay weight is "0.5".
The user content score of user "13628526456" for content "88642" is calculated as follows:
Score(cid=88642) = 1*3*0.5 = 1.5. For content "88642" of user "13628526456", the behavior type weight of "collecting" is "3", and the behavior occurred at "20161220", 10 days before the current time point; from the time decay weight graph given in Fig. 2, the time decay weight is "0.5".
Further, after the normalization operation is performed on the above user content scores, the normalized user content scores are used as matrix elements, and the user content rating matrix is determined as shown in Table 3:
Table 3
| | CID1 | CID2 |
| UID1 | 0.7 | 0.7 |
| UID2 | 0 | 1 |
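The normalization expression itself is not reproduced in this text. The values in Table 3 are consistent with dividing each user's scores by their Euclidean norm, since 1.5 / sqrt(1.5^2 + 1.5^2) is about 0.7. The sketch below assumes that reading; the function name `normalize_user_scores` and the dictionary layout are hypothetical.

```python
import math

def normalize_user_scores(scores):
    """Scale one user's scores by their Euclidean (L2) norm.

    `scores` maps content id -> raw score. L2 scaling is an assumption
    inferred from the values in Table 3.
    """
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return {cid: 0.0 for cid in scores}
    return {cid: s / norm for cid, s in scores.items()}

raw = {
    "UID1": {"CID1": 1.5, "CID2": 1.5},
    "UID2": {"CID1": 0.0, "CID2": 1.5},
}
for uid, scores in raw.items():
    normalized = normalize_user_scores(scores)
    print(uid, {cid: round(v, 1) for cid, v in normalized.items()})
```

Normalizing per user keeps a very active user from dominating the matrix simply because that user produced more behavior records.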
Step 120: Divide the user content rating matrix into a training sample set and a test sample set according to a preset ratio.
Specifically, the obtained user content rating matrix is sorted out and divided into a training sample set and a test sample set according to the preset ratio. For example, at a test-to-training ratio of "1:9", 90% of the user rating matrix is determined as the training sample set and the remaining 10% as the test sample set.
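One way to realize such a split is to shuffle the observed scores and cut the list at the preset ratio. The sketch below assumes the matrix is held as (user, content, score) triples and that the split is random; neither detail is stated in the text, and the name `split_ratings` is hypothetical.

```python
import random

def split_ratings(ratings, train_fraction=0.9, seed=0):
    """Randomly split (user, content, score) triples into train and test."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)   # fixed seed for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical scores for 5 users and 4 content items.
ratings = [(f"UID{u}", f"CID{c}", 0.1 * (u + c))
           for u in range(1, 6) for c in range(1, 5)]
train, test = split_ratings(ratings)
print(len(train), len(test))
```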
Step 130: Perform model training with the training sample set based on a preset collaborative filtering algorithm.
Specifically, based on the preset collaborative filtering algorithm, matrix decomposition is performed on the training sample set to obtain a user feature vector matrix and a content feature vector matrix, and a target user content rating matrix is determined from these two matrices. The user feature vector matrix is the matrix of users' preferences for the latent features of content, and the content feature vector matrix is the matrix of the latent features that the content contains.
Preferably, in this embodiment of the present invention, the collaborative filtering algorithm may be alternating least squares with weighted-λ-regularization (Alternating-Least-Squares with Weighted-λ-Regularization, ALS-WR).
In this embodiment of the present invention, ALS-WR is used to decompose the user content rating matrix of the training sample set for the following reason. Some users give no explicit preference feedback on some content and therefore have no corresponding score, so the initial user content rating matrix has a large number of missing scores, that is, the matrix is sparse. Matrix decomposition yields two low-dimensional matrices (the user feature vector matrix and the content feature vector matrix), which are then multiplied in the low-dimensional space to obtain the target user content rating matrix. In this process the missing scores in the initial user content rating matrix are filled in, which solves the matrix sparsity problem.
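The decomposition can be illustrated with a small pure-Python sketch of alternating least squares with weighted-λ regularization. It is a toy under stated assumptions, not the implementation of this embodiment: the function names (`solve`, `update`, `als_wr`), the rank, the value of λ, the iteration count, and the sample scores are all hypothetical.

```python
import random

def solve(A, b):
    """Solve the small dense system A x = b by Gauss-Jordan elimination."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def update(rows, fixed, observed, lam):
    """Re-solve every row factor while the other side's factors stay fixed."""
    k = len(fixed[0])
    for i, scores in enumerate(observed):       # scores: {other_index: score}
        if not scores:
            continue                            # no data: keep the old factor
        # Weighted-lambda regularization: lam times the number of scores.
        A = [[lam * len(scores) if p == q else 0.0 for q in range(k)]
             for p in range(k)]
        b = [0.0] * k
        for j, r in scores.items():
            for p in range(k):
                b[p] += fixed[j][p] * r
                for q in range(k):
                    A[p][q] += fixed[j][p] * fixed[j][q]
        rows[i] = solve(A, b)

def als_wr(by_user, by_content, rank=2, lam=0.1, iters=20, seed=0):
    """Alternate between solving user factors U and content factors V."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(rank)] for _ in by_user]
    V = [[rng.random() for _ in range(rank)] for _ in by_content]
    for _ in range(iters):
        update(U, V, by_user, lam)
        update(V, U, by_content, lam)
    return U, V

# Hypothetical observed scores as (user index, content index, score).
ratings = [(0, 0, 0.7), (0, 1, 0.7), (1, 1, 1.0),
           (1, 2, 0.6), (2, 0, 0.8), (2, 2, 0.5)]
by_user = [{} for _ in range(3)]
by_content = [{} for _ in range(3)]
for u, c, r in ratings:
    by_user[u][c] = r
    by_content[c][u] = r

U, V = als_wr(by_user, by_content)
# Multiplying the factors yields a dense matrix with missing scores filled in.
predicted = [[sum(x * y for x, y in zip(u_vec, v_vec)) for v_vec in V]
             for u_vec in U]
for row in predicted:
    print([round(x, 2) for x in row])
```

Only observed scores enter the least-squares systems, so missing entries are never treated as zeros; they receive values only when the two factor matrices are multiplied at the end.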
At this point the first round of training of the potential user group location model is complete. Next, at least one test user is selected from the test sample set, and the user content score corresponding to the test user is determined as the actual user content score. Then, the user content score corresponding to the test user is determined from the target user content rating matrix as the test user content score. Finally, the standard error between the actual user content score and the corresponding test user content score is calculated as the evaluation index, and it is judged whether the evaluation index meets the specified condition. The standard error may be the root mean square error (Root Mean Square Error, RMSE), and the specified condition may be that the RMSE lies within a certain range.
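The evaluation index can be computed as in the following sketch. The sample values and the acceptance bound of 0.2 are illustrative assumptions; the text only says that the RMSE should lie within a certain range.

```python
import math

def rmse(actual, predicted):
    """Root mean square error over paired actual and predicted scores."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

actual = [0.7, 1.0, 0.5]       # hypothetical held-out test scores
predicted = [0.6, 0.8, 0.7]    # hypothetical model outputs for the same entries
error = rmse(actual, predicted)
print(round(error, 3), error <= 0.2)   # 0.2 is an illustrative acceptance bound
```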
If the evaluation index after the first round of training does not meet the specified condition, model training continues with the training sample set. In this embodiment of the present invention, after each round of training the trained potential user group location model is tested with the corresponding test sample set, until the evaluation index obtained by evaluating the model with the test sample set meets the specified condition; only then is the potential user group location model finalized. Subsequently, based on the target user content rating matrix in the potential user group location model, preferred content can be recommended to users, or a potential user group can be mined for a given content item.
Embodiment two
As shown in fig.3, in the embodiment of the present invention, after final determining potential user group location model, using above-mentioned model
The method flow for carrying out potential user group positioning is as follows:
Step 300: Determine the target content to be located.
Specifically, the embodiment of the present invention mines potential user groups based on content, so the target content to be located is determined first. Taking music as an example, the target content may be "European and American rock songs".
Step 310: Using the potential user group location model, obtain the user content score of at least one user corresponding to the target content to be located.
Specifically, after the target content to be located has been determined, the target user content rating matrix in the established potential user group location model is used to look up the user content score of each user corresponding to that target content.
For example, suppose the users corresponding to "European and American rock songs" are user 1, user 2, user 3, user 4 and user 5, and their user content scores for "European and American rock songs" are 0.6, 0.1, 0.3, 0.8 and 0.5, respectively.
Step 320: From the at least one user content score, select the user content scores whose values satisfy a preset threshold, and determine the users corresponding to the selected user content scores as the potential user group.
Specifically, the user content scores that satisfy the preset threshold are selected from the user content scores determined in Step 310, and the users corresponding to those scores are determined as the potential user group.
For example, continuing the example above, if the preset threshold is 0.5, the users whose user content scores satisfy the threshold are user 1, user 4 and user 5; that is, user 1, user 4 and user 5 form the potential user group.
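Steps 310 and 320 can be illustrated with the scores from the example. This sketch assumes that "satisfying" the threshold means greater than or equal to it, which matches the example (user 5, with a score of exactly 0.5, is included); the function name is hypothetical.

```python
def locate_potential_users(scores_by_user, threshold):
    """Return the users whose user content score is at least the preset threshold."""
    return [user for user, score in scores_by_user.items() if score >= threshold]


# User content scores for "European and American rock songs" from the example.
scores = {"user 1": 0.6, "user 2": 0.1, "user 3": 0.3, "user 4": 0.8, "user 5": 0.5}
potential_group = locate_potential_users(scores, threshold=0.5)
print(potential_group)  # ['user 1', 'user 4', 'user 5']
```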
With the above method of locating a potential user group based on the potential user group location model, a user is selected as a member of the potential user group only if that user's content score satisfies the preset threshold, so accuracy is relatively high. In practical applications, however, the number of high-scoring target users is limited, so more potentially valuable users need to be mined in order to expand the marketing scope.
Embodiment Three
Referring to Fig. 4, when the number of target users contained in the potential user group selected by the first potential user group locating method is determined not to reach an expected threshold, the embodiment of the present invention further provides another potential user group locating method. Building on the first locating method, this method further mines users who are highly similar to the target users contained in the initial potential user group, thereby expanding the potential user group. The detailed flow is as follows:
Step 400: Extract at least one candidate user from a designated user pool.
Specifically, at least one candidate user is extracted from the designated user pool, where the designated user pool stores a large amount of user information, and the user information may come from user behavior logs.
The embodiment of the present invention aims to mine users who are highly similar to the target users contained in the potential user group. Therefore, to reduce the amount of calculation, the candidate users extracted in this step may be limited to users who are not already target users.
Step 410: Using a preset user similarity calculation algorithm, calculate the similarity between each target user in the potential user group and each of the extracted candidate users.
Taking one target user in the potential user group and one candidate user as an example, first determine the weight value of each label carried by the target user and the weight value of each label carried by the candidate user. The label categories of the target user are the same as those of the candidate user; in the embodiment of the present invention, corresponding labels are preset for target users and candidate users alike.
For example, suppose the content is music and both the target user and the candidate user have 71 kinds of labels; the weight values of the labels are shown in Table 4 and Table 5, respectively.
Table 4
| | Label 1 (Chinese) | Label 2 (English) | Label 3 (Japanese) | … | Label 71 (born after 2000) |
|---|---|---|---|---|---|
| Weight value | 0.16 | 0.23 | 0.34 | … | 0.1 |
Table 5
| | Label 1 (Chinese) | Label 2 (English) | Label 3 (Japanese) | … | Label 71 (born after 2000) |
|---|---|---|---|---|---|
| Weight value | 0.15 | 0.13 | 0.24 | … | 0.34 |
Preferably, in the embodiment of the present invention, the similarity between the target user and the candidate user is calculated by the following formula:
where i denotes the index of a predefined label; A denotes any one target user; B denotes any one candidate user; $weight_{A_i}$ denotes the weight value of the i-th label of the target user; $weight_{B_i}$ denotes the weight value of the i-th label of the candidate user; and the remaining two symbols denote, respectively, the mean of the weight values of the labels carried by the target user and the mean of the weight values of the labels carried by the candidate user.
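The quantities defined above (per-label weight values and each user's mean weight value) are the ingredients of a Pearson correlation coefficient. The sketch below assumes that this is the intended similarity; that is an inference from the symbol definitions, not something the text confirms. The function name and the zero-variance fallback are hypothetical.

```python
import math


def label_similarity(weights_a, weights_b):
    """Pearson correlation between two users' label weight vectors (assumed formula).

    weights_a[i] and weights_b[i] are the weight values of the i-th label for the
    target user A and the candidate user B; both lists use the same label order.
    """
    if len(weights_a) != len(weights_b) or not weights_a:
        raise ValueError("weight lists must be non-empty and of equal length")
    mean_a = sum(weights_a) / len(weights_a)
    mean_b = sum(weights_b) / len(weights_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(weights_a, weights_b))
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a in weights_a))
    spread_b = math.sqrt(sum((b - mean_b) ** 2 for b in weights_b))
    if spread_a == 0 or spread_b == 0:
        # Assumed convention: a user whose weights are all equal has similarity 0.
        return 0.0
    return covariance / (spread_a * spread_b)
```

With this choice the similarity lies between -1 and 1, and two users whose label weights rise and fall together score close to 1.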
Step 420: Rank the at least one candidate user by similarity value, select the top-ranked candidate users needed to satisfy the expected threshold, and merge them with the initial potential user group to form a new potential user group.
Specifically, after the similarity between each extracted candidate user and each target user in the initial potential user group has been determined, the candidate users are ranked by similarity value. The top-ranked candidate users needed to satisfy the expected threshold are then selected and merged with the initial potential user group to form the new potential user group.
For example, suppose the initial potential user group contains 1,000 target users and the expected threshold is 10,000. Using the above method, the 9,000 candidate users with the highest similarity values are selected and, together with the already determined initial potential user group, form the new potential user group.
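The expansion in Step 420 can be sketched as below. The source does not say how a candidate's similarities to the individual target users are combined into one ranking value, so this sketch assumes each candidate already has a single similarity value (for example a maximum or an average, which is an assumption). The function and parameter names are hypothetical.

```python
def expand_group(initial_group, candidate_similarity, expected_size):
    """Top up the initial potential user group with the most similar candidates.

    candidate_similarity maps each candidate user to one similarity value.
    """
    missing = expected_size - len(initial_group)
    if missing <= 0:
        return list(initial_group)
    existing = set(initial_group)
    # Rank candidates from most to least similar, skipping existing members.
    ranked = sorted(
        (user for user in candidate_similarity if user not in existing),
        key=lambda user: candidate_similarity[user],
        reverse=True,
    )
    return list(initial_group) + ranked[:missing]
```

In the example above, `initial_group` would hold 1,000 users and `expected_size` would be 10,000, so the 9,000 highest-ranked candidates are appended.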
Referring to Fig. 5, in this embodiment a new potential user group is obtained after user expansion is performed on the initial potential user group.
The three embodiments above are described in further detail below with reference to a specific implementation scenario. Referring to Fig. 6, in the embodiment of the present invention, the potential user group location model is trained on the basis of user behavior logs. After the potential user group location model has been determined, the target user content rating matrix determined by the model is used to recommend preferred content for a given user and to recommend a potential user group for given content. Where the potential user group recommended for given content contains too few potential users, users similar to the target users are further mined to perform user expansion.
Based on Embodiment One, and referring to Fig. 7, the apparatus for establishing a potential user group location model in the embodiment of the present invention includes at least an acquiring unit 71, a quantifying unit 72, a dividing unit 73 and a training unit 74, wherein:
The acquiring unit 71 is configured to acquire several pieces of user content behavior data, where the user content behavior data include behavior data of different users performing different behavior types on different content;
The quantifying unit 72 is configured to apply a preset quantization rule to quantize the several pieces of user content behavior data and obtain a corresponding user content rating matrix;
The dividing unit 73 is configured to divide the user content rating matrix into a training sample set and a test sample set according to a preset ratio;
The training unit 74 is configured to perform model training with the training sample set based on preset collaborative filtering until the evaluation index obtained by evaluating the model with the test sample set is determined to satisfy a specified condition, and then to determine the potential user group location model.
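The work of the dividing unit can be sketched as a random split of the observed rating entries. The 80/20 ratio, the random shuffle, the seed and the `(user, content, score)` tuple layout are all assumptions made for this illustration; the source only says that the matrix is divided according to a preset ratio.

```python
import random


def split_ratings(ratings, train_ratio=0.8, seed=0):
    """Split (user, content, score) entries into a training set and a test set."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(ratings)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```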
Optionally, when applying the preset quantization rule to quantize the several pieces of user content behavior data and obtain the corresponding user content rating matrix, the quantifying unit 72 is configured to:
Based on the several pieces of user content behavior data, perform the following operations for each content item corresponding to each user:
Determine the value of each behavior type of the current content under each preset weight factor, where the preset weight factors include, but are not limited to, any one or a combination of the following: behavior count weight, behavior type weight and time decay weight;
For each behavior type, apply a preset user content scoring rule to determine the corresponding user content score from the value of the behavior count weight, the value of the behavior type weight and the value of the time decay weight;
Sum the user content scores obtained for the individual behavior types to obtain the user content score of the current content;
Normalize the user content scores obtained for each content item of each user to form the user content rating matrix.
Optionally, the preset user content scoring rule is expressed as:
User content score = behavior count weight * behavior type weight * time decay weight;
The time decay weight is expressed as:
Time decay weight = exp(-a * (t1 - t0)), where t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset decay amplitude.
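The scoring rule and the time decay weight can be sketched as follows. The source does not give the unit of `t1 - t0` or the normalization method, so this sketch treats time as a plain number (for instance days) and normalizes by dividing by the largest score; both choices, and all function names, are assumptions.

```python
import math


def time_decay_weight(t1, t0, a):
    """exp(-a * (t1 - t0)): older behaviors receive a smaller weight."""
    return math.exp(-a * (t1 - t0))


def behavior_score(count_weight, type_weight, t1, t0, a):
    """Score of one behavior type: count weight * type weight * time decay weight."""
    return count_weight * type_weight * time_decay_weight(t1, t0, a)


def content_score(behaviors, t1, a):
    """Sum the scores of all behavior types recorded for one user and one content.

    behaviors is a list of (count_weight, type_weight, t0) tuples.
    """
    return sum(behavior_score(c, w, t1, t0, a) for c, w, t0 in behaviors)


def normalize(scores):
    """Scale scores into [0, 1] by the maximum (assumed normalization method)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {key: 0.0 for key in scores}
    return {key: value / top for key, value in scores.items()}
```

A behavior that happened at the current time point has a decay weight of exactly 1, and the weight shrinks toward 0 as the behavior ages.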
Optionally, when performing model training with the training sample set based on preset collaborative filtering, the training unit 74 is configured to:
Based on the preset collaborative filtering, perform matrix decomposition on the training sample set to obtain a user feature vector matrix and a content feature vector matrix;
Based on the user feature vector matrix and the content feature vector matrix, determine the target user content rating matrix.
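The matrix decomposition step can be illustrated with a small stochastic gradient descent factorization. The source does not name a specific decomposition algorithm, so the SGD update, the rank, the learning rate, the regularization strength and the epoch count below are all assumptions chosen for a toy example, and the function names are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Learn a user feature matrix P and a content feature matrix Q by SGD.

    ratings is a list of (user_index, content_index, score) training entries.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, score in ratings:
            predicted = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            error = score - predicted
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the squared error with L2 regularization.
                P[u][k] += lr * (error * qi - reg * pu)
                Q[i][k] += lr * (error * pu - reg * qi)
    return P, Q


def predict_matrix(P, Q):
    """Target user content rating matrix: dot product of every user and content vector."""
    return [[sum(pu * qi for pu, qi in zip(p, q)) for q in Q] for p in P]
```

The product of the two feature matrices fills in scores for user and content pairs that had no recorded behavior, which is what later allows users to be ranked for a given target content.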
Optionally, when evaluating the model with the test sample set, the training unit 74 is configured to:
Select at least one test user from the test sample set, and determine the user content score corresponding to the at least one test user as the actual user content score;
Determine, from the target user content rating matrix, the user content score corresponding to the at least one test user as the test user content score;
Calculate the standard error between the actual user content score and the corresponding test user content score as the evaluation index.
Based on Embodiment Two and Embodiment Three, and referring to Fig. 8, the potential user group locating apparatus in the embodiment of the present invention includes at least a determining unit 81 and a screening unit 82, wherein:
The determining unit 81 is configured to use the potential user group location model to determine the user content score of at least one user corresponding to the target content to be located;
The screening unit 82 is configured to select, from the at least one user content score, the user content scores whose values satisfy a preset threshold, and to determine the users corresponding to the selected user content scores as the potential user group.
Optionally, the apparatus further includes an expanding unit 83, and the expanding unit 83 is configured to:
When the number of target users contained in the potential user group is determined not to reach an expected threshold, extract at least one candidate user from a designated user pool;
Using a preset user similarity calculation algorithm, calculate the similarity between each target user in the potential user group and each of the at least one candidate user;
Rank the at least one candidate user by similarity value, select the top-ranked candidate users needed to satisfy the expected threshold, and merge them with the initial potential user group to form a new potential user group.
Optionally, when calculating, based on the preset user similarity calculation algorithm, the similarity between any one target user in the potential user group and any one of the at least one candidate user, the expanding unit 83 is configured to:
Determine the weight value of each label carried by the target user and the weight value of each label carried by the candidate user, where the label categories of the target user are the same as those of the candidate user;
Calculate the similarity between the target user and the candidate user using the following formula:
where i denotes the index of a predefined label; A denotes any one target user; B denotes any one candidate user; $weight_{A_i}$ denotes the weight value of the i-th label of the target user; $weight_{B_i}$ denotes the weight value of the i-th label of the candidate user; and the remaining two symbols denote, respectively, the mean of the weight values of the labels carried by the target user and the mean of the weight values of the labels carried by the candidate user.
In conclusion, in the embodiment of the present invention, a pre-established potential user group location model is used to obtain the user content score of at least one user corresponding to the target content to be located. The user content scores whose values satisfy a preset threshold are then selected, and the users corresponding to the selected scores are determined as the potential user group. The potential user group location model is established through multiple rounds of training on behavior data of different users performing different behavior types on different content. By combining the behavior data of different users on different content, the correlation between users and content is scored, and the scores are used to locate the potential user group. This improves the accuracy of locating and further improves the user experience.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (14)
1. A method for establishing a potential user group location model, characterized by comprising:
Acquiring several pieces of user content behavior data, wherein the user content behavior data include behavior data of different users performing different behavior types on different content;
Applying a preset quantization rule to quantize the several pieces of user content behavior data to obtain a corresponding user content rating matrix;
Dividing the user content rating matrix into a training sample set and a test sample set according to a preset ratio;
Performing model training with the training sample set based on preset collaborative filtering until the evaluation index obtained by evaluating the model with the test sample set is determined to satisfy a specified condition, and determining the potential user group location model.
2. The method according to claim 1, characterized in that applying the preset quantization rule to quantize the several pieces of user content behavior data to obtain the corresponding user content rating matrix comprises:
Based on the several pieces of user content behavior data, performing the following operations for each content item corresponding to each user:
Determining the value of each behavior type of the current content under each preset weight factor, wherein the preset weight factors include, but are not limited to, any one or a combination of the following: behavior count weight, behavior type weight and time decay weight;
For each behavior type, applying a preset user content scoring rule to determine the corresponding user content score from the value of the behavior count weight, the value of the behavior type weight and the value of the time decay weight;
Summing the user content scores obtained for the individual behavior types to obtain the user content score of the current content;
Normalizing the user content scores obtained for each content item of each user to form the user content rating matrix.
3. The method according to claim 2, characterized in that the preset user content scoring rule is expressed as:
User content score = behavior count weight * behavior type weight * time decay weight;
The time decay weight is expressed as:
Time decay weight = exp(-a * (t1 - t0)), wherein t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset decay amplitude.
4. The method according to claim 1, 2 or 3, characterized in that performing model training with the training sample set based on preset collaborative filtering comprises:
Based on the preset collaborative filtering, performing matrix decomposition on the training sample set to obtain a user feature vector matrix and a content feature vector matrix;
Based on the user feature vector matrix and the content feature vector matrix, determining the target user content rating matrix.
5. The method according to claim 4, characterized in that evaluating the model with the test sample set comprises:
Selecting at least one test user from the test sample set, and determining the user content score corresponding to the at least one test user as the actual user content score;
Determining, from the target user content rating matrix, the user content score corresponding to the at least one test user as the test user content score;
Calculating the standard error between the actual user content score and the corresponding test user content score as the evaluation index.
6. A potential user group locating method, characterized by using the potential user group location model obtained by the method according to any one of claims 1 to 5, and comprising:
Using the potential user group location model to determine the user content score of at least one user corresponding to the target content to be located;
Selecting, from the at least one user content score, the user content scores whose values satisfy a preset threshold, and determining the users corresponding to the selected user content scores as the potential user group.
7. The method according to claim 6, characterized by further comprising:
When the number of target users contained in the potential user group is determined not to reach an expected threshold, extracting at least one candidate user from a designated user pool;
Using a preset user similarity calculation algorithm, calculating the similarity between each target user in the potential user group and each of the at least one candidate user;
Ranking the at least one candidate user by similarity value, selecting the top-ranked candidate users needed to satisfy the expected threshold, and merging them with the initial potential user group to form a new potential user group.
8. The method according to claim 7, characterized in that calculating, based on the preset user similarity calculation algorithm, the similarity between any one target user in the potential user group and any one of the at least one candidate user comprises:
Determining the weight value of each label carried by the target user and the weight value of each label carried by the candidate user, wherein the label categories of the target user are the same as those of the candidate user;
Calculating the similarity between the target user and the candidate user using the following formula:
where i denotes the index of a predefined label; A denotes any one target user; B denotes any one candidate user; $weight_{A_i}$ denotes the weight value of the i-th label of the target user; $weight_{B_i}$ denotes the weight value of the i-th label of the candidate user; and the remaining two symbols denote, respectively, the mean of the weight values of the labels carried by the target user and the mean of the weight values of the labels carried by the candidate user.
9. An apparatus for establishing a potential user group location model, characterized by comprising:
An acquiring unit, configured to acquire several pieces of user content behavior data, wherein the user content behavior data include behavior data of different users performing different behavior types on different content;
A quantifying unit, configured to apply a preset quantization rule to quantize the several pieces of user content behavior data and obtain a corresponding user content rating matrix;
A dividing unit, configured to divide the user content rating matrix into a training sample set and a test sample set according to a preset ratio;
A training unit, configured to perform model training with the training sample set based on preset collaborative filtering until the evaluation index obtained by evaluating the model with the test sample set is determined to satisfy a specified condition, and to determine the potential user group location model.
10. A potential user group locating apparatus, characterized by using the potential user group location model obtained by the method according to any one of claims 1 to 5, and comprising:
A determining unit, configured to use the potential user group location model to determine the user content score of at least one user corresponding to the target content to be located;
A screening unit, configured to select, from the at least one user content score, the user content scores whose values satisfy a preset threshold, and to determine the users corresponding to the selected user content scores as the potential user group.
11. An electronic device, characterized by comprising: one or more processors; and
One or more computer-readable media storing a program for establishing a potential user group location model, wherein the program, when executed by the one or more processors, implements the steps of the method according to any one of claims 1 to 5.
12. One or more computer-readable media, characterized in that the readable media store a program for establishing a potential user group location model, wherein the program, when executed by one or more processors, causes a communication device to perform the method according to any one of claims 1 to 5.
13. An electronic device, characterized by comprising: one or more processors; and
One or more computer-readable media storing a program for potential user group locating, wherein the program, when executed by the one or more processors, implements the steps of the method according to any one of claims 6 to 8.
14. One or more computer-readable media, characterized in that the readable media store a program for potential user group locating, wherein the program, when executed by one or more processors, causes a communication device to perform the method according to any one of claims 6 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710556934.7A CN109242520A (en) | 2017-07-10 | 2017-07-10 | A kind of potential user group localization method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710556934.7A CN109242520A (en) | 2017-07-10 | 2017-07-10 | A kind of potential user group localization method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109242520A true CN109242520A (en) | 2019-01-18 |
Family
ID=65083435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710556934.7A Pending CN109242520A (en) | 2017-07-10 | 2017-07-10 | A kind of potential user group localization method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109242520A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160975A (en) * | 2019-12-30 | 2020-05-15 | 中国移动通信集团黑龙江有限公司 | Target user determination method, device, equipment and computer storage medium |
CN111861065A (en) * | 2019-04-30 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | User data management method and device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103377250A (en) * | 2012-04-27 | 2013-10-30 | 杭州载言网络技术有限公司 | Top-k recommendation method based on neighborhood |
US20140188627A1 (en) * | 2009-12-23 | 2014-07-03 | 140 Proof, Inc. | Method and system for creating user based summaries for content distribution |
CN105183748A (en) * | 2015-07-13 | 2015-12-23 | 电子科技大学 | Combined forecasting method based on content and score |
CN105183925A (en) * | 2015-10-30 | 2015-12-23 | 合一网络技术(北京)有限公司 | Content association recommending method and content association recommending device |
CN105488216A (en) * | 2015-12-17 | 2016-04-13 | 上海中彦信息科技有限公司 | Recommendation system and method based on implicit feedback collaborative filtering algorithm |
CN106022865A (en) * | 2016-05-10 | 2016-10-12 | 江苏大学 | Goods recommendation method based on scores and user behaviors |
Khodaparast Sareshkeh et al. | Evaluating the components of marketing mix (7Ps) of Iran’s volleyball super league | |
Wang et al. | A Web3D forest geo-visualization and user interface evaluation | |
Selim et al. | Determination of the optimum number of sample points to classify land cover types and estimate the contribution of trees on ecosystem services using the I‐Tree Canopy tool |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190118 |