CN108415913A - Crowd orientation method based on uncertain neighbours
- Publication number: CN108415913A
- Application number: CN201710072222.8A
- Authority: CN (China)
- Prior art keywords: user, similarity, crowd, interest, media
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G06F18/24—Classification techniques
Abstract
The present invention is a crowd orientation method based on uncertain neighbours. It belongs to the field of crowd orientation in Internet advertising and involves user-based recommendation, profile-attack prevention and similarity calculation. It addresses the poor quality of crowd orientation in advertisement delivery, which is caused by many factors. A feature prediction model for users is built from their access behaviour. According to user behaviour, the behaviour-similar crowd of the seed crowd is selected; then, based on user behaviour and user features, the neighbours of each seed user are selected from the behaviour-similar crowd, and the neighbours of all seed users form the candidate crowd. Finally, a crowd orientation method dynamically selects the users with higher similarity as the potential target crowd. The method can be widely applied to user recommendation in e-commerce systems, crowd orientation in advertisement delivery systems and the like, and improves the quality of crowd recommendation to a certain extent.
Description
Technical field
The invention belongs to the field of crowd orientation in Internet advertising and involves user-based recommendation, profile-attack prevention and similarity calculation. In particular, it proposes a crowd orientation method based on uncertain neighbours.
Background art
Recommender systems: the present work belongs to the field of recommendation technology. In recent years recommender systems have drawn growing attention from researchers, and many recommendation techniques have been proposed, such as content-based recommendation and recommendation based on collaborative filtering. Popescul et al. extended Hofmann's aspect model to integrate three kinds of data (users, products and product content) and used these data to direct products to consumers. Arora et al. studied users' personalised content, namely aspects such as a user's interests, history and location, and used it to recommend films to other, similar users. Ekstrand et al. studied questions such as the specific tasks, information needs and item domains of recommender systems, carefully analysed potential users and their goals, and selected several methods for user recommendation. Linden et al. used clustering and search algorithms to generate recommended users and products, scaled these recommendations to massive data sets, and produced high-quality recommendations in real time through online computation. Huang Zhenhua et al. incorporated learning to rank into recommender systems and built a user preference model by integrating a large number of user and item features, improving the performance of the recommendation algorithm and user satisfaction. Guo Lei et al. proposed a recommendation algorithm that incorporates the association relationships between recommended objects, taking into account not only the social relationships between users but also the associations between recommended objects. Rong Huigui et al. proposed a collaborative filtering recommendation method based on user similarity, in which the similarity between users is calculated from their different social relationships. Chen Kehan et al. proposed a two-stage clustering recommendation algorithm that combines a graph summarisation method with a content-similarity-based algorithm to achieve recommendation based on user interest. Wang et al. first classified users and assigned different weights to different behaviours, then calculated the similarity between users and generated the corresponding recommended user set from the similar behaviours between users. Koren proposed a recommendation algorithm based on matrix factorisation. These results in recommendation technology provide the theoretical basis for the present work.
Profile-attack prevention: in view of the influence of user profile attacks, Mobasher et al. proposed a recommendation algorithm based on the PLSA model, which clusters users with PLSA. Mehta et al. proposed a recommendation algorithm based on singular value decomposition that weakens the influence of attack profiles on recommendations and thereby improves the system's resistance to attacks. Sandvig et al. proposed a collaborative filtering algorithm based on association rules, which enhances the stability of the recommender system. Jamali et al. introduced trust relationships between users and proposed a random walk model. Ma et al. proposed a matrix-factorisation-based recommendation method that incorporates social information. Jia Dongyan et al. used the trust degree of users to propose a collaborative filtering algorithm based on a dual neighbour selection strategy, which completes the recommendation to target users. Building on this existing work, the present work uses user features and user behaviour, together with the similarity between users, to find the neighbours of seed users, and takes all neighbours as the candidate crowd.
Similarity calculation: a large body of research exists on methods for calculating similarity. Recent work includes the following. Zhong Zhaoman et al. studied the relationships between users in microblogs and calculated the similarity between users from their followees and followers. Liu Ming et al. proposed a similarity calculation method based on feature weight quantisation, which solves the problem of inconsistent data. Li Hailin et al. proposed two similarity calculation methods for normal cloud models, which describe the overall characteristics of a normal cloud model by its expectation curve and maximum boundary curve. Xu Zhiming et al. used the relationships in a social network to give a user similarity calculation method based on the user's various attribute information (background information, microblog text, social information). Wu Yitao et al. fuzzified discrete values into trapezoidal fuzzy numbers and calculated user similarity from them. The present work proposes different similarity calculation methods for the characteristics of user behaviour and of user features, merges the user behaviour similarity and the user feature similarity by weighting, and thereby obtains the similarity between users.
Summary of the invention
Crowd orientation is influenced by many factors, so the quality of the recommended users is low, and current related techniques handle this kind of problem poorly. The present invention therefore aims to design a crowd orientation method based on uncertain neighbours. User features are predicted from the web pages a user browses and the media sites the user accesses, and the behaviour-similar crowd of the seed crowd is selected according to the similarity of user behaviour. Then, based on user behaviour and user features, the neighbours of each seed user are selected from the behaviour-similar crowd, and the neighbours of all seed users are taken as the candidate crowd. Finally, a crowd orientation method dynamically selects the users with higher similarity as the target crowd.
To achieve the above aim, the present invention proposes a crowd orientation method based on uncertain neighbours, which comprises the following steps:
A: Obtain the features of users (demographic attributes and interest tags);
B: Select the behaviour-similar crowd: for a given seed crowd, obtain the behaviour similarity between users from the media sites they access, set a corresponding threshold, and select the users whose similarity is not less than the threshold; the selected user set is the behaviour-similar crowd;
C: Select the candidate crowd: based on user features and user behaviour, use the user similarity method to select the neighbour users of each seed from the behaviour-similar crowd, and take the neighbour users of all seeds as the candidate crowd; and
D: For the candidate crowd of step C, use the crowd orientation method to dynamically select the set of users with higher similarity, and take these users as the potential target crowd.
Step A further comprises the following sub-steps:
A1: Predict the demographic attributes of a user from the media sites the user accesses; and
A2: Predict the interest tags of a user from the web pages the user browses.
In step A1, the demographic attributes are divided into 7 sub-features: gender, age, marital status, personal income, education, occupation and industry. Each sub-feature is predicted mainly by the following method:
[formula not reproduced]
where $M_1, M_2, \ldots, M_n$ denote n media. The remaining symbols, whose notation is likewise not reproduced, denote in turn: the event that the user belongs to category j of the k-th sub-feature; the number of users who accessed media $M_i$ and belong to category j of the k-th sub-feature; and the probability that the user belongs to category j of the k-th sub-feature.
In step B, the behaviour similarity of user u and user v is obtained by the following method:
[formula not reproduced]
where $D_{KL}(P_u \| P_v)$ denotes the divergence of $P_u$ from $P_v$, $D_{KL}(P_v \| P_u)$ denotes the divergence of $P_v$ from $P_u$, $P_u$ denotes the media density of user u and $P_v$ the media density of user v. Because the divergence is asymmetric, $D_{KL}(P_u \| P_v)$ and $D_{KL}(P_v \| P_u)$ may differ.
The divergence is obtained as follows. Assuming $P_u$ and $P_v$ are the kernel density distributions of user u and user v respectively, the KL divergence of $P_u$ and $P_v$ is:
$$D_{KL}(P_u \| P_v) = \sum_{i \in M} P_u(i) \log \frac{P_u(i)}{P_v(i)}$$
where M denotes the set of accessed media, and $P_u(i)$ and $P_v(i)$ denote the densities with which user u and user v access media $M_i$.
The density with which a user accesses media is estimated by the following method:
[formulas not reproduced]
where M(u) denotes the set of media accessed by user u and h denotes the window width (bandwidth). Two further quantities, whose symbols are not reproduced, are the number of times user u accesses media $M_j$ and the distance between media $M_i$ and media $M_j$. $U_i$ denotes the set of users who accessed media $M_i$, and $U_j$ the set of users who accessed media $M_j$.
In step C, user feature similarity is used when obtaining user similarity. The user feature similarity is obtained as the harmonic mean of its two components:
$$\mathrm{sim}_F(u, v) = \frac{2 \, \mathrm{sim}_P(u, v) \, \mathrm{sim}_I(u, v)}{\mathrm{sim}_P(u, v) + \mathrm{sim}_I(u, v)}$$
where $\mathrm{sim}_P(u, v)$ is the demographic attribute similarity of the two users and $\mathrm{sim}_I(u, v)$ is their interest similarity.
The values of demographic attributes fall mainly into two types, numeric and nominal. The similarity of demographic attributes is obtained from the distances between two users on the numeric and the nominal features, mainly by the following method:
[formula not reproduced]
where $D_{number}$ denotes the distance between user u and user v over all numeric features, and $D_{nominal}$ denotes the distance between user u and user v over all nominal features.
The distance over the numeric features is obtained by the following method:
[formula not reproduced]
where $d_j$ denotes the distance between the two users on sub-feature j; if all $d_j$ are 0, $D_{number}$ takes the default value 1.
The distance over the nominal features is obtained as follows. Assume a nominal attribute has N possible values. All values are manually rated in order, giving the ratings $r_1, r_2, \ldots, r_N$. If the ratings of two users on the attribute are $r_i$ and $r_j$ respectively, the distance between the two users on that attribute is $|r_i - r_j|$, and the distance between the two users over all nominal features is:
[formula not reproduced]
where $d'_j$ denotes the distance between the two users on sub-feature j; if all $d'_j$ are 0, $D_{nominal}$ takes the default value 1.
To obtain the interest similarity, an interest fingerprint is first generated for each user from the user's interests, and the interest similarity between users is then obtained from the fingerprints. The interest fingerprint is generated as follows:
1. Hashing: every interest is hashed to obtain a K-bit hash value.
2. Weighting: all interests of the user are extracted together with the probability weight of each interest, and each weight is combined with the corresponding hash value bit by bit. Where a bit of the hash value is 1, the bit contributes the probability weight; where the bit is 0, it contributes -1 times the probability weight.
3. Accumulating: the weighted hash values are summed position by position, producing a single sequence of K numbers.
4. Dimensionality reduction: the number sequence from the accumulation step is converted into a string of 0s and 1s, which is the final interest fingerprint. A position whose sum is greater than 0 is recorded as 1, and a position whose sum is less than 0 is recorded as 0. Finally the K digits are concatenated in order to form the interest fingerprint.
Assuming the interest fingerprints of user u and user v are $f_u$ and $f_v$ respectively, the interest similarity is obtained by the following method:
[formula not reproduced]
where $f_{ui}$ and $f_{vi}$ denote the i-th bit of the interest fingerprints of user u and user v respectively.
The similarity between users is obtained from the user feature similarity and the user behaviour similarity:
$$\mathrm{Sim}(u, v) = \alpha \, \mathrm{sim}_B(u, v) + (1 - \alpha) \, \mathrm{sim}_F(u, v)$$
where $\alpha$ is the weight of the behaviour similarity and $1 - \alpha$ is the weight of the feature similarity.
Compared with the prior art, the present invention has the following advantages:
1) The present invention identifies the potential target crowd automatically and can effectively improve the quality of the recommended crowd.
2) The present invention filters out large-scale user profile attacks, which saves a certain amount of manual effort.
3) The method can be widely applied to user recommendation in e-commerce systems, crowd orientation in advertisement delivery systems and the like, and improves the quality of crowd recommendation to a certain extent.
Description of the drawings
Fig. 1 is a schematic diagram of the crowd orientation method based on uncertain neighbours according to a preferred embodiment of the present invention.
Fig. 2 is a schematic diagram of interest classification according to the above preferred embodiment of the present invention.
Fig. 3 is a diagram of the principle of interest fingerprint generation according to the above preferred embodiment of the present invention.
Detailed description
The following description discloses the present invention so that those skilled in the art can implement it. The preferred embodiments described below are examples only, and those skilled in the art may conceive of other obvious modifications. The basic principles of the present invention defined in the following description can be applied to other embodiments, variations, improvements, equivalents and other technical solutions that do not depart from the spirit and scope of the present invention.
In a specific implementation, those skilled in the art can realise the technical solution provided by the present invention as an automatically running process using computer software technology. The technical solution is described in detail below with reference to the drawings and embodiments.
Fig. 1 shows the embodiment of the crowd orientation method based on uncertain neighbours according to a preferred embodiment of the present invention, which proceeds as follows. First, the features of users, namely their demographic attributes and interests, are obtained: a user feature prediction model, which comprises a demographic attribute prediction model and an interest classification model, is built mainly from user behaviour (the URLs accessed), and user features are predicted by the model. Then, according to user behaviour, the crowd whose behaviour is similar to that of the seed crowd is selected, and, based on user features and behaviour, the neighbours of the seed users are selected from the behaviour-similar crowd; all neighbours form the candidate crowd. Finally, the crowd orientation method automatically selects the target users from the candidate crowd.
The specific implementation steps are as follows:
Step 1: establish the user feature prediction model. From the URLs a user accesses, a demographic attribute prediction model and an interest classification model are built and used to predict the user's demographic attributes and interest preferences.
Step 1.1: predict the demographic attributes of the user. The media sites the user accesses are extracted from the accessed URLs, and the demographic attribute prediction model is built from these media sites.
Demographic attributes describe a user's inherent attributes and comprise 7 sub-features: gender, age, personal income, marital status, education level, occupation and industry. Taking gender as an example, users who often browse car-buying sites (www.haomaiche.com) or online game sites (www.youxi.com) are generally mostly male, while the overwhelming majority of users who often visit entertainment and variety sites are female. The demographic attribute prediction model is therefore built from the domain names (i.e. websites) of the URLs that users access. For sub-feature k, the prediction model is as follows.
Assume a user has accessed n different media $M_1, M_2, \ldots, M_n$. From the number of users who accessed media $M_i$ and belong to category j of the k-th sub-feature, the model gives the probability that this user belongs to category j of sub-feature k:
[formula not reproduced]
With this model, when sub-feature k is predicted, the category j with the highest probability is selected as the class label of sub-feature k. For example, when the gender sub-feature is predicted, if the probability of male is greater than the probability of female, the user's gender sub-feature is male.
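The prediction in Step 1.1 can be sketched as follows. This is a minimal illustration, not the embodiment's formula: it assumes a naive-Bayes-style score in which the per-media class shares are multiplied across the media the user accessed, with add-one smoothing. The function name `predict_subfeature` and the counts are hypothetical.

```python
from math import exp, log

def predict_subfeature(visited, counts, classes):
    """Return (best_class, probabilities) for one demographic sub-feature.

    visited: media accessed by the user (M_1 .. M_n)
    counts:  counts[media][cls] = number of users who accessed `media`
             and belong to class `cls` of this sub-feature
    Assumption: classes are scored by a smoothed product of per-media
    class shares; the patent's own formula may differ.
    """
    log_score = {}
    for cls in classes:
        total = 0.0
        for media in visited:
            media_counts = counts.get(media, {})
            n_cls = media_counts.get(cls, 0) + 1               # add-one smoothing
            n_all = sum(media_counts.values()) + len(classes)
            total += log(n_cls / n_all)
        log_score[cls] = total
    # Normalise the scores so that they sum to 1.
    peak = max(log_score.values())
    unnorm = {c: exp(s - peak) for c, s in log_score.items()}
    z = sum(unnorm.values())
    probs = {c: v / z for c, v in unnorm.items()}
    return max(probs, key=probs.get), probs

# Hypothetical per-media user counts for the gender sub-feature.
counts = {
    "www.haomaiche.com": {"male": 80, "female": 20},
    "www.youxi.com": {"male": 70, "female": 30},
}
label, probs = predict_subfeature(
    ["www.haomaiche.com", "www.youxi.com"], counts, ["male", "female"]
)
```

The class with the highest probability becomes the label of the sub-feature, as the text describes for gender.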
Step 1.2: predict the interest categories of the user. The URLs a user accesses reflect not only the user's demographic attributes but also the user's interest categories, because the content of different URL pages reflects different interest topics: for example, the pages of the car-buying site reflect the topic of automobiles, while the content of game pages leans towards entertainment. A topic profile is therefore built from the content of URL pages, unlabelled URL pages are assigned an interest category by the interest classification model, and once the prediction is complete the predicted interest category is assigned to the users who accessed the URL.
According to this preferred embodiment of the invention, interests are divided into 10 categories: entertainment, finance and economics, sports, digital products, tourism, automobiles, literature and art, current politics, health care and military. As shown in Fig. 2, interest classification comprises two stages, model training and interest prediction: an LR classifier is first trained on a sample set, and the trained LR classifier is then used to classify the accessed pages by interest.
In the model training stage, a crawler first fetches the text of the sample URL pages, and the crawled sample text is pre-processed by word segmentation, stop-word filtering and similar operations to form the segmented training samples; the LR classifier model is then trained on the processed samples.
In the interest prediction stage, the web page content of each URL to be classified is first fetched and pre-processed by word segmentation, stop-word filtering and similar operations; the LR classifier model then predicts the interest category of the URL page, and this interest category is taken as an interest of the users who accessed the URL.
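The two stages can be illustrated with a small sketch. It assumes that LR denotes logistic regression, reduces the 10 categories to two (1 = automobiles, 0 = entertainment), replaces word segmentation with whitespace splitting, and omits the crawler. The stop-word list, sample texts and function names are hypothetical.

```python
from math import exp

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical stop-word list

def tokenize(text):
    """Lowercase, split on whitespace and drop stop words (stand-in for word segmentation)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_lr(samples, epochs=200, lr=0.5):
    """Train a binary logistic-regression classifier on (text, label) pairs, label in {0, 1}."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            tokens = tokenize(text)
            z = bias + sum(weights.get(t, 0.0) for t in tokens)
            error = sigmoid(z) - label      # gradient of the log-loss with respect to z
            bias -= lr * error
            for t in tokens:
                weights[t] = weights.get(t, 0.0) - lr * error
    return weights, bias

def predict(model, text):
    """Return 1 if the page text is classified as the positive category, else 0."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if sigmoid(z) >= 0.5 else 0

# Training stage: page texts with interest labels (1 = automobiles, 0 = entertainment).
samples = [
    ("new car engine and price of the sedan", 1),
    ("suv car test drive", 1),
    ("celebrity variety show gossip", 0),
    ("movie star show tonight", 0),
]
model = train_lr(samples)

# Prediction stage: classify the text of an unlabelled page.
category = predict(model, "car engine review")
```

In practice a multi-class classifier from an established library would replace this hand-written trainer; the sketch only shows how pre-processing, training and prediction fit together.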
Step 2: select the behaviour-similar crowd of the seed crowd from the whole crowd according to the user behaviour similarity calculation method.
The target crowd essentially consists of users whose behaviour is similar to that of the seed crowd, so when the recommended crowd is chosen, the behaviour-similar crowd of the seed crowd is first selected according to user behaviour. User behaviour consists of the series of media (or websites) a user has accessed. Traditional similarity measures, such as cosine similarity and the Pearson correlation coefficient, only take into account the media that two users have both accessed and ignore the influence of the other media. If a user's density distribution over the whole media space can be estimated, calculating the behaviour similarity of two users from their densities over the media space is closer to reality.
According to this preferred embodiment of the invention, the idea of kernel density estimation is used to estimate a user's density over the media space. Common kernel functions include the uniform, triangular and Gaussian kernels, but the shape of the kernel affects the result much less than the window width does, so the embodiment uses the Gaussian kernel to estimate a user's density over the media space.
Definition in the embodiment: let M(u) denote the set of media accessed by user u and h the window width. Two further quantities, whose symbols are not reproduced, are the number of times user u accesses media $M_j$ and the distance between media $M_i$ and media $M_j$. $U_i$ denotes the set of users who accessed media $M_i$, and $U_j$ the set of users who accessed media $M_j$. The density with which a user accesses media is then:
[formulas not reproduced]
Kernel density estimation by the above method yields the density distribution over the whole media space. The behaviour similarity between two users is then calculated from the kernel density distributions over the media. According to this preferred embodiment of the invention, the KL divergence is used to calculate the behaviour similarity of two users.
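A minimal sketch of the density estimate follows. The quantities M(u), h, the access counts and the visitor sets $U_i$ come from the text; two points are assumptions made only for illustration: the distance between two media is taken as the Jaccard distance of their visitor sets, and the density is a count-weighted sum of Gaussian kernels normalised to sum to 1. All names and data are hypothetical.

```python
from math import exp, pi, sqrt

def media_distance(users_i, users_j):
    """Assumed distance between two media: Jaccard distance of their visitor sets U_i, U_j."""
    union = users_i | users_j
    if not union:
        return 1.0
    return 1.0 - len(users_i & users_j) / len(union)

def gaussian_kernel(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def media_density(visit_counts, visitors, h=0.5):
    """Estimate one user's density over every media in `visitors`.

    visit_counts: {media: access count} for the user, i.e. the set M(u)
    visitors:     {media: set of users who accessed it}
    h:            window width (bandwidth)
    """
    raw = {}
    for m_i, users_i in visitors.items():
        raw[m_i] = sum(
            count * gaussian_kernel(media_distance(users_i, visitors[m_j]) / h)
            for m_j, count in visit_counts.items()
        )
    total = sum(raw.values())
    return {m: value / total for m, value in raw.items()}  # normalise to sum to 1

visitors = {
    "auto": {"u1", "u2", "u3"},
    "games": {"u1", "u2"},
    "variety": {"u4"},
}
# User u1 accessed "auto" five times and "games" twice.
p_u1 = media_density({"auto": 5, "games": 2}, visitors)
```

Note that `p_u1` assigns a small but non-zero density to "variety", a media the user never accessed. This is what allows two users to be compared over the whole media space rather than only over the media they share.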
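Definition in the embodiment: assuming $P_u$ and $P_v$ are the kernel density distributions of user u and user v respectively, the KL divergence of $P_u$ and $P_v$ is:
$$D_{KL}(P_u \| P_v) = \sum_{i \in M} P_u(i) \log \frac{P_u(i)}{P_v(i)}$$
where M denotes the set of accessed media and $P_u(i)$ and $P_v(i)$ denote the densities with which user u and user v access media $M_i$.
Because the KL divergence is asymmetric, this preferred embodiment of the invention calculates the behaviour similarity of two users by the following formula:
[formula not reproduced]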
According to this preferred embodiment of the invention, the behaviour-similar crowd is selected as follows: the density of each user over the media space is first estimated from the media the user accesses; the behaviour similarity between seed users and other users is then calculated; finally a corresponding threshold is set, and the set of users whose behaviour similarity is not less than the threshold is selected as the behaviour-similar crowd of the seed crowd.
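The selection of the behaviour-similar crowd can be sketched as below. The KL divergence is the standard definition; the symmetrisation (mean of both directions mapped into (0, 1]) and the rule that a user qualifies when similar to at least one seed user are assumptions, since the text does not fix either. The densities are hypothetical.

```python
from math import log

def kl_divergence(p, q):
    """D_KL(P || Q) over a shared media set; assumes every q[i] > 0."""
    return sum(p[i] * log(p[i] / q[i]) for i in p if p[i] > 0)

def behaviour_similarity(p_u, p_v):
    """Assumed symmetrisation: map the mean of both divergences into (0, 1]."""
    sym = 0.5 * (kl_divergence(p_u, p_v) + kl_divergence(p_v, p_u))
    return 1.0 / (1.0 + sym)

def behaviour_similar_crowd(seed_densities, other_densities, threshold):
    """Users whose similarity to at least one seed user is not below the threshold."""
    return {
        user
        for user, p_v in other_densities.items()
        if any(
            behaviour_similarity(p_s, p_v) >= threshold
            for p_s in seed_densities.values()
        )
    }

seeds = {"s1": {"auto": 0.7, "games": 0.2, "variety": 0.1}}
others = {
    "a": {"auto": 0.6, "games": 0.3, "variety": 0.1},
    "b": {"auto": 0.1, "games": 0.1, "variety": 0.8},
}
crowd = behaviour_similar_crowd(seeds, others, threshold=0.8)
```

User "a" has a density close to that of the seed and is kept; user "b" concentrates on a different media and is filtered out.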
Step 3: select the candidate crowd. The user similarity method is first used to calculate, within the behaviour-similar crowd, the similarity between each seed user and the other users. A threshold is then set, and the users whose similarity is greater than the threshold are selected as the neighbours of that seed user. Finally, the neighbour sets of all seed users are taken as the candidate crowd.
According to this preferred embodiment of the invention, potential target users are not chosen directly from the behaviour-similar crowd. One reason is that when a user has accessed few media, the behaviour similarity method cannot accurately measure how similar users' behaviour is. Another is that the selection is highly susceptible to profile attacks by other users. Therefore, according to this preferred embodiment of the invention, the neighbours of the seed users are selected from the behaviour-similar crowd, which filters out the users with lower similarity, and all neighbours are taken as the candidate crowd, improving the quality of the recommended crowd. The candidate crowd is chosen using the user similarity method. User similarity measures the degree of similarity between users by both user behaviour and user features: the feature similarity and the behaviour similarity of users are calculated from their features and behaviour, corresponding weights are set for the two, and the similarity between users is then calculated by weighting.
Since user features mainly comprise demographic attributes and interest categories, calculating the feature similarity of users requires a similarity measure for each kind of feature. According to this preferred embodiment of the invention, the demographic attribute similarity and the interest similarity of users are therefore calculated separately.
Calculating the similarity of demographic attributes requires considering the value types of the attributes. The values of demographic attributes fall mainly into two types, numeric and nominal, so, according to this preferred embodiment of the invention, the similarity of user attributes is calculated from the distances between users on the numeric and the nominal features.
The distance $D_{number}$ over the numeric features is measured by the following method:
[formula not reproduced]
where $d_j$ denotes the distance between the two users on sub-feature j; if all $d_j$ are 0, $D_{number}$ takes the default value 1.
The distance $D_{nominal}$ over the nominal features is measured as follows. Assume a nominal attribute has N possible values. All values are manually rated in order, giving the ratings $r_1, r_2, \ldots, r_N$. If the ratings of two users on the attribute are $r_i$ and $r_j$ respectively, the distance between the two users on that attribute is $|r_i - r_j|$, so the distance between the two users over all nominal features is:
[formula not reproduced]
where $d'_j$ denotes the distance between the two users on sub-feature j; if all $d'_j$ are 0, $D_{nominal}$ takes the default value 1.
Definition in the embodiment: assuming users u and v, with $D_{number}$ the distance between the two users over all numeric features and $D_{nominal}$ their distance over all nominal features, the demographic attribute similarity of user u and user v is:
[formula not reproduced]
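The two distances can be sketched as follows. The per-attribute nominal distance $|r_i - r_j|$ and the default value 1 come from the text. Aggregating the per-feature distances by a plain sum, treating age and income as numeric features in raw units, and the two rankings are assumptions for illustration; the step from the two distances to the demographic attribute similarity is left out because its formula is not given here.

```python
def nominal_distance(value_u, value_v, ranking):
    """|r_i - r_j| for one nominal attribute, given its manually ordered values."""
    return abs(ranking.index(value_u) - ranking.index(value_v))

def total_distance(per_feature):
    """Aggregate per-feature distances; defaults to 1 when every distance is 0.

    Assumption: the aggregate is a plain sum of the per-feature distances.
    """
    if all(d == 0 for d in per_feature):
        return 1
    return sum(per_feature)

# Hypothetical manual rankings of two nominal attributes.
EDUCATION = ["primary", "secondary", "bachelor", "master", "doctorate"]
MARITAL = ["single", "married"]

u = {"age": 28, "income": 8000, "education": "bachelor", "marital": "single"}
v = {"age": 35, "income": 8000, "education": "master", "marital": "single"}

# D_number over the numeric features (age, income).
d_number = total_distance([abs(u["age"] - v["age"]), abs(u["income"] - v["income"])])

# D_nominal over the nominal features (education, marital status).
d_nominal = total_distance([
    nominal_distance(u["education"], v["education"], EDUCATION),
    nominal_distance(u["marital"], v["marital"], MARITAL),
])
```

In a real system the numeric features would normally be rescaled to a common range before their distances are combined, so that no single feature dominates.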
To measure interest similarity, the preferred embodiment of the invention uses a similarity calculation method based on interest fingerprints. As shown in Fig. 3, an interest fingerprint is generated from the interests of each user. The interest fingerprint is generated as follows:
1. Hashing. Every interest is hashed to obtain a K-bit hash value.
2. Weighting. All interests of the user are extracted together with the probability weight of each interest, and each weight is combined with the corresponding hash value bit by bit. Where a bit of the hash value is 1, the bit contributes the probability weight; where the bit is 0, it contributes -1 times the probability weight.
3. Accumulating. The weighted hash values are summed position by position, producing a single sequence of K numbers.
4. Dimensionality reduction. The number sequence from the accumulation step is converted into a string of 0s and 1s, which forms the final interest fingerprint. A position whose sum is greater than 0 is recorded as 1, and a position whose sum is less than 0 is recorded as 0. Finally the K digits are concatenated in order to form the interest fingerprint.
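The four steps above can be sketched as follows. The hash function (MD5, truncated to K bits, so K must not exceed 128) and the handling of a sum that is exactly 0 (recorded as 0) are assumptions, since the text specifies neither. The interest names and weights are hypothetical.

```python
import hashlib

def interest_hash(interest, k):
    """K-bit hash of an interest label as a list of 0/1 bits (MD5 is an arbitrary choice)."""
    digest = int.from_bytes(hashlib.md5(interest.encode("utf-8")).digest(), "big")
    return [(digest >> bit) & 1 for bit in range(k)]

def interest_fingerprint(interest_weights, k=16):
    """Build a K-bit fingerprint from {interest: probability weight}."""
    totals = [0.0] * k
    for interest, weight in interest_weights.items():
        # Steps 1 and 2: hash the interest, then weight each bit.
        for pos, bit in enumerate(interest_hash(interest, k)):
            # Step 3: accumulate +weight for a 1 bit and -weight for a 0 bit.
            totals[pos] += weight if bit else -weight
    # Step 4: positive sums become 1; negative sums (and exact zeros) become 0.
    return "".join("1" if t > 0 else "0" for t in totals)

fingerprint = interest_fingerprint({"automobile": 0.6, "sports": 0.3, "finance": 0.1})
```

Because each bit is decided by a weighted vote, users whose interests overlap heavily tend to receive fingerprints that differ in only a few positions.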
Definition in the embodiment: assuming the interest fingerprints of user u and user v are $f_u$ and $f_v$ respectively, the interest similarity of the two users is:
[formula not reproduced]
where $f_{ui}$ and $f_{vi}$ denote the i-th bit of the interest fingerprints of user u and user v respectively.
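One plausible bitwise measure, shown here only as an assumption because the formula itself is not given, is the share of positions at which the two fingerprints agree (one minus the normalised Hamming distance). The function name is hypothetical.

```python
def interest_similarity(f_u, f_v):
    """Share of positions where the two fingerprints agree (assumed measure)."""
    if len(f_u) != len(f_v):
        raise ValueError("fingerprints must have the same length K")
    matches = sum(1 for a, b in zip(f_u, f_v) if a == b)
    return matches / len(f_u)

same = interest_similarity("1010", "1010")      # identical fingerprints
opposite = interest_similarity("1010", "0101")  # every bit differs
partial = interest_similarity("1100", "1000")   # one of four bits differs
```

Under this measure identical fingerprints score 1 and fully opposite fingerprints score 0.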
User features are the inherent attributes of a user and cover two aspects, demographic attributes and interests, so the user feature similarity comprises two parts, the demographic attribute similarity and the interest similarity. Demographic attributes and user interests describe user features from different aspects and belong to different dimensional spaces; the demographic attribute similarity and the interest similarity between users differ, and both affect the user feature similarity. The preferred embodiment of the invention therefore uses the harmonic mean to calculate the feature similarity of users.
Definition in the embodiment: assuming users u and v, with $\mathrm{sim}_P(u, v)$ the demographic attribute similarity of the two users and $\mathrm{sim}_I(u, v)$ their interest similarity, the feature similarity of user u and user v is:
$$\mathrm{sim}_F(u, v) = \frac{2 \, \mathrm{sim}_P(u, v) \, \mathrm{sim}_I(u, v)}{\mathrm{sim}_P(u, v) + \mathrm{sim}_I(u, v)}$$
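A short sketch of the harmonic mean shows why it suits this purpose: the result stays low unless both component similarities are high. Returning 0 when both inputs are 0 is an assumption added to avoid division by zero; the function name is hypothetical.

```python
def feature_similarity(sim_p, sim_i):
    """Harmonic mean of demographic and interest similarity (0 if both are 0)."""
    if sim_p + sim_i == 0:
        return 0.0
    return 2.0 * sim_p * sim_i / (sim_p + sim_i)

balanced = feature_similarity(0.5, 0.5)    # both components equal
one_sided = feature_similarity(1.0, 0.0)   # one component contributes nothing
mixed = feature_similarity(0.9, 0.3)       # pulled towards the lower value
```

For the mixed case the arithmetic mean would be 0.6, whereas the harmonic mean is about 0.45, so a user who matches on only one kind of feature is scored conservatively.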
A user has not only inherent user features but also dynamic user behavior. User similarity measures the degree of similarity between users along two dimensions: user feature similarity and user behavior similarity. Because the two dimensions do not carry equal weight, a weighted method is used when measuring the similarity between users: a weight is assigned to each of the two similarities, and the two results are then combined.
Definition in the embodiment: suppose there are users u and v, $sim_B(u, v)$ is the behavior similarity of the two users, and $sim_F(u, v)$ is their feature similarity; then the similarity of user u and user v is:
$$sim(u, v) = \alpha \, sim_B(u, v) + (1 - \alpha) \, sim_F(u, v)$$
where $\alpha$ is the weight of the behavior similarity and $1 - \alpha$ is the weight of the feature similarity.
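The weighted combination can be written directly from the formula above. In this sketch the function name `user_similarity`, the default `alpha=0.5`, and the range check on `alpha` are illustrative assumptions; the source does not state a value for the weight.

```python
def user_similarity(sim_b, sim_f, alpha=0.5):
    """Weighted sum of behavior similarity and feature similarity.

    alpha weights the behavior similarity, (1 - alpha) the feature
    similarity. The default value is a placeholder, not from the source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_b + (1 - alpha) * sim_f


example = user_similarity(0.8, 0.4, alpha=0.75)
```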
The goal of candidate crowd selection is to find, on the basis of user behavior and user features, the neighbors of the seed users that have higher similarity. The process mainly comprises the following two stages:
1. First, for each seed user, compute its similarity to every user in the behaviorally similar crowd.
2. Set a corresponding threshold and select the candidate crowd of the seed crowd. In this stage, a similarity threshold is set first; then, for each seed user, the set of users whose similarity is not less than the threshold is selected as that seed user's neighbors. Finally, all selected neighbor sets together are taken as the candidate crowd of the seed crowd.
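The two stages above can be sketched as a union of per-seed neighbor sets. The function name `select_candidates`, the callable `similarity(seed, user)`, and the decision to leave seed users themselves out of the candidate crowd are assumptions made for illustration only.

```python
def select_candidates(seeds, behavior_similar_users, similarity, threshold):
    """Union of each seed's neighbors drawn from the behaviorally similar crowd.

    similarity(seed, user) is assumed to return a value comparable with
    threshold. Seed users are skipped as candidates (an assumption).
    """
    seed_set = set(seeds)
    candidates = set()
    for seed in seeds:
        # Stages 1 and 2: score every user against this seed and keep those
        # whose similarity is not less than the threshold.
        neighbors = {
            user
            for user in behavior_similar_users
            if user not in seed_set and similarity(seed, user) >= threshold
        }
        candidates |= neighbors
    return candidates


# Toy data, purely illustrative.
toy_scores = {
    ("s1", "a"): 0.9, ("s1", "b"): 0.4, ("s1", "c"): 0.2,
    ("s2", "a"): 0.3, ("s2", "b"): 0.7, ("s2", "c"): 0.1,
}
candidates = select_candidates(
    ["s1", "s2"], ["a", "b", "c"], lambda s, u: toy_scores[(s, u)], 0.6
)
```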
Step 4: The users in the candidate crowd are the neighbor sets selected for individual seed users, so not every user is highly similar to all seed users. Therefore, according to the preferred embodiment of the invention, the seed crowd is considered as a whole: a crowd orientation method, which uses user features and user behavior as its selection criteria, computes the similarity between each user in the candidate crowd and the seed crowd and dynamically selects the users with higher similarity as the potential target crowd.
The crowd orientation method dynamically selects the potential target crowd from the candidate crowd and mainly comprises three stages:
1. First compute the similarity between each user in the candidate crowd and each seed user; then average the user's similarities to all seed users, and take this average as the similarity between the user and the seed crowd.
2. From the similarities of all users to the seed crowd, compute the average similarity, and use this average as the similarity threshold.
3. From the candidate crowd, select the set of users whose similarity to the seed crowd is not less than the threshold, and take these users as the potential target crowd.
To ensure the performance of crowd orientation, the model can be evaluated:
(1) Performance evaluation
System performance is evaluated with indicators that include precision, recall, and resistance to attack. In addition to studying the precision and recall of the system, users mounting profile attacks are added to the system, and the quality of the system's recommendations is studied through its resistance to attack.
(2) Computability and complexity analysis
Computability analysis examines whether the method is computable and realizable when complexity is not considered; approximate computation methods are proposed for any NP-complete problems that arise. Complexity analysis examines, on the premise of computability, the time complexity of the model during computation, and the efficiency of the model is measured through the estimated complexity.
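The precision and recall indicators named under performance evaluation can be computed with the standard set-based definitions, as in the sketch below. The function name `precision_recall` and the convention of returning 0 for an empty set are assumptions; the source does not describe how the ground-truth target crowd is obtained.

```python
def precision_recall(predicted, actual):
    """Standard precision and recall of a predicted target crowd.

    predicted: users selected by the method; actual: users known to be
    true targets (how these are obtained is outside this sketch).
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```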
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in a similar manner, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Those skilled in the art should understand that the embodiments of the invention shown in the foregoing description and the drawings are given only as examples and do not limit the invention. The objects of the invention have been fully and effectively achieved. The functional and structural principles of the invention have been shown and explained in the embodiments, and the embodiments may be varied or modified in any way without departing from those principles.
Claims (10)
1. A crowd orientation method based on uncertain neighbours, characterized by comprising the following steps:
A: obtaining the features of users, the features comprising demographic attributes and interest tags;
B: selecting a behaviorally similar crowd, wherein, for a given seed crowd, the behavior similarity between users is obtained from the online media sites that the users visit, a corresponding threshold is set, and the users whose similarity is not less than the threshold are selected, the selected set of users serving as the behaviorally similar crowd;
C: selecting a candidate crowd, wherein, based on user features and user behavior, the neighbor users of each seed are selected from the behaviorally similar crowd by a user similarity acquisition method, and the neighbor users of all seeds are taken as the candidate crowd; and
D: for the candidate crowd of step C, dynamically selecting, by a crowd orientation method, the set of users with higher similarity, and taking these users as the potential target crowd;
wherein step A comprises the following steps:
A1: predicting the demographic attributes of a user from the online media sites visited; and
A2: predicting the interest tags of the user from the web pages the user browses.
2. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that the demographic attributes comprise seven sub-features: gender, age, marital status, personal income, education, occupation and industry, wherein the sub-features of the demographic attributes are predicted by the following method:
wherein $M_1, M_2, \ldots, M_n$ denote n media, $C_j^k$ denotes that the k-th sub-feature of the user is of class j, $C_j^k(M_i)$ denotes the count of users who have visited media $M_i$ and whose k-th sub-feature is of class j, and $p(C_j^k \mid M_1 M_2 \ldots M_n)$ is the probability that the k-th sub-feature of the user is of class j.
3. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that, in step B, the behavior similarity of user u and user v is obtained by the following method:
wherein $D_{KL}(P_u \| P_v)$ denotes the divergence of $P_u$ from $P_v$, $D_{KL}(P_v \| P_u)$ denotes the divergence of $P_v$ from $P_u$, $P_u$ denotes the media density of user u, and $P_v$ denotes the media density of user v;
wherein the divergence is obtained by the following method:
suppose $P_u$ and $P_v$ are the kernel density distributions of user u and user v, respectively; then the divergence of $P_u$ and $P_v$ is:
wherein M denotes the set of visited media, and $P_u(i)$ and $P_v(i)$ denote the densities with which user u and user v, respectively, visit media $M_i$;
wherein the density with which a user visits media is estimated by the following method:
and
wherein $M(u)$ denotes the set of media visited by user u, h denotes the window width, the formulas further use the count of visits by user u to media $M_j$ and the distance between media $M_i$ and media $M_j$, $U_i$ denotes the set of users who have visited media $M_i$, and $U_j$ denotes the set of users who have visited media $M_j$.
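The two divergence terms named in claim 3 can be illustrated with the standard discrete Kullback-Leibler divergence over a shared media set. The mapping from the symmetric divergence to a similarity score is not reproduced in this text, so `1 / (1 + D)` below is purely a hypothetical choice, as are the function names and the smoothing constant `eps`; the kernel density estimation step is not implemented here.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(p || q) over aligned probability vectors.

    eps is a small smoothing constant (an assumption) that avoids
    division by zero when q has zero entries.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same media set")
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def behavior_similarity(p_u, p_v):
    """Hypothetical similarity: 1 / (1 + symmetric KL divergence)."""
    symmetric = kl_divergence(p_u, p_v) + kl_divergence(p_v, p_u)
    return 1.0 / (1.0 + symmetric)


same = behavior_similarity([0.5, 0.5], [0.5, 0.5])
different = behavior_similarity([0.5, 0.5], [0.9, 0.1])
```

Summing the divergence in both directions makes the measure symmetric in u and v, which a single KL term is not.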
4. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that, in step C, user feature similarity is used when obtaining the user similarity, wherein the user feature similarity is obtained by the following formula:
wherein $sim_P(u, v)$ is the demographic similarity of the two users and $sim_I(u, v)$ is the interest similarity of the two users;
wherein the values of the demographic attributes are mainly of two types, numeric and nominal, and the similarity of the demographic attributes is obtained from the distances between the two users on the numeric and nominal features by the following method:
wherein $D_{number}$ denotes the distance between user u and user v over all numeric features, and $D_{nominal}$ denotes the distance between user u and user v over all nominal features;
wherein the distance over the numeric features is obtained by the following formula:
wherein $d_j$ denotes the distance between the two users on sub-feature j, and if all $d_j$ are 0, the default value of $D_{number}$ is 1;
wherein the distance over the nominal features is obtained by the following method:
suppose a nominal attribute has N possible values; all values are manually ranked in order, the ranks being $r_1, r_2, \ldots, r_N$; if the ranks of the two users on the attribute are $r_i$ and $r_j$, the distance between the two users on the attribute is $|r_i - r_j|$, and the distance between the two users over all nominal features is:
wherein $d'_j$ denotes the distance between the two users on sub-feature j, and if all $d'_j$ are 0, the default value of $D_{nominal}$ is 1;
wherein, when obtaining the interest similarity, the interest fingerprint of each user is first generated from the user's interests, and the interest similarity between users is then obtained from the interest fingerprints, the interest fingerprint being generated as follows:
1. hashing, wherein all interests are hashed to obtain K-bit hash values;
2. weighting, wherein all interests of the user are extracted together with the probability weight of each interest, and each weight is multiplied with the corresponding hash value: if a bit of the hash value is 1, that position takes the probability weight, and if the bit is 0, that position takes the product of -1 and the probability weight;
3. accumulation, wherein the weighted hash values above are summed position by position to produce a single sequence of numbers; and
4. dimensionality reduction, wherein the number sequence obtained in the accumulation step is converted into a string of 0s and 1s, namely the final interest fingerprint: if a position is greater than 0 it is recorded as 1, and if it is less than 0 it is recorded as 0, and finally these K bits are concatenated in order to give the interest fingerprint;
wherein, supposing the interest fingerprints of user u and user v are $f_u$ and $f_v$, respectively, the interest similarity is obtained by the following formula:
wherein $f_{ui}$ and $f_{vi}$ denote the i-th bit of the interest fingerprints of user u and user v, respectively.
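The per-attribute nominal distance $|r_i - r_j|$ described in claim 4 can be sketched as below. Only the single-attribute distance is shown, because the formulas that aggregate the distances into $D_{number}$ and $D_{nominal}$ are not reproduced in this text. The function name `nominal_distance` and the example education ranking are hypothetical.

```python
def nominal_distance(ranking, value_u, value_v):
    """Distance |r_i - r_j| between two users on one manually ranked attribute.

    ranking lists the attribute's possible values in their manual order;
    ranks are assigned 1..N following that order.
    """
    rank = {value: position for position, value in enumerate(ranking, start=1)}
    return abs(rank[value_u] - rank[value_v])


# Hypothetical ordering of an education attribute.
education = ["primary", "secondary", "bachelor", "master", "doctorate"]
distance = nominal_distance(education, "bachelor", "doctorate")
```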
5. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that the similarity between users is obtained from the user feature similarity and the user behavior similarity:
$$sim(u, v) = \alpha \, sim_B(u, v) + (1 - \alpha) \, sim_F(u, v)$$
wherein $\alpha$ is the weight of the behavior similarity and $1 - \alpha$ is the weight of the feature similarity.
6. A crowd orientation method based on uncertain neighbours, characterized by comprising the following steps:
A: obtaining the features of users, the features comprising demographic attributes and interest tags;
B: selecting a behaviorally similar crowd, wherein, for a given seed crowd, the behavior similarity between users is obtained from the online media sites that the users visit, a corresponding threshold is set, and the users whose similarity is not less than the threshold are selected, the selected set of users serving as the behaviorally similar crowd;
C: selecting a candidate crowd, wherein, based on user features and user behavior, the neighbor users of each seed are selected from the behaviorally similar crowd by a user similarity acquisition method, and the neighbor users of all seeds are taken as the candidate crowd; and
D: for the candidate crowd of step C, dynamically selecting, by a crowd orientation method, the set of users with higher similarity, and taking these users as the potential target crowd.
7. The crowd orientation method based on uncertain neighbours according to claim 6, characterized in that step A comprises the following steps:
A1: predicting the demographic attributes of a user from the online media sites visited; and
A2: predicting the interest tags of the user from the web pages the user browses;
wherein the demographic attributes comprise the following sub-features: gender, age, marital status, personal income, education, occupation and industry.
8. The crowd orientation method based on uncertain neighbours according to claim 7, characterized in that the sub-features of the demographic attributes are predicted by the following method:
wherein $M_1, M_2, \ldots, M_n$ denote n media, $C_j^k$ denotes that the k-th sub-feature of the user is of class j, $C_j^k(M_i)$ denotes the count of users who have visited media $M_i$ and whose k-th sub-feature is of class j, and $p(C_j^k \mid M_1 M_2 \ldots M_n)$ is the probability that the k-th sub-feature of the user is of class j;
wherein, in step B, the behavior similarity of user u and user v is obtained by the following method:
wherein $D_{KL}(P_u \| P_v)$ denotes the divergence of $P_u$ from $P_v$, $D_{KL}(P_v \| P_u)$ denotes the divergence of $P_v$ from $P_u$, $P_u$ denotes the media density of user u, and $P_v$ denotes the media density of user v;
wherein the divergence is obtained by the following method:
suppose $P_u$ and $P_v$ are the kernel density distributions of user u and user v, respectively; then the divergence of $P_u$ and $P_v$ is:
wherein M denotes the set of visited media, and $P_u(i)$ and $P_v(i)$ denote the densities with which user u and user v, respectively, visit media $M_i$;
wherein the density with which a user visits media is estimated by the following method:
and
wherein $M(u)$ denotes the set of media visited by user u, h denotes the window width, the formulas further use the count of visits by user u to media $M_j$ and the distance between media $M_i$ and media $M_j$, $U_i$ denotes the set of users who have visited media $M_i$, and $U_j$ denotes the set of users who have visited media $M_j$.
9. The crowd orientation method based on uncertain neighbours according to claim 8, characterized in that, in step C, user feature similarity is used when obtaining the user similarity, wherein the user feature similarity is obtained by the following formula:
wherein $sim_P(u, v)$ is the demographic similarity of the two users and $sim_I(u, v)$ is the interest similarity of the two users;
wherein the values of the demographic attributes are mainly of two types, numeric and nominal, and the similarity of the demographic attributes is obtained from the distances between the two users on the numeric and nominal features by the following method:
wherein $D_{number}$ denotes the distance between user u and user v over all numeric features, and $D_{nominal}$ denotes the distance between user u and user v over all nominal features;
wherein the distance over the numeric features is obtained by the following formula:
wherein $d_j$ denotes the distance between the two users on sub-feature j, and if all $d_j$ are 0, the default value of $D_{number}$ is 1;
wherein the distance over the nominal features is obtained by the following method:
suppose a nominal attribute has N possible values; all values are manually ranked in order, the ranks being $r_1, r_2, \ldots, r_N$; if the ranks of the two users on the attribute are $r_i$ and $r_j$, the distance between the two users on the attribute is $|r_i - r_j|$, and the distance between the two users over all nominal features is:
wherein $d'_j$ denotes the distance between the two users on sub-feature j, and if all $d'_j$ are 0, the default value of $D_{nominal}$ is 1;
wherein, when obtaining the interest similarity, the interest fingerprint of each user is first generated from the user's interests, and the interest similarity between users is then obtained from the interest fingerprints, the interest fingerprint being generated as follows:
1. hashing, wherein all interests are hashed to obtain K-bit hash values;
2. weighting, wherein all interests of the user are extracted together with the probability weight of each interest, and each weight is multiplied with the corresponding hash value: if a bit of the hash value is 1, that position takes the probability weight, and if the bit is 0, that position takes the product of -1 and the probability weight;
3. accumulation, wherein the weighted hash values above are summed position by position to produce a single sequence of numbers; and
4. dimensionality reduction, wherein the number sequence obtained in the accumulation step is converted into a string of 0s and 1s, namely the final interest fingerprint: if a position is greater than 0 it is recorded as 1, and if it is less than 0 it is recorded as 0, and finally these K bits are concatenated in order to give the interest fingerprint;
wherein, supposing the interest fingerprints of user u and user v are $f_u$ and $f_v$, respectively, the interest similarity is obtained by the following formula:
wherein $f_{ui}$ and $f_{vi}$ denote the i-th bit of the interest fingerprints of user u and user v, respectively.
10. The crowd orientation method based on uncertain neighbours according to claim 9, characterized in that the similarity between users is obtained from the user feature similarity and the user behavior similarity:
$$sim(u, v) = \alpha \, sim_B(u, v) + (1 - \alpha) \, sim_F(u, v)$$
wherein $\alpha$ is the weight of the behavior similarity and $1 - \alpha$ is the weight of the feature similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710072222.8A CN108415913A (en) | 2017-02-09 | 2017-02-09 | Crowd's orientation method based on uncertain neighbours |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710072222.8A CN108415913A (en) | 2017-02-09 | 2017-02-09 | Crowd's orientation method based on uncertain neighbours |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108415913A true CN108415913A (en) | 2018-08-17 |
Family
ID=63124763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710072222.8A Pending CN108415913A (en) | 2017-02-09 | 2017-02-09 | Crowd's orientation method based on uncertain neighbours |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108415913A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767267A (en) * | 2018-12-29 | 2019-05-17 | 微梦创科网络科技(中国)有限公司 | A kind of target user's recommended method and device for advertisement dispensing |
CN109903086A (en) * | 2019-02-14 | 2019-06-18 | 北京奇艺世纪科技有限公司 | A kind of similar crowd's extended method, device and electronic equipment |
CN110135916A (en) * | 2019-05-23 | 2019-08-16 | 北京优网助帮信息技术有限公司 | A kind of similar crowd recognition method and system |
CN110458432A (en) * | 2019-07-30 | 2019-11-15 | 国网福建省电力有限公司 | A kind of electric power Optical Transmission Network OTN reliability diagnostic method based on cloud model |
WO2020192013A1 (en) * | 2019-03-27 | 2020-10-01 | 平安科技(深圳)有限公司 | Directional advertisement delivery method and apparatus, and device and storage medium |
CN112445985A (en) * | 2019-08-27 | 2021-03-05 | 上海开域信息科技有限公司 | Similar population acquisition method based on browsing behavior optimization |
CN113011922A (en) * | 2021-03-18 | 2021-06-22 | 北京百度网讯科技有限公司 | Similar population determination method and device, electronic equipment and storage medium |
CN114048294A (en) * | 2022-01-11 | 2022-02-15 | 智者四海(北京)技术有限公司 | Similar population extension model training method, similar population extension method and device |
CN116484115A (en) * | 2023-05-17 | 2023-07-25 | 北京淘友天下技术有限公司 | Friend-making recommendation system and method with intelligent analysis function |
CN116823360A (en) * | 2023-07-13 | 2023-09-29 | 天津瀛智科技有限公司 | Intelligent advertisement plan generation method and system based on user behaviors |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103209342A (en) * | 2013-04-01 | 2013-07-17 | 电子科技大学 | Collaborative filtering recommendation method introducing video popularity and user interest change |
CN104317822A (en) * | 2014-09-29 | 2015-01-28 | 新浪网技术(中国)有限公司 | Population property prediction method and device of network user |
CN104751354A (en) * | 2015-04-13 | 2015-07-01 | 合一信息技术(北京)有限公司 | Advertisement cluster screening method |
CN105447730A (en) * | 2015-12-25 | 2016-03-30 | 腾讯科技(深圳)有限公司 | Target user orientation method and device |
CN105893609A (en) * | 2016-04-26 | 2016-08-24 | 南通大学 | Mobile APP recommendation method based on weighted mixing |
US20160343026A1 (en) * | 2015-05-19 | 2016-11-24 | Facebook, Inc. | Adaptive advertisement targeting based on performance objectives |
Non-Patent Citations (2)
Title |
---|
DONG LIU et al.: "Mining micro-blogging users' interest features via fingerprint generation", 《PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ELECTRONICS ENGINEERING (ICCSEE 2013)》 * |
RONG Huigui et al.: "Collaborative filtering recommendation algorithm based on user similarity", 《Journal on Communications》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767267A (en) * | 2018-12-29 | 2019-05-17 | 微梦创科网络科技(中国)有限公司 | A kind of target user's recommended method and device for advertisement dispensing |
CN109767267B (en) * | 2018-12-29 | 2020-12-01 | 微梦创科网络科技(中国)有限公司 | Target user recommendation method and device for advertisement delivery |
CN109903086B (en) * | 2019-02-14 | 2020-12-18 | 北京奇艺世纪科技有限公司 | Similar crowd expansion method and device and electronic equipment |
CN109903086A (en) * | 2019-02-14 | 2019-06-18 | 北京奇艺世纪科技有限公司 | A kind of similar crowd's extended method, device and electronic equipment |
WO2020192013A1 (en) * | 2019-03-27 | 2020-10-01 | 平安科技(深圳)有限公司 | Directional advertisement delivery method and apparatus, and device and storage medium |
CN110135916A (en) * | 2019-05-23 | 2019-08-16 | 北京优网助帮信息技术有限公司 | A kind of similar crowd recognition method and system |
CN110458432A (en) * | 2019-07-30 | 2019-11-15 | 国网福建省电力有限公司 | A kind of electric power Optical Transmission Network OTN reliability diagnostic method based on cloud model |
CN110458432B (en) * | 2019-07-30 | 2022-10-04 | 国网福建省电力有限公司 | Cloud model-based reliability diagnosis method for electric power optical transmission network |
CN112445985A (en) * | 2019-08-27 | 2021-03-05 | 上海开域信息科技有限公司 | Similar population acquisition method based on browsing behavior optimization |
CN113011922A (en) * | 2021-03-18 | 2021-06-22 | 北京百度网讯科技有限公司 | Similar population determination method and device, electronic equipment and storage medium |
CN113011922B (en) * | 2021-03-18 | 2023-08-04 | 北京百度网讯科技有限公司 | Method and device for determining similar crowd, electronic equipment and storage medium |
CN114048294A (en) * | 2022-01-11 | 2022-02-15 | 智者四海(北京)技术有限公司 | Similar population extension model training method, similar population extension method and device |
CN116484115A (en) * | 2023-05-17 | 2023-07-25 | 北京淘友天下技术有限公司 | Friend-making recommendation system and method with intelligent analysis function |
CN116823360A (en) * | 2023-07-13 | 2023-09-29 | 天津瀛智科技有限公司 | Intelligent advertisement plan generation method and system based on user behaviors |
CN116823360B (en) * | 2023-07-13 | 2024-02-06 | 天津瀛智科技有限公司 | Intelligent advertisement plan generation method and system based on user behaviors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20180817 |