CN108415913A - Crowd's orientation method based on uncertain neighbours - Google Patents

Publication number
CN108415913A
Authority
CN
China
Prior art keywords
user
similarity
crowd
interest
media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710072222.8A
Other languages
Chinese (zh)
Inventor
周孟
朱福喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201710072222.8A
Publication of CN108415913A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The present invention is a crowd orientation (audience targeting) method based on uncertain neighbours. It belongs to the study of crowd orientation in Internet advertising and relates to technical fields such as user-based recommendation, profile-attack prevention and similarity calculation. It mainly addresses the poor quality of crowd orientation in advertisement delivery, which is caused by many factors. A feature prediction model of the user is established from the user's access behavior. According to user behavior, the behavior-similar crowd of the seed crowd is selected; then, on the basis of user behavior and user features, the neighbours of each seed user are selected from the behavior-similar crowd, and the neighbours of all seed users are taken as the candidate crowd. Finally, through the crowd orientation method, users with higher similarity are dynamically selected as the potential target crowd. The method can be widely applied to user recommendation in e-commerce systems, crowd orientation in advertisement delivery systems and the like, and improves the quality of crowd recommendation to a certain extent.

Description

Crowd's orientation method based on uncertain neighbours
Technical field
The invention belongs to the study of crowd orientation in Internet advertising and relates to technical fields such as user-based recommendation, profile-attack prevention and similarity calculation. In particular, it proposes a crowd orientation method based on uncertain neighbours.
Background art
Recommender systems: This work belongs to the study of recommendation techniques. In recent years, recommender systems have attracted increasing attention from scholars, and many recommendation techniques have been proposed, such as content-based recommendation and recommendation based on collaborative filtering. Popescul et al. extended Hofmann's aspect model and integrated three kinds of data (users, products and product content), then used these data to target products to consumers. Arora et al. studied the personalized content of users, namely aspects such as the user's interests, history and location, and used this content to recommend films to other similar users. Ekstrand et al. studied issues such as specific tasks, information needs and item domains in recommender systems, carefully analyzed potential users and their goals, and selected among several methods to make recommendations. Linden et al. used clustering and search algorithms to generate recommended users and products, scaled these recommendations to massive data sets, and produced high-quality recommendations online in real time. Huang Zhenhua et al. incorporated learning to rank into recommender systems and built a user preference model by integrating a large number of user and item features, improving the performance of the recommendation algorithm and user satisfaction. Guo Lei et al. proposed a recommendation algorithm that incorporates the association relationships between recommended objects, considering not only the social relationships between users but also the associations between recommended objects. Rong Huigui et al. proposed a collaborative filtering recommendation method based on user similarity, in which the similarity between users is calculated from their different social relationships. Chen Kehan et al. proposed a two-stage clustering recommendation algorithm that combines a graph summarization method with a content-similarity algorithm to achieve recommendation based on user interest. Wang et al. first classified users and assigned different weights to different behaviors, then calculated the similarity between users and generated the corresponding recommended user sets according to similar behaviors between users. Koren proposed a recommendation algorithm based on matrix factorization. These results provide the theoretical basis for the present work.
Profile-attack prevention: Considering the influence of user profile attacks, Mobasher et al. proposed a recommendation algorithm based on the PLSA model, which clusters users with PLSA. Mehta et al. proposed a recommendation algorithm based on singular value decomposition that weakens the influence of attack profiles on recommendations and thus improves the system's resistance to attack. Sandvig et al. proposed a collaborative filtering algorithm based on association rules, which enhances the stability of the recommender system. Jamali et al. introduced trust relationships between users and proposed a random walk model. Ma et al. proposed a matrix-factorization-based recommendation method that incorporates social information. Jia Dongyan et al. used the degree of trust between users to propose a collaborative filtering algorithm based on a dual neighbour selection strategy, which completes the recommendation to the target user. On the basis of existing work, the present work uses user features and user behavior to find the neighbours of seed users through the similarity between users, and takes all neighbours as the candidate crowd.
Similarity calculation: A large body of research exists on methods for calculating similarity. Recent work includes the following. Zhong Zhaoman et al. studied the relationships between users in microblogs and calculated the similarity between users from whom they follow and who follows them. Liu Ming et al. proposed a similarity calculation method based on feature weight quantization, which solves the problem of inconsistent data. Li Hailin et al. proposed two similarity calculation methods for normal cloud models, describing the overall characteristics of a normal cloud model through its expectation curve and maximum boundary curve. Xu Zhiming et al., using the relationships in a social network, gave a user similarity calculation method based on the user's various attribute information (background information, microblog text, social information). Wu Yitao et al. fuzzified discrete values into trapezoidal fuzzy numbers and calculated user similarity from them. The present work proposes different similarity calculation methods for the characteristics of user behavior and user features, merges the user behavior similarity and the user feature similarity by weighting, and thereby obtains the similarity between users.
Summary of the invention
Because crowd orientation is influenced by many factors, the quality of the recommended users is often low, and current related techniques handle this problem poorly. The present invention therefore aims to provide a crowd orientation method based on uncertain neighbours. User features are predicted from the web pages the user browses and the media websites the user accesses, and the behavior-similar crowd of the seed crowd is selected according to the similarity of user behavior. Then, on the basis of user behavior and user features, the neighbours of each seed user are selected from the behavior-similar crowd, and the neighbours of all seed users are taken as the candidate crowd. Finally, through the crowd orientation method, users with higher similarity are dynamically selected as the target crowd.
To achieve the above aim, the present invention proposes a crowd orientation method based on uncertain neighbours, comprising the following steps:
A: Obtain the features of the user (demographic attributes and interest tags);
B: Select the behavior-similar crowd: given the seed crowd, obtain the behavior similarity between users from the media websites they access, set a corresponding threshold, and select the users whose similarity is not less than the threshold; the selected set of users is the behavior-similar crowd;
C: Select the candidate crowd: according to user features and user behavior, use the user similarity method to select the neighbour users of each seed from the behavior-similar crowd, and take the neighbour users of all seeds as the candidate crowd; and
D: For the candidate crowd of step C, dynamically select the set of users with higher similarity through the crowd orientation method, and take these users as the potential target crowd.
Step A further comprises the following sub-steps:
A1: Predict the demographic attributes of the user from the media websites accessed; and
A2: Predict the interest tags of the user from the web pages the user browses.
In step A1, the demographic attributes are divided into 7 sub-features: gender, age, marital status, personal income, education, occupation and industry. Each sub-feature is predicted from the following quantities: the n media M_1, M_2, ..., M_n accessed by the user; the event that the user belongs to category j of the k-th sub-feature; the number of users who accessed medium M_i and belong to category j of the k-th sub-feature; and the resulting probability that the user belongs to category j of the k-th sub-feature.
In step B, the behavior similarity of user u and user v is obtained from D_KL(P_u || P_v), the divergence of P_u from P_v, and D_KL(P_v || P_u), the divergence of P_v from P_u, where P_u and P_v denote the media densities of user u and user v. Because the divergence is asymmetric, D_KL(P_u || P_v) and D_KL(P_v || P_u) may differ.
The divergence is obtained as follows. Assuming that P_u and P_v are the kernel density distributions of user u and user v respectively, the divergence of P_u and P_v is the KL divergence:
\[ D_{KL}(P_u \,\|\, P_v) = \sum_{i \in M} P_u(i) \log \frac{P_u(i)}{P_v(i)} \]
where M denotes the set of accessed media, and P_u(i) and P_v(i) denote the densities with which user u and user v access medium M_i.
The density with which a user accesses a medium is estimated from the following quantities: M(u), the set of media accessed by user u; h, the window width; the number of times user u accessed medium M_j; and the distance between medium M_i and medium M_j, which is determined from U_i and U_j, the sets of users who accessed M_i and M_j respectively.
In step C, the user feature similarity is used when obtaining the user similarity. The user feature similarity sim_F(u, v) is the harmonic mean of its two components:
\[ \mathrm{sim}_F(u, v) = \frac{2 \, \mathrm{sim}_P(u, v) \, \mathrm{sim}_I(u, v)}{\mathrm{sim}_P(u, v) + \mathrm{sim}_I(u, v)} \]
where sim_P(u, v) is the demographic similarity of the two users and sim_I(u, v) is their interest similarity.
The values of demographic attributes are mainly of two types, numeric and nominal. The demographic similarity is therefore obtained from the distances between the two users on the numeric and nominal features: D_number denotes the distance between user u and user v over all numeric features, and D_nominal denotes their distance over all nominal features.
The distance over numeric features is measured from d_j, the distance between the two users on sub-feature j; if all d_j are 0, D_number defaults to 1.
The distance over nominal features is measured as follows. Assuming that a nominal attribute takes N values, all values are manually ranked in order, giving the ratings r_1, r_2, ..., r_N. If the ratings of the two users on the attribute are r_i and r_j, their distance on the attribute is |r_i - r_j|. The distance between the two users over all nominal features is then obtained from d'_j, the distance between the two users on sub-feature j; if all d'_j are 0, D_nominal defaults to 1.
When obtaining the interest similarity, an interest fingerprint is first generated for each user from the user's interests, and the interest similarity between users is then obtained from the fingerprints. The interest fingerprint is generated as follows:
1. Hashing: hash every interest to obtain a K-bit hash value.
2. Weighting: extract all interests of the user together with the probability weight of each interest, and weight the corresponding hash value bit by bit: if a bit of the hash value is 1, that bit contributes the probability weight; if the bit is 0, it contributes -1 times the probability weight.
3. Accumulating: sum the weighted hash values of all interests bit by bit, producing a single sequence of K numbers.
4. Dimensionality reduction: convert the sequence obtained by accumulation into a string of 0s and 1s, which is the final interest fingerprint. Each position greater than 0 is recorded as 1 and each position less than 0 is recorded as 0; the K digits are then concatenated in order to form the interest fingerprint.
Assuming that the interest fingerprints of user u and user v are f_u and f_v, the interest similarity is measured by comparing the two fingerprints bit by bit, where f_ui and f_vi denote the i-th bits of the interest fingerprints of user u and user v respectively.
The similarity between users combines the user behavior similarity and the user feature similarity:
\[ \mathrm{sim}(u, v) = \alpha \, \mathrm{sim}_B(u, v) + (1 - \alpha) \, \mathrm{sim}_F(u, v) \]
where α is the weight of the behavior similarity and 1 - α is the weight of the feature similarity.
Compared with the prior art, the present invention has the following advantages:
1) The invention identifies the potential target crowd automatically and can effectively improve the quality of the recommended crowd.
2) The invention filters out large-scale user profile attacks, which saves manual effort.
3) The method can be widely applied to user recommendation in e-commerce systems, crowd orientation in advertisement delivery systems and the like, and improves the quality of crowd recommendation to a certain extent.
Description of the drawings
Fig. 1 is a schematic diagram of the crowd orientation method based on uncertain neighbours according to a preferred embodiment of the present invention.
Fig. 2 is a schematic diagram of interest classification according to the above preferred embodiment of the present invention.
Fig. 3 is a schematic diagram of interest fingerprint generation according to the above preferred embodiment of the present invention.
Detailed description of the embodiments
The following description discloses the invention so that those skilled in the art can implement it. The preferred embodiments below are examples only, and other obvious modifications will occur to those skilled in the art. The basic principles of the invention defined in the following description can be applied to other embodiments, variations, improvements, equivalents and other technical solutions that do not depart from the spirit and scope of the invention.
In a specific implementation, those skilled in the art can use computer software to run the technical solution of the invention as an automated process. The technical solution is described in detail below with reference to the drawings and embodiments.
Fig. 1 shows an embodiment of the crowd orientation method based on uncertain neighbours according to a preferred embodiment of the invention, which proceeds as follows. First, the features of the user, namely the user's demographic attributes and interests, are obtained: a user feature prediction model is established mainly from the user's behavior (the URLs accessed); it consists of a demographic attribute prediction model and an interest classification model, and the user's features are predicted by these models. Then, according to user behavior, the crowd whose behavior is similar to that of the seed crowd is selected, and, according to user features and behavior, the neighbours of each seed user are selected from the behavior-similar crowd; all neighbours together form the candidate crowd. Finally, target users are selected automatically from the candidate crowd through the crowd orientation method.
The specific implementation steps are as follows:
Step 1: Establish the user feature prediction model. From the URLs accessed by the user, establish the user's demographic attribute prediction model and interest classification model, and then predict the user's demographic attributes and interest preferences.
Step 1.1: Predict the demographic attributes of the user. From the URLs the user accesses, extract the media websites visited, and establish the demographic attribute prediction model from those websites.
Demographic attributes describe the inherent attributes of the user, namely the 7 sub-features gender, age, personal income, marital status, education level, occupation and industry. Taking gender as an example, users who often browse car-buying sites (www.haomaiche.com) or online game sites (www.youxi.com) are generally mostly male, while the overwhelming majority of users who often visit entertainment and variety-show sites are female. Therefore, when predicting the demographic attributes of a user, the domain names (i.e. websites) of the URLs the user accesses are used to establish the prediction model. For predicting sub-feature k, the model is as follows.
Assume that a user has accessed n different media, M_1, M_2, ..., M_n, and that for each medium M_i the number of users who accessed M_i and belong to category j of the k-th sub-feature is known. From these counts the model computes the probability that the user belongs to category j of sub-feature k.
With this model, when predicting sub-feature k, the category j with the highest probability is selected as the class label of sub-feature k. For example, when predicting the gender of a user, if the probability of male is greater than the probability of female, the user's gender is male.
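The prediction step above can be sketched in Python as follows. The source text does not reproduce the probability formula, so this sketch assumes one simple possibility: the counts for each category are summed over the media the user accessed and then normalized. The function name `predict_subfeature` and the count values are hypothetical; only the two domain names come from the text.

```python
from collections import defaultdict


def predict_subfeature(accessed_media, media_counts):
    """Predict one demographic sub-feature (e.g. gender) for a user.

    accessed_media: iterable of media (domain names) the user accessed.
    media_counts:   {medium: {category: number of users who accessed the
                    medium and belong to that category}}.
    ASSUMPTION: the probability of a category is its share of the summed
    counts; the patent's exact formula is not available in the text.
    """
    totals = defaultdict(float)
    for medium in accessed_media:
        for category, count in media_counts.get(medium, {}).items():
            totals[category] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return None, {}  # no evidence for this user
    probs = {category: value / grand_total for category, value in totals.items()}
    # Pick the category with the highest probability as the class label.
    label = max(probs, key=probs.get)
    return label, probs


# Hypothetical counts for the two sites mentioned in the text.
example_counts = {
    "www.haomaiche.com": {"male": 80, "female": 20},
    "www.youxi.com": {"male": 70, "female": 30},
}
example_label, example_probs = predict_subfeature(
    ["www.haomaiche.com", "www.youxi.com"], example_counts
)
```

The same function would be called once per sub-feature (gender, age band, income band and so on), each with its own count table.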
Step 1.2: Predict the interest categories of the user. The URLs a user accesses reflect not only the user's demographic attributes but also the user's interest categories, because the content of different URL pages reflects different interest topics: the page content of a car-buying site reflects the topic of automobiles, while the page content of a game site leans towards entertainment. Therefore, topic profiles are established from the page content of URLs, the interest categories of unlabeled URL pages are predicted by the interest classification model, and once prediction is complete, the predicted interest category is assigned to the users who accessed the URL.
According to this preferred embodiment of the invention, interests are divided into 10 categories: entertainment, finance and economics, sports, digital products, tourism, automobiles, literature and art, current politics, health care, and military. As shown in Fig. 2, interest classification mainly comprises 2 stages, model training and interest prediction: an LR classifier is first trained on a sample set, and the trained LR classifier is then used to classify the interest of the pages accessed. In the model training stage, a crawler first captures the text of the sample URL pages, and the crawled sample text is preprocessed by word segmentation, removal of useless words and so on to form the segmented training samples; the LR classifier model is then trained on the processed samples. In the interest prediction stage, the web page content of the URL to be classified is first captured and preprocessed by word segmentation, removal of useless words and so on; the interest category of the URL page is then predicted by the LR classifier model and taken as the interest of the users who accessed the URL.
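The two stages above (training, then prediction) might look like the following minimal sketch. It assumes that LR stands for logistic regression, replaces the crawler with in-memory strings, replaces word segmentation with whitespace splitting, and uses a tiny hypothetical stop-word list; the one-vs-rest training loop is an illustrative choice, not the patent's stated procedure.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}  # hypothetical list


def tokenize(text):
    """Stand-in for word segmentation plus removal of useless words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(samples, labels, epochs=200, lr=0.5):
    """Train one binary logistic regression per category (one-vs-rest)."""
    vocab = sorted({w for s in samples for w in tokenize(s)})
    xs = [featurize(s, vocab) for s in samples]
    models = {}
    for category in sorted(set(labels)):
        w, b = [0.0] * len(vocab), 0.0
        for _ in range(epochs):
            for x, y in zip(xs, labels):
                target = 1.0 if y == category else 0.0
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
                b -= lr * err
        models[category] = (w, b)
    return vocab, models


def predict_interest(text, vocab, models):
    """Return the category whose classifier gives the highest score."""
    x = featurize(text, vocab)
    scores = {
        category: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for category, (w, b) in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical page texts standing in for crawled sample pages.
pages = [
    "new car engine review", "car dealer price engine",
    "movie star concert show", "music show movie ticket",
]
page_labels = ["automobile", "automobile", "entertainment", "entertainment"]
vocab, models = train_lr(pages, page_labels)
```

In a real pipeline the predicted category of a page would then be attached to every user who accessed that URL, as the text describes.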
Step 2: According to the user behavior similarity calculation method, select the behavior-similar crowd of the seed crowd from all users.
The target crowd essentially consists of users whose behavior is similar to that of the seed crowd, so when choosing the recommended crowd, the behavior-similar crowd of the seed crowd is first selected according to user behavior. User behavior consists of the series of media (or websites) that the user has accessed. Traditional similarity measures, such as cosine similarity and the Pearson correlation coefficient, consider only the media that both users accessed and ignore the influence of other media. If the density distribution of a user over the entire media space can be estimated, then calculating the behavior similarity of two users from their densities over the media space is more realistic.
According to this preferred embodiment of the invention, the idea of kernel density estimation is used to estimate the user's density over the media space. Common kernel functions include the uniform kernel, the triangular kernel and the Gaussian kernel; because the shape of the kernel affects the result much less than the window width does, the embodiment uses the Gaussian kernel to estimate the user's density over the media space.
Definition in the embodiment: let M(u) denote the set of media accessed by user u, h the window width, U_i the set of users who accessed medium M_i, and U_j the set of users who accessed medium M_j. The density with which user u accesses a medium is computed from the number of times user u accessed each medium M_j in M(u) and from the distance between medium M_i and medium M_j.
Kernel density estimation by the above method yields the density distribution over the entire media space. The behavior similarity between two users is then calculated from their kernel density distributions over the media. According to this preferred embodiment of the invention, the KL divergence is used to calculate the behavior similarity of two users.
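A minimal sketch of the density estimate described above is given here. The exact formulas are not reproduced in the text, so two assumptions are made and should be read as such: the distance between two media is taken to be the Jaccard distance between their user sets U_i and U_j, and the density is a count-weighted sum of Gaussian kernels, normalized to sum to 1 over the media space. All function names are hypothetical.

```python
import math


def media_distance(users_i, users_j):
    """ASSUMED distance between two media: Jaccard distance of user sets."""
    union = users_i | users_j
    if not union:
        return 1.0
    return 1.0 - len(users_i & users_j) / len(union)


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def media_density(user_counts, media_users, h=0.5):
    """Estimate a user's density over every medium in the media space.

    user_counts: {medium: number of times the user accessed it}, i.e. M(u).
    media_users: {medium: set of users who accessed it}, for all media.
    h:           window width (bandwidth).
    """
    raw = {}
    for m_i, users_i in media_users.items():
        raw[m_i] = sum(
            count * gaussian_kernel(
                media_distance(users_i, media_users.get(m_j, set())) / h
            )
            for m_j, count in user_counts.items()
        )
    total = sum(raw.values())
    if total == 0:
        # No visits recorded: fall back to a uniform distribution.
        return {m: 1.0 / len(raw) for m in raw}
    return {m: value / total for m, value in raw.items()}


# Hypothetical data: media "a" and "b" share the same audience, "c" does not.
example_media_users = {"a": {"u1", "u2"}, "b": {"u1", "u2"}, "c": {"u3"}}
example_density = media_density({"a": 3}, example_media_users)
```

Because the kernel spreads weight to nearby media, a user who visited only "a" still receives non-zero density on "b" and "c", which is what lets two users be compared even when they share no visited media.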
Definition in the embodiment: assuming that P_u and P_v are the kernel density distributions of user u and user v respectively, the divergence of P_u and P_v is:
\[ D_{KL}(P_u \,\|\, P_v) = \sum_{i \in M} P_u(i) \log \frac{P_u(i)}{P_v(i)} \]
where M denotes the set of accessed media and P_u(i) and P_v(i) denote the densities with which user u and user v access medium M_i.
Because the KL divergence is asymmetric, this preferred embodiment of the invention calculates the behavior similarity of two users from both D_KL(P_u || P_v) and D_KL(P_v || P_u).
According to this preferred embodiment of the invention, the behavior-similar crowd is selected as follows: first, the user's density over the media space is estimated from the media the user accessed; then the behavior similarity between the seed users and the other users is calculated; finally, a corresponding threshold is set, and the set of users whose behavior similarity is not less than the threshold is selected as the behavior-similar crowd of the seed crowd.
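The selection procedure above can be sketched as follows. The KL divergence is standard; how the two directed divergences are turned into a similarity is not reproduced in the text, so this sketch assumes the mapping 1 / (1 + average of the two divergences). It also assumes that a user joins the behavior-similar crowd when the similarity to at least one seed user reaches the threshold. Names and the threshold value are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for distributions given as {medium: probability}."""
    return sum(
        p_i * math.log((p_i + eps) / (q.get(i, 0.0) + eps))
        for i, p_i in p.items()
        if p_i > 0
    )


def behavior_similarity(p_u, p_v):
    """ASSUMED symmetric similarity built from both KL directions."""
    sym = 0.5 * (kl_divergence(p_u, p_v) + kl_divergence(p_v, p_u))
    return 1.0 / (1.0 + max(0.0, sym))


def behavior_similar_crowd(seed_densities, user_densities, threshold):
    """Users whose similarity to at least one seed is >= threshold."""
    selected = set()
    for user, p_v in user_densities.items():
        if any(
            behavior_similarity(p_s, p_v) >= threshold
            for p_s in seed_densities.values()
        ):
            selected.add(user)
    return selected


# Hypothetical media densities for one seed user and two other users.
p_seed = {"a": 0.5, "b": 0.5}
p_far = {"a": 0.9, "b": 0.1}
example_crowd = behavior_similar_crowd(
    {"seed": p_seed}, {"x": dict(p_seed), "y": p_far}, threshold=0.9
)
```

Identical distributions give a similarity of 1, and the value falls towards 0 as the distributions diverge, so a threshold close to 1 keeps only users whose media usage closely matches a seed.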
Step 3: Select the candidate crowd. First, using the method based on user similarity, the similarity between each seed user and the other users in the behavior-similar crowd is calculated. Then a threshold is set, and the users whose similarity is not less than the threshold are selected as the neighbours of that seed user. Finally, the neighbour sets of all seed users are taken as the candidate crowd.
According to this preferred embodiment of the invention, potential target users are not chosen directly from the behavior-similar crowd. One reason is that when a user has accessed few media, the behavior similarity method cannot accurately measure how similar users' behavior is. Another is that the selection process is highly susceptible to profile attacks by other users. Therefore, in this preferred embodiment, the neighbours of the seed users are selected from the behavior-similar crowd, which filters out users with lower similarity, and all neighbours are taken as the candidate crowd, improving the quality of the recommended crowd. The candidate crowd is chosen using user similarity. User similarity measures how similar users are in terms of both user behavior and user features: the feature similarity and the behavior similarity of the users are calculated, a corresponding weight is assigned to each, and the similarity between users is then computed by weighting.
Since user features mainly comprise demographic attributes and interest categories, the similarity measure for each kind of feature must be considered when calculating the feature similarity. According to this preferred embodiment of the invention, the demographic similarity and the interest similarity of users are therefore calculated separately.
When calculating the demographic similarity, the value types of the demographic attributes must be considered. The values are mainly of two types, numeric and nominal, so in this preferred embodiment the similarity of user attributes is calculated from the distances between users on the numeric and nominal features.
The distance D_number over numeric features is measured from d_j, the distance between the two users on sub-feature j; if all d_j are 0, D_number defaults to 1.
The distance D_nominal over nominal features is measured as follows. Assuming that a nominal attribute takes N values, all values are manually ranked in order, giving the ratings r_1, r_2, ..., r_N. If the ratings of the two users on the attribute are r_i and r_j, their distance on the attribute is |r_i - r_j|. The distance between the two users over all nominal features is then obtained from d'_j, the distance between the two users on sub-feature j; if all d'_j are 0, D_nominal defaults to 1.
Definition in the embodiment: for user u and user v, let D_number be the distance between the two users over all numeric features and D_nominal their distance over all nominal features. The demographic similarity of user u and user v is computed from D_number and D_nominal.
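The distance definitions above can be illustrated with a short sketch. Only the per-attribute nominal distance |r_i - r_j| and the default value of 1 come from the text. How the per-feature distances are aggregated and how the two distances become a similarity are not reproduced, so this sketch assumes a plain sum for each distance and the reciprocal of their product for the similarity. The feature names and rankings are hypothetical.

```python
def numeric_distance(u, v, numeric_features):
    """D_number: ASSUMED to be the sum of absolute differences d_j."""
    total = sum(abs(u[f] - v[f]) for f in numeric_features)
    return total if total != 0 else 1.0  # default of 1 when all d_j are 0


def nominal_distance(u, v, rankings):
    """D_nominal: sum of |r_i - r_j| over manually ranked nominal features.

    rankings: {feature: {value: manually assigned rank}}.
    """
    total = sum(
        abs(ranks[u[f]] - ranks[v[f]]) for f, ranks in rankings.items()
    )
    return total if total != 0 else 1.0  # default of 1 when all d'_j are 0


def demographic_similarity(u, v, numeric_features, rankings):
    """ASSUMED form: 1 / (D_number * D_nominal), which is 1 for identical users."""
    return 1.0 / (
        numeric_distance(u, v, numeric_features)
        * nominal_distance(u, v, rankings)
    )


# Hypothetical users and a hypothetical manual ranking of education levels.
education_ranks = {"education": {"high school": 1, "bachelor": 2, "master": 3}}
user_a = {"age": 30, "education": "bachelor"}
user_b = {"age": 30, "education": "bachelor"}
user_c = {"age": 34, "education": "master"}
sim_ab = demographic_similarity(user_a, user_b, ["age"], education_ranks)
sim_ac = demographic_similarity(user_a, user_c, ["age"], education_ranks)
```

Under these assumptions the default of 1 keeps the similarity of two identical users at exactly 1 and avoids division by zero, which is one plausible reading of why the text specifies that default.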
For the interest similarity, the preferred embodiment of the invention uses a similarity calculation method based on interest fingerprints. As shown in Fig. 3, an interest fingerprint is generated from the interests of each user. The generation process is as follows:
1. Hashing. Hash every interest to obtain a K-bit hash value.
2. Weighting. Extract all interests of the user together with the probability weight of each interest, and weight the corresponding hash value bit by bit: if a bit of the hash value is 1, that bit contributes the probability weight; if the bit is 0, it contributes -1 times the probability weight.
3. Accumulating. Sum the weighted hash values of all interests bit by bit, producing a single sequence of K numbers.
4. Dimensionality reduction. Convert the sequence obtained by accumulation into a string of 0s and 1s to form the final interest fingerprint. Each position greater than 0 is recorded as 1 and each position less than 0 is recorded as 0. The K digits are then concatenated in order to form the interest fingerprint.
Definition in the embodiment: assuming that the interest fingerprints of user u and user v are f_u and f_v, the interest similarity of the two users is computed by comparing the fingerprints bit by bit, where f_ui and f_vi denote the i-th bits of the interest fingerprints of user u and user v respectively.
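The four steps above follow the general pattern of a SimHash-style fingerprint and can be sketched as below. Several details are assumptions rather than the patent's specification: MD5 truncated to K bits is used as the hash, an accumulated value of exactly 0 is mapped to 0, and the interest similarity is taken to be the fraction of matching bits, since the similarity formula is not reproduced in the text.

```python
import hashlib


def interest_hash(interest, k=64):
    """K-bit hash of an interest name (ASSUMED: MD5 truncated to k <= 128 bits)."""
    digest = hashlib.md5(interest.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << k) - 1)


def interest_fingerprint(weighted_interests, k=64):
    """Build a K-character 0/1 fingerprint from {interest: probability weight}."""
    acc = [0.0] * k
    for interest, weight in weighted_interests.items():
        h = interest_hash(interest, k)  # step 1: hashing
        for bit in range(k):
            # Step 2 (weighting) and step 3 (accumulating): a 1 bit adds
            # the weight, a 0 bit adds -1 times the weight.
            acc[bit] += weight if (h >> bit) & 1 else -weight
    # Step 4: dimensionality reduction; values equal to 0 are mapped to 0
    # here, a case the text leaves unspecified.
    return "".join("1" if value > 0 else "0" for value in acc)


def interest_similarity(f_u, f_v):
    """ASSUMED measure: share of positions where the two fingerprints agree."""
    if len(f_u) != len(f_v):
        raise ValueError("fingerprints must have the same length")
    matches = sum(1 for a, b in zip(f_u, f_v) if a == b)
    return matches / len(f_u)


# Hypothetical interest profiles using category names from the text.
fp_u = interest_fingerprint({"automobile": 0.7, "sports": 0.3})
fp_v = interest_fingerprint({"automobile": 1.0})
fp_w = interest_fingerprint({"tourism": 1.0})
```

Because the heavier interest decides the sign of every accumulated position when only two interests are present, `fp_u` and `fp_v` come out identical, which illustrates how the fingerprint is dominated by a user's strongest interests.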
User characteristics are the inherent attributes of user, and user characteristics contain two aspects of the ascribed characteristics of population and interest, because This user characteristics similarity includes similarity two parts of the similarity and interest of the ascribed characteristics of population.Since the ascribed characteristics of population and user are emerging Interest is to describe user characteristics from different aspect, belongs to different dimensional spaces, ascribed characteristics of population similarity between user and emerging Interesting similarity is different, can all influence the similarity of user characteristics, then presently preferred embodiment of the invention uses harmonic average Method calculate the characteristic similarity of user.
Definition in the embodiment: given user u and user v, let $\mathrm{sim}_P(u, v)$ be the demographic similarity of the two users and $\mathrm{sim}_I(u, v)$ their interest similarity. The feature similarity of user u and user v is then the harmonic mean of the two:
$$\mathrm{sim}_F(u, v) = \frac{2 \, \mathrm{sim}_P(u, v) \, \mathrm{sim}_I(u, v)}{\mathrm{sim}_P(u, v) + \mathrm{sim}_I(u, v)}$$
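The harmonic-mean combination can be sketched as follows. The function name `feature_similarity` is hypothetical, and returning 0 when both inputs are 0 is an assumption, since the harmonic mean is undefined in that case.

```python
def feature_similarity(sim_p, sim_i):
    """Harmonic mean of demographic similarity and interest similarity.

    sim_p: demographic similarity of the two users, in [0, 1].
    sim_i: interest similarity of the two users, in [0, 1].
    """
    if sim_p + sim_i == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * sim_p * sim_i / (sim_p + sim_i)
```

The harmonic mean is pulled toward the smaller of the two values, so two users score highly only when they are similar on both demographic attributes and interests.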
A user has not only inherent user features but also dynamic user behavior. User similarity measures how alike two users are along two dimensions, user feature similarity and user behavior similarity. Because the two dimensions measure different things, the similarity between users is calculated by weighting: a weight is assigned to each of the two similarities and the two results are then combined.
Definition in the embodiment: given user u and user v, let $\mathrm{sim}_B(u, v)$ be the behavior similarity of the two users and $\mathrm{sim}_F(u, v)$ their feature similarity. The similarity of user u and user v is then:
$$\mathrm{sim}(u, v) = \alpha \, \mathrm{sim}_B(u, v) + (1 - \alpha) \, \mathrm{sim}_F(u, v)$$
where $\alpha$ is the weight of the behavior similarity and $1 - \alpha$ is the weight of the feature similarity.
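The weighted combination is a one-line computation. The sketch below uses a hypothetical function name `user_similarity` and a default of 0.5 for the weight, which is an assumption because the description does not fix a value for α.

```python
def user_similarity(sim_b, sim_f, alpha=0.5):
    """Weighted sum of behavior similarity and feature similarity.

    sim_b: behavior similarity of the two users.
    sim_f: feature similarity of the two users.
    alpha: weight of the behavior similarity, in [0, 1]
           (0.5 is an assumed default).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_b + (1 - alpha) * sim_f
```

Setting alpha to 1 ranks users by behavior alone, and setting it to 0 ranks them by features alone.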
Candidate crowd selection takes user behavior and user features as its basis and aims to find the neighbors that are highly similar to the seed users. The process has two main stages:
1. For each seed user, calculate the similarity between that seed user and each user in the behavior-similar crowd.
2. Set a threshold and select the candidate crowd of the seed crowd. In this stage a similarity threshold is set first; then, for each seed user, the set of users whose similarity is not less than the threshold is selected as that seed user's neighbors. Finally, all selected neighbor sets are taken together as the candidate crowd of the seed crowd.
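The two stages above can be sketched as follows. The function name `select_candidates` and the convention of passing the similarity as a callable are assumptions for illustration; any user similarity function can be supplied.

```python
def select_candidates(seed_users, behavior_similar_users, similarity, threshold):
    """Collect the neighbors of every seed user into one candidate crowd.

    seed_users:             iterable of seed user identifiers.
    behavior_similar_users: iterable of users in the behavior-similar crowd.
    similarity:             callable (seed, user) -> similarity score.
    threshold:              minimum similarity for a user to count as a neighbor.
    """
    pool = list(behavior_similar_users)
    candidates = set()
    for seed in seed_users:
        # Stage 1: similarity of this seed user to each user in the crowd.
        # Stage 2: keep the users whose similarity is not less than the threshold.
        neighbors = {user for user in pool if similarity(seed, user) >= threshold}
        candidates |= neighbors  # merge all neighbor sets
    return candidates
```

A user enters the candidate crowd as soon as it is a neighbor of at least one seed user, which is why a further selection step against the seed crowd as a whole is still needed.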
Step 4: The users in the candidate crowd were selected as neighbors of individual seed users, so not every candidate is highly similar to all seed users. According to the preferred embodiment of the invention, the crowd orientation method therefore works from the perspective of the seed crowd as a whole: taking user features and user behavior as the selection basis, it calculates the similarity between each user in the candidate crowd and the seed crowd, and dynamically selects the users with higher similarity as the potential target group.
The crowd orientation method dynamically selects the potential target group from the candidate crowd in three main stages:
1. Calculate the similarity between each user in the candidate crowd and each seed user, then average the user's similarities to all seed users and take that average as the similarity between the user and the seed crowd.
2. Average the similarities between all candidate users and the seed crowd, and take this average as the similarity threshold.
3. Select from the candidate crowd the set of users whose similarity to the seed crowd is not less than the threshold, and take these users as the potential target group.
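The three stages above can be sketched as follows. The function name `orient_crowd`, the callable-based similarity, and the empty-input behavior are assumptions for illustration.

```python
def orient_crowd(candidates, seed_users, similarity):
    """Select the potential target group from the candidate crowd.

    candidates: iterable of candidate user identifiers.
    seed_users: iterable of seed user identifiers.
    similarity: callable (user, seed) -> similarity score.
    Returns (selected_users, threshold).
    """
    candidates = list(candidates)
    seeds = list(seed_users)
    if not candidates or not seeds:
        return set(), 0.0  # assumed behavior for empty input

    # Stage 1: similarity of each candidate to the seed crowd is the
    # average of its similarities to all seed users.
    crowd_sim = {
        user: sum(similarity(user, seed) for seed in seeds) / len(seeds)
        for user in candidates
    }

    # Stage 2: the threshold is the average of those crowd similarities.
    threshold = sum(crowd_sim.values()) / len(crowd_sim)

    # Stage 3: keep the candidates at or above the threshold.
    selected = {user for user, score in crowd_sim.items() if score >= threshold}
    return selected, threshold
```

Because the threshold is derived from the candidates themselves, it adapts to each seed crowd instead of being fixed in advance, which matches the dynamic selection described above.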
To ensure the performance of crowd orientation, the model can be evaluated as follows:
(1) Performance evaluation
System performance is evaluated with metrics including precision, recall and resistance to attack. In addition to studying the precision and recall of the system, users carrying profile attacks are added to the system, and the quality of the system's recommendations is studied through its resistance to those attacks.
(2) Computability and complexity analysis
Computability analysis examines whether the method is computable and realizable when complexity is left out of consideration. For any NP-complete problems that arise, approximate computation methods are proposed. Complexity analysis, on the premise that the method is computable, analyzes the time complexity of the model during computation and uses the resulting complexity estimate to gauge the efficiency of the model.
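The precision and recall named under performance evaluation can be computed as in the sketch below. It assumes that a ground-truth set of truly relevant users is available for comparison, which the description does not detail; the function name `precision_recall` is hypothetical.

```python
def precision_recall(selected, relevant):
    """Precision and recall of a selected target group.

    selected: users chosen as the potential target group.
    relevant: users known to belong to the true target group
              (assumed ground truth).
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Precision is the share of selected users who are truly relevant, and recall is the share of relevant users who were selected.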
The specific embodiments described in this document merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Those skilled in the art should understand that the embodiments of the invention shown in the foregoing description and the drawings are given only as examples and do not limit the invention. The objects of the invention have been fully and effectively achieved. The functional and structural principles of the invention have been shown and explained in the embodiments, and the embodiments may be varied or modified in any way without departing from those principles.

Claims (10)

1. A crowd orientation method based on uncertain neighbours, characterized by comprising the following steps:
A: obtaining features of users, the features comprising demographic attributes and interest tags;
B: selecting a behavior-similar crowd, wherein, according to a given seed crowd, the behavior similarity between users is obtained from the online media sites that the users visit, a corresponding threshold is set, and the users whose similarity is not less than the threshold are selected, the selected set of users serving as the behavior-similar crowd;
C: selecting a candidate crowd, wherein, according to user features and user behavior, the neighbor users of each seed are selected from the behavior-similar crowd by a user similarity acquisition method, and the neighbor users of all seeds serve as the candidate crowd; and
D: for the candidate crowd of step C, dynamically selecting a set of users with higher similarity by a crowd orientation method, and taking those users as the potential target group;
wherein step A comprises the following steps:
A1: predicting the demographic attributes of a user according to the online media sites visited; and
A2: predicting the interest tags of a user according to the web pages the user browses.
2. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that the demographic attributes comprise seven sub-features, namely gender, age, marital status, personal income, education, occupation and industry, wherein the sub-features of the demographic attributes are predicted by the following method:
wherein $M_1, M_2, \ldots, M_n$ denote n media; $C_j^k$ denotes that the user belongs to class j of the k-th sub-feature; $C_j^k(M_i)$ denotes the count of users who have visited medium $M_i$ and belong to class j of the k-th sub-feature; and $p(C_j^k \mid M_1 M_2 \ldots M_n)$ is the probability that the user belongs to class j of the k-th sub-feature.
3. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that, in step B, the behavior similarity of user u and user v is obtained by the following method:
wherein $D_{KL}(P_u \| P_v)$ denotes the divergence between $P_u$ and $P_v$, $D_{KL}(P_v \| P_u)$ denotes the divergence between $P_v$ and $P_u$, $P_u$ denotes the media density of user u, and $P_v$ denotes the media density of user v;
wherein the divergence is obtained by the following method:
assuming that $P_u$ and $P_v$ are the kernel density distributions of user u and user v respectively, the divergence of $P_u$ and $P_v$ is:
wherein M denotes the set of visited media, and $P_u(i)$ and $P_v(i)$ denote the densities with which user u and user v, respectively, visit medium $M_i$;
wherein the density with which a user visits media is estimated by the following method:
and
wherein $M(u)$ denotes the set of media visited by user u, h denotes the window width, the count term denotes the number of times user u has visited medium $M_j$, the distance term denotes the distance between medium $M_i$ and medium $M_j$, $U_i$ denotes the set of users who have visited medium $M_i$, and $U_j$ denotes the set of users who have visited medium $M_j$.
4. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that, in step C, user feature similarity is used when obtaining user similarity, wherein the user feature similarity is obtained by the following formula:
wherein $\mathrm{sim}_P(u, v)$ is the demographic similarity of the two users and $\mathrm{sim}_I(u, v)$ is the interest similarity of the two users;
wherein the values of the demographic attributes are mainly of two types, numeric and nominal, and the distances between the two users on the numeric and the nominal features are used when obtaining the demographic similarity, the demographic similarity being obtained by the following method:
wherein $D_{number}$ denotes the distance between user u and user v over all numeric features, and $D_{nominal}$ denotes the distance between user u and user v over all nominal features;
wherein the distance over the numeric features is obtained by the following formula:
wherein $d_j$ denotes the distance between the two users on sub-feature j, and if all $d_j$ are 0, the default value of $D_{number}$ is 1;
wherein the distance over the nominal features is obtained by the following method:
assuming that a nominal attribute has N possible values, all values are manually rated in order, the ratings being $r_1, r_2, \ldots, r_N$; if the ratings of the two users on the attribute are $r_i$ and $r_j$ respectively, the distance between the two users on the attribute is $|r_i - r_j|$, and the distance between the two users over all nominal features is:
wherein $d'_j$ denotes the distance between the two users on sub-feature j, and if all $d'_j$ are 0, the default value of $D_{nominal}$ is 1;
wherein, when obtaining the interest similarity, an interest fingerprint of each user is first generated from the user's interests, and the interest similarity between users is then obtained from the interest fingerprints, the interest fingerprint being generated as follows:
1. hashing, wherein every interest is hashed to obtain a K-bit hash value;
2. weighting, wherein all of the user's interests are extracted together with the probability weight of each interest, and each weight is applied to the corresponding hash value bit by bit, such that where a bit of the hash value is 1 that position takes the probability weight, and where the bit is 0 that position takes -1 times the probability weight;
3. accumulation, wherein the weighted hash values are summed position by position to produce a single sequence of K numbers; and
4. dimensionality reduction, wherein the sequence obtained in the accumulation step is converted into a string of 0s and 1s, which is the final interest fingerprint, a position whose sum is greater than 0 being recorded as 1 and a position whose sum is less than 0 being recorded as 0, and the K digits are finally concatenated in order to give the interest fingerprint;
wherein, assuming that the interest fingerprints of user u and user v are $f_u$ and $f_v$ respectively, the interest similarity is obtained by the following formula:
wherein $f_{ui}$ and $f_{vi}$ denote the i-th bit of the interest fingerprint of user u and of user v, respectively.
5. The crowd orientation method based on uncertain neighbours according to claim 1, characterized in that the user feature similarity and the user behavior similarity are used when obtaining the similarity between users:
$$\mathrm{sim}(u, v) = \alpha \, \mathrm{sim}_B(u, v) + (1 - \alpha) \, \mathrm{sim}_F(u, v)$$
wherein $\alpha$ is the weight of the behavior similarity and $1 - \alpha$ is the weight of the feature similarity.
6. A crowd orientation method based on uncertain neighbours, characterized by comprising the following steps:
A: obtaining features of users, the features comprising demographic attributes and interest tags;
B: selecting a behavior-similar crowd, wherein, according to a given seed crowd, the behavior similarity between users is obtained from the online media sites that the users visit, a corresponding threshold is set, and the users whose similarity is not less than the threshold are selected, the selected set of users serving as the behavior-similar crowd;
C: selecting a candidate crowd, wherein, according to user features and user behavior, the neighbor users of each seed are selected from the behavior-similar crowd by a user similarity acquisition method, and the neighbor users of all seeds serve as the candidate crowd; and
D: for the candidate crowd of step C, dynamically selecting a set of users with higher similarity by a crowd orientation method, and taking those users as the potential target group.
7. The crowd orientation method based on uncertain neighbours according to claim 6, characterized in that step A comprises the following steps:
A1: predicting the demographic attributes of a user according to the online media sites visited; and
A2: predicting the interest tags of a user according to the web pages the user browses;
wherein the demographic attributes comprise the following sub-features: gender, age, marital status, personal income, education, occupation and industry.
8. The crowd orientation method based on uncertain neighbours according to claim 7, characterized in that the sub-features of the demographic attributes are predicted by the following method:
wherein $M_1, M_2, \ldots, M_n$ denote n media; $C_j^k$ denotes that the user belongs to class j of the k-th sub-feature; $C_j^k(M_i)$ denotes the count of users who have visited medium $M_i$ and belong to class j of the k-th sub-feature; and $p(C_j^k \mid M_1 M_2 \ldots M_n)$ is the probability that the user belongs to class j of the k-th sub-feature;
wherein, in step B, the behavior similarity of user u and user v is obtained by the following method:
wherein $D_{KL}(P_u \| P_v)$ denotes the divergence between $P_u$ and $P_v$, $D_{KL}(P_v \| P_u)$ denotes the divergence between $P_v$ and $P_u$, $P_u$ denotes the media density of user u, and $P_v$ denotes the media density of user v;
wherein the divergence is obtained by the following method:
assuming that $P_u$ and $P_v$ are the kernel density distributions of user u and user v respectively, the divergence of $P_u$ and $P_v$ is:
wherein M denotes the set of visited media, and $P_u(i)$ and $P_v(i)$ denote the densities with which user u and user v, respectively, visit medium $M_i$;
wherein the density with which a user visits media is estimated by the following method:
and
wherein $M(u)$ denotes the set of media visited by user u, h denotes the window width, the count term denotes the number of times user u has visited medium $M_j$, the distance term denotes the distance between medium $M_i$ and medium $M_j$, $U_i$ denotes the set of users who have visited medium $M_i$, and $U_j$ denotes the set of users who have visited medium $M_j$.
9. The crowd orientation method based on uncertain neighbours according to claim 8, characterized in that, in step C, user feature similarity is used when obtaining user similarity, wherein the user feature similarity is obtained by the following formula:
wherein $\mathrm{sim}_P(u, v)$ is the demographic similarity of the two users and $\mathrm{sim}_I(u, v)$ is the interest similarity of the two users;
wherein the values of the demographic attributes are mainly of two types, numeric and nominal, and the distances between the two users on the numeric and the nominal features are used when obtaining the demographic similarity, the demographic similarity being obtained by the following method:
wherein $D_{number}$ denotes the distance between user u and user v over all numeric features, and $D_{nominal}$ denotes the distance between user u and user v over all nominal features;
wherein the distance over the numeric features is obtained by the following formula:
wherein $d_j$ denotes the distance between the two users on sub-feature j, and if all $d_j$ are 0, the default value of $D_{number}$ is 1;
wherein the distance over the nominal features is obtained by the following method:
assuming that a nominal attribute has N possible values, all values are manually rated in order, the ratings being $r_1, r_2, \ldots, r_N$; if the ratings of the two users on the attribute are $r_i$ and $r_j$ respectively, the distance between the two users on the attribute is $|r_i - r_j|$, and the distance between the two users over all nominal features is:
wherein $d'_j$ denotes the distance between the two users on sub-feature j, and if all $d'_j$ are 0, the default value of $D_{nominal}$ is 1;
wherein, when obtaining the interest similarity, an interest fingerprint of each user is first generated from the user's interests, and the interest similarity between users is then obtained from the interest fingerprints, the interest fingerprint being generated as follows:
1. hashing, wherein every interest is hashed to obtain a K-bit hash value;
2. weighting, wherein all of the user's interests are extracted together with the probability weight of each interest, and each weight is applied to the corresponding hash value bit by bit, such that where a bit of the hash value is 1 that position takes the probability weight, and where the bit is 0 that position takes -1 times the probability weight;
3. accumulation, wherein the weighted hash values are summed position by position to produce a single sequence of K numbers; and
4. dimensionality reduction, wherein the sequence obtained in the accumulation step is converted into a string of 0s and 1s, which is the final interest fingerprint, a position whose sum is greater than 0 being recorded as 1 and a position whose sum is less than 0 being recorded as 0, and the K digits are finally concatenated in order to give the interest fingerprint;
wherein, assuming that the interest fingerprints of user u and user v are $f_u$ and $f_v$ respectively, the interest similarity is obtained by the following formula:
wherein $f_{ui}$ and $f_{vi}$ denote the i-th bit of the interest fingerprint of user u and of user v, respectively.
10. The crowd orientation method based on uncertain neighbours according to claim 9, characterized in that the user feature similarity and the user behavior similarity are used when obtaining the similarity between users:
$$\mathrm{sim}(u, v) = \alpha \, \mathrm{sim}_B(u, v) + (1 - \alpha) \, \mathrm{sim}_F(u, v)$$
wherein $\alpha$ is the weight of the behavior similarity and $1 - \alpha$ is the weight of the feature similarity.

Application number: CN201710072222.8A
Priority date: 2017-02-09
Filing date: 2017-02-09
Title: Crowd's orientation method based on uncertain neighbours
Status: Pending
Publication: CN108415913A (en), published 2018-08-17
Family ID: 63124763
Country status: CN (1), CN108415913A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180817