CN111506833A - Friend recommendation method based on single-source SimRank accurate solution - Google Patents
- Publication number
- CN111506833A (application CN202010536506.XA)
- Authority
- CN
- China
- Prior art keywords
- node
- probability
- slave
- calculation
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a friend recommendation method based on a single-source SimRank accurate solution, comprising the following steps: converting the target user, the users, and the relationships among the users into a graph structure G; computing the personalized PageRank of the source node $v_i$ with respect to all nodes on the graph to construct an n-dimensional personalized PageRank vector; computing the no-longer-meeting probability of every node on G to form a no-longer-meeting probability matrix; computing the single-source SimRank similarity of the source node $v_i$ from the personalized PageRank vector and the no-longer-meeting probability matrix to obtain an n-dimensional SimRank vector; repeating the SimRank similarity calculation for L rounds and updating the n-dimensional vector each round; and finding the t dimensions of the n-dimensional vector with the largest values and recommending the corresponding users to the target user. The method ensures that an accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within a practical time, which improves the quality and effect of the friend recommendation function.
Description
Technical Field
The invention relates to a recommendation method, in particular to a friend recommendation method based on a single-source SimRank accurate solution.
Background
As the mobile Internet penetrates ever deeper into daily life in China, the number of mobile social users has grown massively. According to a special report on China's mobile social industry published in 2019 by iiMedia Research, the number of mobile social users in China reached 737 million in 2018, was expected to grow steadily over the following two years, and was expected to exceed 800 million in 2020. This huge user base represents a huge market and has spurred a wide variety of social software. By function, mainstream social software can be roughly divided into instant-messaging applications, represented by WeChat and QQ; media applications, represented by Weibo; interest-based applications, represented by Douban and Zhihu; and friend-making applications, represented by Momo and Tantan.
The functions of various types of social software are different, but the social software generally supports a friend recommendation function, namely, the similarity between specified users is calculated according to the existing friend relationship network, and the users with high similarity but not friends are recommended to the specified users, so that the users are helped to find friends related to the interests of the users.
In the friend recommendation process, the accuracy of the similarity calculation between users directly affects the quality of the recommendation result. A reasonable similarity measure and an accurate similarity calculation result are both necessary for accurate friend recommendation. SimRank similarity stands out among similarity measures for its intuitive design, its practical recursive definition, and its high-quality results, and it has gradually become a widely used similarity measure. In friend recommendation scenarios, the degree of similarity between users is therefore commonly measured by SimRank similarity.
At present, the algorithm in the prior art can realize an accurate solution of the SimRank similarity on a small-scale user group, but cannot calculate an accurate solution on a large graph.
The inventors of this application have found that prior-art algorithms can support SimRank-based friend recommendation only on small-scale user groups. As the user scale expands rapidly, the time needed to compute SimRank similarity grows. For ultra-large-scale user groups (for example, social applications with tens of millions of users, such as Twitter and Weibo), existing methods cannot accurately compute the SimRank similarity between the target user and all other users within a practical time. They can only sacrifice some accuracy to obtain an estimate of the SimRank similarity, which considerably degrades the quality and effect of the friend recommendation function.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a friend recommendation method based on a single-source SimRank accurate solution, which can improve the quality and effect of a friend recommendation function.
To achieve this purpose, the invention provides a friend recommendation method based on a single-source SimRank accurate solution, comprising the following steps: converting a target user, the users, and the relationships among the users into a graph structure G, where G comprises n nodes corresponding to the users and edges corresponding to the relationships among the users, and the target user is the source node $v_i$ of G; in G, computing the personalized PageRank of the source node $v_i$ with respect to all nodes on the graph to construct an n-dimensional personalized PageRank vector, whose k-th entry is the probability that a random walk starting from $v_i$ finally stops at node $v_k$, where $v_k$ is any node in G and the random walk, at each step, stops with probability $1-\sqrt{c}$ and otherwise (with probability $\sqrt{c}$) moves to a randomly chosen neighbor of the current node; computing the no-longer-meeting probability of every node on G to form a no-longer-meeting probability matrix, whose k-th diagonal element stores the no-longer-meeting probability of node $v_k$; computing the single-source SimRank similarity of the source node $v_i$ from the personalized PageRank vector and the no-longer-meeting probability matrix to obtain an n-dimensional SimRank vector, repeating the SimRank similarity calculation for L rounds and updating the vector each round to obtain the n-dimensional vector after L rounds, where $c \in [0, 1]$ is the attenuation coefficient and $\varepsilon$ is the absolute error of the SimRank result; and finding the t dimensions of the n-dimensional vector with the largest values and recommending the corresponding users to the target user.
In one possible implementation, computing, in the graph structure G, the personalized PageRank of the source node $v_i$ with respect to all nodes on the graph to construct the n-dimensional vector comprises: computing the personalized PageRank of the source node $v_i$ with respect to node $v_k$ from the personalized PageRank of $v_i$ with respect to the neighbor nodes of $v_k$, where $v_k$ is any node in G.
In one possible implementation, computing the personalized PageRank of the source node $v_i$ with respect to node $v_k$ from the personalized PageRank of $v_i$ with respect to the neighbor nodes of $v_k$ comprises: obtaining the probability transition matrix P of the graph structure G, where P is an $n \times n$ matrix whose entry in row j, column i is the probability that node $v_i$ moves in one step along an incoming edge to node $v_j$; and updating and storing the personalized PageRank vectors according to formula (1). In formula (1), l is an intermediate variable, l = 0, 1, ..., L; the vectors being updated are n-dimensional; $0 \le i \le n-1$; and the vectors are initialized from the n-dimensional indicator vector that is 1 in the i-th dimension and 0 in all other dimensions. The update of formula (1) is repeated L times to obtain the updated n-dimensional vector.
In one possible implementation, computing the single-source SimRank similarity of the source node $v_i$ from the n-dimensional personalized PageRank vector and the no-longer-meeting probability matrix, and repeating the calculation for L rounds to obtain the n-dimensional vector after L rounds, comprises: in the first round, computing the SimRank similarity from the personalized PageRank vector and the no-longer-meeting probability matrix; and in the second to L-th rounds, updating the n-dimensional vector according to formula (2), repeating the update until the L-th round to obtain the n-dimensional vector after L rounds. In formula (2), the matrix is the no-longer-meeting probability matrix, l is an intermediate variable, l = 0, 1, ..., L, and the vectors are n-dimensional.
In one possible implementation, computing the no-longer-meeting probability of every node on the graph structure G to form the no-longer-meeting probability matrix comprises: obtaining a node $v_k$ in the graph structure; judging whether the in-degree of $v_k$ satisfies a preset condition, namely that the in-degree of $v_k$ is 0 or 1; if so, returning the no-longer-meeting probability of $v_k$ directly from the preset condition, which is 1 when the in-degree of $v_k$ is 0 (the walks cannot move) and $1-c$ when the in-degree of $v_k$ is 1 (the walks meet exactly when both continue, with probability c); if not, computing the probability $Z_l(k, q)$ that two random walks starting from $v_k$ first meet at node $v_q$ at step l, and repeating the calculation of $Z_l(k, q)$ for increasing l until $E_k$, the total length of all paths extended from $v_k$, is greater than or equal to a preset value, where the preset value is defined in terms of R(k), the required number of random walks generated from $v_k$, and the value of the k-th dimension of the n-dimensional vector; obtaining l(k), the total number of levels walked by the two random walks when $E_k$ becomes greater than or equal to the preset value; generating two random walks from $v_k$ and recording whether they meet after step l(k), and repeating this R(k) times, where each generated pair of walks does not stop during the first l(k) steps and, from step l(k)+1 onward, each walk stops at every step with probability $1-\sqrt{c}$ and otherwise moves to a randomly chosen neighbor of the current node; computing the no-longer-meeting probability of $v_k$ from the results of the R(k) generated random walks and from the values $Z_l(k, q)$ computed until $E_k$ reached the preset value, and storing it in the k-th diagonal element of the no-longer-meeting probability matrix; and updating the value of k and repeating the above calculation until every node in the graph structure has been traversed.
In one possible implementation, computing the probability $Z_l(k, q)$ that two random walks starting from $v_k$ first meet at node $v_q$ at step l, and repeating the calculation until $E_k$ is greater than or equal to the preset value, comprises: computing $Z_l(k, q)$ from $(P^T)^l(k, q)$, the probability that node $v_k$ moves along incoming edges to node $v_q$ in l steps; for every node $v_{q'}$ with $(P^T)^{l-l'}(k, q') > 0$, updating $(P^T)^{l'+1}(q', q)$, $E_k$ and $l'$ until $l' = l$, where $(P^T)^{l-l'}(k, q')$ is the probability that node $v_k$ moves to node $v_{q'}$ in $l-l'$ steps and $(P^T)^{l'+1}(q', q)$ is the probability that node $v_{q'}$ moves to node $v_q$ in $l'+1$ steps; updating the value of l; and repeating the above steps until $E_k$ is greater than or equal to the preset value.
In one possible implementation, computing $Z_l(k, q)$ from $(P^T)^l(k, q)$, the probability that node $v_k$ moves along incoming edges to node $v_q$ in l steps, comprises computing $Z_l(k, q)$ according to formula (3). Formula (3) expresses $Z_l(k, q)$ as the difference of two terms. The first term, $c^l\left((P^T)^l(k, q)\right)^2$, is the probability that two random walks starting from $v_k$ meet at node $v_q$ at step l. The second term, which is subtracted, is the probability that the two walks meet at node $v_q$ at step l having already met at an earlier step. Here $(P^T)^l(k, q)$ is the probability that node $v_k$ moves along incoming edges to node $v_q$ in l steps; $(P^T)^0(k, k)$ is initialized to 1; $(P^T)^0(k, x)$ is initialized to 0 for every other node $v_x$; the remaining entries $(P^T)^l(k, x)$ are initialized to 0; and l is an intermediate variable initialized to 0.
In one possible implementation, updating $(P^T)^{l'+1}(q', q)$, $E_k$ and $l'$ for every node $v_{q'}$ with $(P^T)^{l-l'}(k, q') > 0$ comprises applying formula (4):
$$(P^T)^{l'+1}(q', q) = (P^T)^{l'+1}(q', q) + \frac{(P^T)^{l'}(q', x)}{d_{in}(v_x)}$$
$$E_k = E_k + 1$$
$$l' = l' + 1$$
where $v_x$ ranges over all nodes with $(P^T)^{l'}(q', x) > 0$, and $d_{in}(v_x)$ is the in-degree of node $v_x$.
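The update in formula (4) can be read as one sparse propagation step along incoming edges. The following is a minimal sketch under stated assumptions: the function name `propagate_one_step`, the dictionary representation of a sparse probability row, and the choice to increment the counter once per traversed edge are illustrative and are not taken from the patent text.

```python
def propagate_one_step(prob, in_nbrs, edge_counter=0):
    """One application of the formula (4) update (illustrative sketch).

    prob[x] holds (P^T)^{l'}(q', x) for a fixed start node q'.
    in_nbrs[x] is the list of in-neighbors of node x.
    Returns the sparse row (P^T)^{l'+1}(q', .) and the updated counter.
    """
    nxt = {}
    for x, p in prob.items():
        if p <= 0.0:
            continue
        nbrs = in_nbrs.get(x, [])
        for q in nbrs:  # q is an in-neighbor of x
            # (P^T)^{l'+1}(q', q) += (P^T)^{l'}(q', x) / d_in(v_x)
            nxt[q] = nxt.get(q, 0.0) + p / len(nbrs)
            edge_counter += 1  # assumption: E_k counts traversed edges
    return nxt, edge_counter


# Tiny hypothetical graph: x has in-neighbors p and q; q has in-neighbor p.
toy_in_nbrs = {"x": ["p", "q"], "p": [], "q": ["p"]}
row1, e1 = propagate_one_step({"x": 1.0}, toy_in_nbrs)
row2, e2 = propagate_one_step(row1, toy_in_nbrs, e1)
```

Because only nodes with positive probability are visited, the cost of each step is proportional to the number of edges actually traversed, which is what a counter such as $E_k$ is meant to bound.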
In one possible implementation, computing the no-longer-meeting probability of node $v_k$ from the results of the R(k) generated random walks and from the values $Z_l(k, q)$ computed until $E_k$ reached the preset value comprises computing the no-longer-meeting probability of node $v_k$ according to formula (5), whose terms are as follows: one term is the estimated probability that two random walks generated from node $v_k$ meet after step l(k); $c^{l(k)}$ is the probability that neither of the two random walks stops during the first l(k) steps; l(k) is the value of the variable l when $E_k$ becomes greater than or equal to the preset value; $Z_l(k, q)$ is the probability that two random walks starting from $v_k$ first meet at node $v_q$ at step l; and I(w) is an indicator variable recording whether the random walks meet in the w-th generation, $w \le R(k)$, with I(w) = 1 when the two random walks produced in the w-th generation meet and I(w) = 0 otherwise.
In one possible implementation, recommending the users corresponding to the t nodes to the target user comprises: mapping the t found nodes to users in the social network; removing the users who already have a friend relationship with the target user; and recommending the remaining users to the target user.
Compared with the prior art, the friend recommendation method based on the single-source SimRank accurate solution provided by this embodiment repeats the single-source SimRank similarity calculation for L rounds, updating the n-dimensional vector each round, finds the t dimensions of the vector with the largest values, and recommends the corresponding users to the target user. Estimates of the SimRank similarity between all users and the target user are obtained in $O(\log n/\varepsilon^2 + m\log(1/\varepsilon))$ time, and the absolute error between each estimate and the true value does not exceed $\varepsilon$. When $\varepsilon$ is set to $10^{-7}$ and a floating-point (float) variable type is used to store the SimRank result, an accurate solution of the SimRank similarity can be obtained within a practical time, which improves the quality and effect of the friend recommendation function.
Further, the friend recommendation method based on the single-source SimRank accurate solution provided by this embodiment does not need to perform preprocessing of a graph structure, and can accurately calculate the single-source SimRank similarity of a dynamically changing group (such as the appearance of a new user, the logout of an existing user, the change of a friend relationship, and the like), thereby implementing friend recommendation for the dynamically changing user group.
Drawings
Fig. 1 is a flowchart of a friend recommendation method based on a single-source SimRank accurate solution according to an embodiment of the present invention;
fig. 2 is a flowchart of one implementation of step S3 provided according to an embodiment of the present invention.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
As shown in fig. 1, which is a flowchart of a friend recommendation method based on a single-source SimRank accurate solution according to a preferred embodiment of the present invention, the method includes: steps S1-S5.
In step S1, the target user, the users, and the relationships among the users are converted into a graph structure G, where G comprises nodes corresponding to the users and edges corresponding to the relationships among the users, the target user is the source node $v_i$ of G, and G comprises n nodes.
The users in this embodiment are all registered users on the platform, and the relationship between users may specifically be a follow relationship. An example is all registered users on Facebook together with their friendship network.
Specifically, for social networks with follow relationships, such as Weibo, Facebook, and Instagram, the users of the social network correspond to nodes of the graph structure, and the follow relationships among users correspond to edges. If user A follows user B, a directed edge from the node of user B to the node of user A is created in the graph structure (B -> A). B is then an in-neighbor of A, A is an out-neighbor of B, and this edge is an outgoing edge of node B and an incoming edge of node A. The number of outgoing edges of a node is called its out-degree, and the number of incoming edges is called its in-degree.
For social networks with friend relationships, such as WeChat and QQ, the users of the social network correspond to graph nodes, and the friend relationships correspond to edges of the graph structure. If user A and user B have a friend relationship (that is, A and B are friends of each other), a directed edge from the node of user A to the node of user B and a directed edge from the node of user B to the node of user A are both created in the graph structure.
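The two edge conventions above can be sketched as follows. This is a minimal illustration, assuming the relationships arrive as pairs of user identifiers; the function name `build_in_neighbors` and the input format are hypothetical.

```python
from collections import defaultdict


def build_in_neighbors(follows, friendships):
    """Build in-neighbor lists for the graph structure G (illustrative sketch).

    follows: iterable of (a, b) meaning "a follows b", which creates edge b -> a.
    friendships: iterable of (a, b) meaning mutual friends, which creates both edges.
    Returns {node: sorted list of in-neighbors}.
    """
    in_nbrs = defaultdict(set)
    nodes = set()
    for a, b in follows:
        nodes.update((a, b))
        in_nbrs[a].add(b)  # edge b -> a, so b is an in-neighbor of a
    for a, b in friendships:
        nodes.update((a, b))
        in_nbrs[a].add(b)  # edge b -> a
        in_nbrs[b].add(a)  # edge a -> b
    return {v: sorted(in_nbrs[v]) for v in nodes}


graph = build_in_neighbors(follows=[("A", "B")], friendships=[("C", "D")])
```

Storing in-neighbors rather than out-neighbors is a design choice made here because the random walks described in the following steps move along incoming edges.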
In step S2, in the graph structure G, the personalized PageRank of the source node $v_i$ with respect to all nodes on the graph is computed to construct an n-dimensional personalized PageRank vector. Its k-th entry is the probability that a random walk starting from $v_i$ finally stops at node $v_k$, where $v_k$ is any node in G; at each step the random walk stops with probability $1-\sqrt{c}$ and otherwise (with probability $\sqrt{c}$) moves to a randomly chosen neighbor of the current node.
In step S3, the no-longer-meeting probability of every node on G is computed to form a no-longer-meeting probability matrix, whose k-th diagonal element stores the no-longer-meeting probability of node $v_k$.
In step S4, the single-source SimRank similarity of the source node $v_i$ is computed from the personalized PageRank vector and the no-longer-meeting probability matrix to obtain an n-dimensional SimRank vector. The SimRank similarity calculation is repeated for L rounds, updating the vector each round, to obtain the n-dimensional vector after L rounds, where $c \in [0, 1]$ is the attenuation coefficient and $\varepsilon$ is the absolute error of the SimRank result.
In step S5, the t dimensions of the n-dimensional vector with the largest values are found, and the corresponding users are recommended to the target user.
It should be noted that each dimension of the SimRank vector stores the estimated SimRank similarity between one node of the graph structure and the source node $v_i$, and therefore the estimated SimRank similarity between a particular user in the social network and the target user.
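To make the interplay of steps S2 to S4 concrete, the sketch below evaluates a truncated series of the form $\sum_{l=0}^{L} c^l (P^T)^l D P^l e_i$, a commonly used linearized formulation of SimRank in which D is a diagonal correction matrix. Treating this series as the computation of step S4 is an assumption, since the patent's own formula (2) is not reproduced in this text. The function name, the dictionary-based vectors, and the hand-supplied diagonal values are likewise illustrative.

```python
def single_source_simrank(in_nbrs, diag, source, c=0.6, rounds=20):
    """Truncated linearized SimRank for one source node (illustrative sketch).

    in_nbrs: {node: list of in-neighbors}; diag: {node: diagonal value D(k, k)}.
    Returns a sparse dict of SimRank estimates with respect to `source`.
    """
    # Forward pass: u[l] = P^l e_source, the l-step walk distribution along in-edges.
    u = [{source: 1.0}]
    for _ in range(rounds):
        nxt = {}
        for i, p in u[-1].items():
            nbrs = in_nbrs[i]
            for j in nbrs:
                nxt[j] = nxt.get(j, 0.0) + p / len(nbrs)
        u.append(nxt)
    # Backward pass: s <- D u[l] + c * P^T s, for l = rounds, ..., 0.
    s = {}
    for l in range(rounds, -1, -1):
        pulled = {}
        for j, nbrs in in_nbrs.items():
            if nbrs:
                avg = sum(s.get(i, 0.0) for i in nbrs) / len(nbrs)
                if avg:
                    pulled[j] = c * avg
        s = pulled
        for k, p in u[l].items():
            s[k] = s.get(k, 0.0) + diag[k] * p
    return s


# Hypothetical graph: a -> b and a -> c, so b and c share the single in-neighbor a.
toy = {"a": [], "b": ["a"], "c": ["a"]}
toy_diag = {"a": 1.0, "b": 0.4, "c": 0.4}  # 1 for in-degree 0, 1 - c for in-degree 1
sim = single_source_simrank(toy, toy_diag, "b", c=0.6, rounds=5)
```

On this toy graph the result agrees with the recursive SimRank definition: b is fully similar to itself, and the similarity of c to b equals c because both nodes have the same single in-neighbor. Each backward round touches every edge once, which is consistent with a cost proportional to m per round.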
Therefore, the friend recommendation method based on the single-source SimRank accurate solution provided by this embodiment repeats the SimRank similarity calculation for L rounds, updating the n-dimensional vector each round, finds the t dimensions of the vector with the largest values, and recommends the corresponding users to the target user. Estimates of the SimRank similarity between all users and the target user are obtained in $O(\log n/\varepsilon^2 + m\log(1/\varepsilon))$ time, and the absolute error between each estimate and the true value does not exceed $\varepsilon$. When $\varepsilon$ is set to $10^{-7}$ and a floating-point (float) variable type is used to store the SimRank result, an accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within a practical time, which improves the quality and effect of the friend recommendation function.
The recommendation method provided by this embodiment has complexity $O(\log n/\varepsilon^2 + m\log(1/\varepsilon))$. In contrast to the complexity of most other algorithms, this complexity avoids having a large n in the numerator together with a small $\varepsilon^2$ in the denominator of the same term, which would make the cost very large. Single-source SimRank results with an error of $10^{-7}$ can therefore be computed within a practical time.
In addition, the recommendation method provided by the embodiment does not need to perform preprocessing of the graph structure, can accurately calculate the SimRank similarity of dynamically changing groups (such as appearance of new users, logout of existing users, change of friend relationships, and the like), and realizes friend recommendation for the dynamically changing user groups.
In one implementation, step S5 may include: mapping the t found nodes to users in the social network; removing the users who already have a friend relationship with the target user; and recommending the remaining users to the target user.
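Step S5 as described here amounts to a top-t selection followed by a filter. A minimal sketch, assuming the SimRank vector is available as a mapping from user to score; the function name `recommend` is hypothetical, and excluding the target user from the output is an added assumption that the text does not state.

```python
import heapq


def recommend(scores, source, friends, t):
    """Pick the t highest-scoring users, then drop existing friends (sketch).

    scores: {user: SimRank estimate with respect to `source`}.
    friends: set of users who are already friends of `source`.
    """
    top = heapq.nlargest(t, scores.items(), key=lambda kv: kv[1])
    # Assumption: the target user is also removed from the recommendations.
    return [user for user, _ in top if user != source and user not in friends]


example_scores = {"u0": 1.0, "u1": 0.4, "u2": 0.3, "u3": 0.1}
picked = recommend(example_scores, source="u0", friends={"u1"}, t=3)
```

Because filtering happens after the top-t selection, fewer than t users may be returned; selecting from the filtered candidates instead would be a different design and is not what the text describes.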
In one implementation, step S2 may further include:
computing the personalized PageRank of the source node $v_i$ with respect to node $v_k$ from the personalized PageRank of $v_i$ with respect to the neighbor nodes of $v_k$, where $v_k$ is any node in the graph structure G.
Specifically, the method can be realized by the following steps:
obtaining the probability transition matrix P of the graph structure G, where P is an $n \times n$ matrix whose entry in row j, column i is the probability that node $v_i$ moves in one step along an incoming edge to node $v_j$;
updating and storing the personalized PageRank vectors according to formula (1), where l is an intermediate variable, l = 0, 1, ..., L; the vectors being updated are n-dimensional; $0 \le i \le n-1$; and the vectors are initialized from the n-dimensional indicator vector that is 1 in the i-th dimension and 0 in all other dimensions;
repeating the update of formula (1) L times to obtain the updated n-dimensional vector.
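The iteration can be illustrated directly from the random-walk definition of the personalized PageRank vector. The sketch below is not a transcription of formula (1), which is not reproduced in this text. It assumes a walk that stops with probability $1-\sqrt{c}$ at each step and otherwise moves to a randomly chosen in-neighbor, and that a walk which cannot move contributes nothing further. The function name and return format are hypothetical.

```python
import math


def truncated_ppr(in_nbrs, source, c=0.6, rounds=20):
    """Stopping probabilities of a walk from `source`, truncated after `rounds` steps.

    Returns (total, hops): hops[l][k] is the probability that the walk stops at
    node k after exactly l steps, and total[k] sums hops[l][k] over l.
    """
    sqrt_c = math.sqrt(c)
    reach = {source: 1.0}  # probability of being at each node while still walking
    total, hops = {}, []
    for _ in range(rounds + 1):
        hop = {k: (1.0 - sqrt_c) * p for k, p in reach.items()}
        hops.append(hop)
        for k, p in hop.items():
            total[k] = total.get(k, 0.0) + p
        nxt = {}
        for k, p in reach.items():
            nbrs = in_nbrs.get(k, [])
            for j in nbrs:  # continue with probability sqrt(c), split over in-neighbors
                nxt[j] = nxt.get(j, 0.0) + sqrt_c * p / len(nbrs)
        reach = nxt
    return total, hops


# Hypothetical graph: a -> b and a -> c; with c = 0.25 the step survival is 0.5.
toy_graph = {"a": [], "b": ["a"], "c": ["a"]}
ppr_total, ppr_hops = truncated_ppr(toy_graph, "b", c=0.25, rounds=5)
```

Each round only touches nodes that currently carry probability mass, so on sparse graphs the early rounds are cheap.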
As shown in fig. 2, which is a flowchart of an implementation manner of step S3 in this embodiment, the method includes: step S31-step S38.
In step S31, a node $v_k$ in the graph structure is obtained.
In step S32, it is judged whether the in-degree of node $v_k$ satisfies a preset condition, namely that the in-degree of $v_k$ is 0 or 1.
In step S33, if so, the no-longer-meeting probability of node $v_k$ is returned directly from the preset condition. When $d_{in}(v_k) = 0$, walks starting from $v_k$ cannot move, so they never meet again and the probability is 1. When $d_{in}(v_k) = 1$, both walks can only move to the same in-neighbor, so they meet exactly when both continue, which happens with probability $\sqrt{c}\cdot\sqrt{c} = c$, and the probability is $1-c$. Here $d_{in}(v_k)$ denotes the in-degree of node $v_k$ in the graph structure.
If not, in step S34, the probability $Z_l(k, q)$ that two random walks starting from $v_k$ first meet at node $v_q$ at step l is computed, and the calculation of $Z_l(k, q)$ is repeated for increasing l until $E_k$, the total length of all paths extended from $v_k$, is greater than or equal to a preset value. The preset value is defined in terms of R(k), the required number of random walks generated from $v_k$, and the value of the k-th dimension of the n-dimensional vector.
In step S35, l(k) is obtained, namely the total number of levels walked by the two random walks when $E_k$ becomes greater than or equal to the preset value.
In step S36, two random walks are generated from $v_k$ and it is recorded whether they meet after step l(k); this is repeated R(k) times. Each generated pair of walks does not stop during the first l(k) steps; from step l(k)+1 onward, each walk stops at every step with probability $1-\sqrt{c}$ and otherwise moves to a randomly chosen neighbor of the current node.
In step S37, the no-longer-meeting probability of node $v_k$ is computed from the results of the R(k) generated random walks and from the values $Z_l(k, q)$ computed until $E_k$ reached the preset value, and is stored in the k-th diagonal element of the no-longer-meeting probability matrix.
steps S34 to S37 may be implemented by calculating the no-meet probability of node $v_k$, written $D(k,k)$ for the $k$-th diagonal element of the no-meet probability matrix, according to formula (5):

$$D(k,k) = 1 - \sum_{l=1}^{l(k)} \sum_{q=1}^{n} Z_l(k,q) - \frac{c^{l(k)}}{R(k)} \sum_{w=1}^{R(k)} I(w) \tag{5}$$

wherein $\frac{1}{R(k)}\sum_{w=1}^{R(k)} I(w)$ represents the probability, estimated from the generated walks, that two random walks starting from node $v_k$ meet after step $l(k)$; $c^{l(k)}$ represents the probability that neither random walk stops during the first $l(k)$ steps; $l(k)$ is the value of the variable $l$ when $E_k$ becomes greater than or equal to the preset value; $Z_l(k, q)$ is the probability that two random walks starting from node $v_k$ first meet at node $v_q$ at step $l$; and $I(w)$ is an indicator variable, with $I(w) = 1$ when the two random walks generated in the $w$-th repetition meet and $I(w) = 0$ otherwise.
In step S38, the value of $k$ is updated and the above calculation is repeated until every node $v_k$ in the graph structure has been traversed.
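By way of non-limiting illustration, one repetition of the sampling in step S36 may be sketched in Python as follows. The sketch assumes that each walk continues with probability $\sqrt{c}$ and moves to a uniformly chosen in-neighbor, that a walk reaching a node without in-neighbors terminates, and that a sample counts as a meeting only if the two walks first coincide after step $l(k)$ (earlier meetings being covered by $Z_l(k, q)$). The names `first_meet_after` and `in_nbrs` are hypothetical and are not taken from the embodiment.

```python
import math
import random


def first_meet_after(in_nbrs, k, l_k, c, rng):
    """Simulate one pair of walks from node k (hypothetical helper).

    in_nbrs[v] lists the in-neighbors of node v. The walks cannot stop
    during the first l_k steps; afterwards each walk stops independently
    with probability 1 - sqrt(c) at every step. Returns True only if the
    walks first occupy the same node at some step later than l_k.
    """
    sqrt_c = math.sqrt(c)
    a = b = k
    step = 0
    while True:
        step += 1
        # From step l_k + 1 onward, either walk may stop; once one walk
        # has stopped, the pair can no longer meet.
        if step > l_k and (rng.random() >= sqrt_c or rng.random() >= sqrt_c):
            return False
        # A walk at a node without in-neighbors cannot continue.
        if not in_nbrs[a] or not in_nbrs[b]:
            return False
        a = rng.choice(in_nbrs[a])
        b = rng.choice(in_nbrs[b])
        if a == b:
            # A meeting within the first l_k steps is not counted here.
            return step > l_k
```

Averaging the returned value over $R(k)$ calls gives the sampled quantity that step S37 combines with the deterministic part.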
$l(k)$ is the value of the variable $l$ when $E_k$ becomes greater than or equal to the preset value, and represents the number of levels whose contribution to the no-meet probability is calculated deterministically. Once step S34 has been completed, the value of $l(k)$ is fixed and can therefore be used in formula (5). As can be seen from formula (5), the no-meet probability is split into two parts: the contribution of the first $l(k)$ levels is calculated deterministically in step S34 (the term of formula (5) containing $Z_l(k, q)$), and the contribution of levels $l(k)$ through $L$ is obtained by generating random walks in step S36. The purpose of the split is to balance the strengths and weaknesses of the two approaches: the choice of $l(k)$ seeks the balance point that gives the shortest running time for the required accuracy.
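By way of non-limiting illustration, the combination of the two parts may be sketched in Python as follows. The sketch assumes that formula (5) has the form "1, minus the sum of $Z_l(k, q)$ over the first $l(k)$ levels, minus $c^{l(k)}$ times the fraction of sampled walk pairs that meet"; the function name and the data layout (one dictionary of $Z_l(k, q)$ values per level) are hypothetical.

```python
def no_meet_probability(z_levels, l_k, c, indicators):
    """Combine the deterministic and the sampled part (assumed form of formula (5)).

    z_levels[l] maps q -> Z_l(k, q); index 0 is ignored.
    indicators holds the values I(w) of the R(k) sampled walk pairs.
    """
    # Deterministic part: first-meeting probabilities of levels 1..l_k.
    deterministic = sum(sum(level.values()) for level in z_levels[1:l_k + 1])
    # Sampled part: scaled by the probability c**l_k that neither walk
    # stops during the first l_k steps.
    sampled = c ** l_k * sum(indicators) / len(indicators)
    return 1.0 - deterministic - sampled
```

A larger `l_k` moves probability mass from the sampled term to the deterministic term, which reduces sampling noise at the price of more deterministic computation.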
It should be noted that the above steps calculate the no-meet probability of each node $v_k$ on the graph and thereby yield the no-meet probability matrix. This matrix is a diagonal matrix, i.e. only the elements on its diagonal can be non-zero, and the value stored in its $k$-th diagonal element is the no-meet probability of node $v_k$. The estimated matrix is used in the calculation of the SimRank similarity.
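By way of non-limiting illustration, the diagonal may be held as a plain list of length n whose degenerate entries (in-degree 0 or 1) are filled in first. In this sketch the values 1 and $1 - c$ for those entries are an assumption that follows from each walk continuing with probability $\sqrt{c}$; the function name is hypothetical, and entries left as `None` stand for nodes that still require steps S34 to S37.

```python
from typing import List, Optional


def init_no_meet_diagonal(in_degrees: List[int], c: float) -> List[Optional[float]]:
    """Fill the diagonal entries that need no estimation (hypothetical helper)."""
    diagonal: List[Optional[float]] = []
    for d in in_degrees:
        if d == 0:
            diagonal.append(1.0)      # the walks cannot move, so they never meet
        elif d == 1:
            diagonal.append(1.0 - c)  # they meet iff both walks take the first step
        else:
            diagonal.append(None)     # to be estimated in steps S34 to S37
    return diagonal
```

Storing only the diagonal keeps the memory cost linear in the number of nodes.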
In one implementation, step S34 may include steps S341 to S344.
Step S341: according to the probability $(P^T)^l(k, q)$ that node $v_k$ transfers to node $v_q$ in $l$ steps along incoming edges, the probability $Z_l(k, q)$ that two random walks starting from node $v_k$ first meet at node $v_q$ at step $l$ is calculated.
Here $P^T$ is the transpose of the matrix $P$, where $P$ is an $n \times n$ matrix whose entry in row $j$, column $i$ is the probability of transferring from node $v_i$ to node $v_j$ in one step along an incoming edge. Since $P^T(i, j) = P(j, i)$, the entry of $P^T$ in row $i$, column $j$ is the probability of transferring from node $v_i$ to node $v_j$ in one step along an incoming edge.
In this embodiment, a random walk that goes from $v_k$ to node $v_q$ in $l$ steps along incoming edges stops at each step with probability $1-\sqrt{c}$, whereas the transfer probability in step S341 refers to the probability of going from node $v_k$ to node $v_q$ along incoming edges without stopping.
$Z_l(k, q)$ is calculated according to formula (3):

$$Z_l(k,q) = c^l \big((P^T)^l(k,q)\big)^2 - \sum_{l'=1}^{l-1} \sum_{q'=1}^{n} c^{l'} \big((P^T)^{l'}(q',q)\big)^2 \, Z_{l-l'}(k,q') \tag{3}$$

wherein $Z_l(k, q)$ denotes the probability that two random walks starting from node $v_k$ first meet at node $v_q$ at step $l$; $Z_0(k, k)$ is initialized to 1 and, for every $x \neq k$, $Z_0(k, x)$ is initialized to 0; $c^l\big((P^T)^l(k, q)\big)^2$ is the probability that two random walks starting from node $v_k$ meet at node $v_q$ at step $l$; the double sum is the probability that two random walks starting from node $v_k$ meet at node $v_q$ at step $l$ having already met before reaching node $v_q$; $(P^T)^l(k, q)$ denotes the probability that node $v_k$ transfers to node $v_q$ in $l$ steps along incoming edges; $(P^T)^0(k, k)$ is initialized to 1 and, for every $x \neq k$, $(P^T)^0(k, x)$ is initialized to 0; the remaining entries $(P^T)^l(k, x)$ are initialized to 0; and $l$ is an intermediate variable, initialized to 0.
Formula (3) indicates that, for the current $l$, $Z_l(k, q)$ is calculated from the values of $(P^T)^l(k, q)$, $(P^T)^{l'}(q', q)$ and $Z_{l-l'}(k, q')$.
Step S342: for every node $v_{q'}$ with $(P^T)^{l-l'}(k, q') > 0$, the values of $(P^T)^{l'+1}(q', q)$, $E_k$ and $l'$ are updated until $l' = l$, where $(P^T)^{l-l'}(k, q')$ denotes the probability that node $v_k$ transfers to node $v_{q'}$ in $l - l'$ steps, and $(P^T)^{l'+1}(q', q)$ denotes the probability that node $v_{q'}$ transfers to node $v_q$ in $l' + 1$ steps.
The update is performed according to formula (4):

$$(P^T)^{l'+1}(q',q) = (P^T)^{l'+1}(q',q) + \frac{(P^T)^{l'}(q',x)}{d_{in}(v_x)} \tag{4}$$

Here $(P^T)^{l-l'}(k, q')$ denotes the probability that node $v_k$ transfers to node $v_{q'}$ in $l - l'$ steps along incoming edges, and $(P^T)^{l'}(q', x)$ denotes the probability that node $v_{q'}$ transfers to node $v_x$ in $l'$ steps along incoming edges. The value of $(P^T)^{l-l'}(k, q')$ differs from node to node during the update, and the condition $(P^T)^{l-l'}(k, q') > 0$ means that only nodes $v_{q'}$ with a positive value are selected. $(P^T)^{l'+1}(q', q)$ denotes the probability that node $v_{q'}$ transfers to node $v_q$ in $l' + 1$ steps along incoming edges, and node $v_x$ ranges over all nodes with $(P^T)^{l'}(q', x) > 0$.
Step S343: the value of $l$ is updated, $l = l + 1$.
Step S344: the above steps are repeated until $E_k$ is greater than or equal to the preset value.
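By way of non-limiting illustration, steps S341 and S342 may be sketched in Python as follows. The sketch makes several assumptions: the inner sum of formula (3) is taken over $l' = 1, \ldots, l-1$; a fixed number of levels `max_l` replaces the stopping criterion on $E_k$; and, for brevity, the rows $(P^T)^{l'}(q', \cdot)$ are computed for every node rather than only for nodes with $(P^T)^{l-l'}(k, q') > 0$. All function and variable names are hypothetical.

```python
def push_one_step(row, in_nbrs):
    """Formula (4): derive the sparse row (P^T)^{l'+1}(q', .) from (P^T)^{l'}(q', .)."""
    nxt = {}
    for x, p in row.items():
        nbrs = in_nbrs[x]
        for q in nbrs:
            # Each in-neighbor q of x receives an equal share of the mass at x.
            nxt[q] = nxt.get(q, 0.0) + p / len(nbrs)
    return nxt


def first_meeting_probabilities(in_nbrs, k, c, max_l):
    """Return z where z[l][q] = Z_l(k, q) for l = 0..max_l (assumed form of formula (3))."""
    n = len(in_nbrs)
    # rows[s][l] is the sparse row (P^T)^l(s, .).
    rows = []
    for s in range(n):
        levels = [{s: 1.0}]
        for _ in range(max_l):
            levels.append(push_one_step(levels[-1], in_nbrs))
        rows.append(levels)

    z = [{k: 1.0}]  # Z_0(k, k) = 1, all other Z_0(k, x) = 0
    for l in range(1, max_l + 1):
        # Probability that both walks are at q at step l without stopping.
        level = {q: c ** l * p * p for q, p in rows[k][l].items()}
        # Subtract the cases in which the walks first met earlier, at q'.
        for lp in range(1, l):
            for qp, z_prev in z[l - lp].items():
                for q, p in rows[qp][lp].items():
                    level[q] = level.get(q, 0.0) - c ** lp * p * p * z_prev
        z.append(level)
    return z
```

For example, on the four-node graph `in_nbrs = [[1, 2], [3], [3], []]` with `k = 0`, the two walks can first meet at node 1 or node 2 at step 1, or at node 3 at step 2 if they chose different nodes at step 1.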
In one implementation, step S4 may include:
when the first round of calculation is performed, the SimRank similarity is calculated according to the first-round formula;
when the second to $L$-th rounds of calculation are performed, the n-dimensional vector is updated according to formula (2), and the updating process is repeated until the $L$-th round of calculation, yielding the n-dimensional vector after $L$ rounds of updating;
wherein, in these formulas, the matrix is the no-meet probability matrix, $l$ is an intermediate variable taking the values $l = 0, 1, \ldots, L$, and the vectors are n-dimensional vectors whose values were stored during the calculation of step 2.
Formula (2) must be executed repeatedly for $L$ rounds to ensure that the finally obtained SimRank vector meets the required absolute error; the SimRank vector obtained after $l$ rounds of calculation is recorded at each round.
In this way, an estimate of the SimRank vector within the absolute error is obtained. The value stored in each dimension of the SimRank vector is the SimRank similarity between the corresponding node of the graph structure and the source node $v_i$, which in turn gives the estimated SimRank similarity between each user and the target user in the social network.
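By way of non-limiting illustration, the resulting vector may be turned into recommendations as sketched below in Python: the t nodes with the largest similarity to the source node are selected, skipping the source itself and users who are already friends of the target user. The function name, the parameter `t` as an argument, and the representation of existing friends as a set of node indices are assumptions of this sketch.

```python
import heapq


def recommend(sim, source, friends, t):
    """Return up to t node indices with the largest SimRank estimate.

    sim[k] is the estimated similarity between node k and the source node;
    friends is the set of nodes already connected to the target user.
    """
    candidates = (
        (score, node)
        for node, score in enumerate(sim)
        if node != source and node not in friends
    )
    # nlargest returns the pairs in descending order of score.
    return [node for _, node in heapq.nlargest(t, candidates)]
```

Using a heap-based selection avoids sorting all n entries when t is much smaller than n.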
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (10)
1. A friend recommendation method based on an exact single-source SimRank solution, characterized by comprising the following steps:
converting a target user, other users and the relationships among the users into a graph structure G, wherein the graph structure G comprises nodes corresponding to the users and edges corresponding to the relationships among the users, the target user is the source node $v_i$ of the graph structure, and the graph structure G comprises n nodes;
in graph structure G, computing the personalized PageRank of the source node $v_i$ with respect to all nodes on the graph and constructing a personalized PageRank vector, wherein the vector is n-dimensional, its $k$-th component is the probability that a random walk starting from the source node $v_i$ finally stops at node $v_k$, $v_k$ is any node in the graph structure, and the random walk, at each step, stops with probability $1-\sqrt{c}$ and with probability $\sqrt{c}$ moves to a randomly chosen neighbor node of the current node;
calculating the no-meet probabilities of all nodes on the graph structure G to form a no-meet probability matrix, wherein the value stored in the $k$-th element on the diagonal of the no-meet probability matrix is the no-meet probability of node $v_k$;
computing, from the n-dimensional vector and the no-meet probability matrix, the single-source SimRank similarity with respect to the source node $v_i$ to obtain an n-dimensional vector, repeating the SimRank similarity calculation for $L$ rounds while updating the n-dimensional vector, and obtaining the n-dimensional vector after $L$ rounds of updating, wherein the number of rounds $L$ is determined by the attenuation coefficient $c$, $c \in [0, 1]$, and by the absolute error of the SimRank calculation result;
2. The friend recommendation method of claim 1, wherein computing, in graph structure G, the personalized PageRank of the source node $v_i$ with respect to all nodes on the graph and constructing the n-dimensional vector comprises:
calculating the personalized PageRank of the source node $v_i$ with respect to node $v_k$ from the personalized PageRank of the source node $v_i$ with respect to the neighbor nodes of node $v_k$, wherein node $v_k$ is any node in the graph structure G.
3. The friend recommendation method of claim 2, wherein calculating the personalized PageRank of the source node $v_i$ with respect to node $v_k$ from the personalized PageRank of the source node $v_i$ with respect to the neighbor nodes of node $v_k$ comprises:
obtaining the probability transition matrix $P$ of the graph structure G, wherein $P$ is an $n \times n$ matrix whose entry in row $j$, column $i$ is the probability of transferring from node $v_i$ to node $v_j$ in one step along an incoming edge;
wherein $l$ is an intermediate variable, $l = 0, 1, \ldots, L$; the vectors involved are n-dimensional vectors, with $0 \le i \le n-1$; and they are initialized using the n-dimensional vector that is 1 in the $i$-th dimension and 0 in all other dimensions.
4. The friend recommendation method of claim 3, wherein computing, from the n-dimensional vector and the no-meet probability matrix, the single-source SimRank similarity with respect to the source node $v_i$ to obtain the n-dimensional vector, repeating the SimRank similarity calculation for $L$ rounds while updating the n-dimensional vector, and obtaining the n-dimensional vector after $L$ rounds of updating comprises:
when the first round of calculation is performed, calculating the SimRank similarity according to the first-round formula;
when the second to $L$-th rounds of calculation are performed, updating the n-dimensional vector according to formula two, and repeating the updating process until the $L$-th round of calculation to obtain the n-dimensional vector after $L$ rounds of updating.
5. The friend recommendation method of claim 2, wherein calculating the no-meet probabilities of all nodes in graph structure G to form the no-meet probability matrix comprises:
obtaining a node $v_k$ in the graph structure;
determining whether the in-degree of node $v_k$ satisfies a preset condition, the preset condition being that the in-degree of node $v_k$ is 0 or 1;
if so, returning the no-meet probability of node $v_k$ according to the preset condition: when the in-degree of $v_k$ is 0, the no-meet probability of node $v_k$ is 1; when the in-degree of $v_k$ is 1, the no-meet probability of node $v_k$ is $1 - c$;
if not, calculating the probability $Z_l(k, q)$ that two random walks starting from node $v_k$ first meet at node $v_q$ at step $l$, and repeating the calculation of $Z_l(k, q)$ until the sum $E_k$ of the lengths of all paths extended from node $v_k$ is greater than or equal to a preset value, wherein, in the expression for the preset value, $R(k)$ is the required number of random walks to be generated from node $v_k$, and the remaining term is the value of the $k$-th dimension of the n-dimensional vector;
obtaining the total number of levels $l(k)$ walked by the two random walks at the moment the sum $E_k$ of the lengths of all paths extended from node $v_k$ becomes greater than or equal to the preset value;
generating two random walks starting from $v_k$ in order to estimate the probability that they meet after step $l(k)$, and repeating this generation $R(k)$ times, wherein the two random walks start from node $v_k$, do not stop during the first $l(k)$ steps, and, from step $l(k)+1$ onward, at each step stop with probability $1-\sqrt{c}$ and with probability $\sqrt{c}$ move to a randomly chosen neighbor of the current node;
calculating the no-meet probability of node $v_k$ from the results of the $R(k)$ random-walk generations and from the values of $Z_l(k, q)$ obtained up to the point at which the sum $E_k$ becomes greater than or equal to the preset value, and storing the no-meet probability of node $v_k$ as the $k$-th element on the diagonal of the no-meet probability matrix;
updating the value of $k$ and repeating the above calculation until every node $v_k$ in the graph structure has been traversed.
6. The friend recommendation method of claim 5, wherein calculating the probability $Z_l(k, q)$ that two random walks starting from node $v_k$ first meet at node $v_q$ at step $l$, and repeating the calculation of $Z_l(k, q)$ until the sum $E_k$ of the lengths of all paths extended from node $v_k$ is greater than or equal to the preset value, comprises:
according to the probability $(P^T)^l(k, q)$ that node $v_k$ transfers to node $v_q$ in $l$ steps along incoming edges, calculating the probability $Z_l(k, q)$ that two random walks starting from $v_k$ first meet at node $v_q$ at step $l$;
for every node $v_{q'}$ with $(P^T)^{l-l'}(k, q') > 0$, updating $(P^T)^{l'+1}(q', q)$, $E_k$ and $l'$ until $l' = l$, where $(P^T)^{l-l'}(k, q')$ denotes the probability that node $v_k$ transfers to node $v_{q'}$ in $l - l'$ steps, and $(P^T)^{l'+1}(q', q)$ denotes the probability that node $v_{q'}$ transfers to node $v_q$ in $l' + 1$ steps;
updating the value of $l$;
repeating the above steps until $E_k$ is greater than or equal to the preset value.
7. The friend recommendation method of claim 6, wherein calculating, according to the probability $(P^T)^l(k, q)$ that node $v_k$ transfers to node $v_q$ in $l$ steps along incoming edges, the probability $Z_l(k, q)$ that two random walks starting from $v_k$ first meet at node $v_q$ at step $l$ comprises:
calculating $Z_l(k, q)$ according to formula three:

$$Z_l(k,q) = c^l \big((P^T)^l(k,q)\big)^2 - \sum_{l'=1}^{l-1} \sum_{q'=1}^{n} c^{l'} \big((P^T)^{l'}(q',q)\big)^2 \, Z_{l-l'}(k,q')$$

wherein $c^l\big((P^T)^l(k, q)\big)^2$ is the probability that two random walks starting from node $v_k$ meet at node $v_q$ at step $l$; the double sum is the probability that two random walks starting from node $v_k$ meet at node $v_q$ at step $l$ having already met before reaching node $v_q$; $(P^T)^l(k, q)$ denotes the probability that node $v_k$ transfers to node $v_q$ in $l$ steps along incoming edges; $(P^T)^0(k, k)$ is initialized to 1 and, for every $x \neq k$, $(P^T)^0(k, x)$ is initialized to 0; the remaining entries $(P^T)^l(k, x)$ are initialized to 0; and $l$ is an intermediate variable, initialized to 0.
9. The friend recommendation method of claim 5, wherein calculating the no-meet probability of node $v_k$ from the results of the $R(k)$ random-walk generations and from the values of $Z_l(k, q)$ obtained up to the point at which the sum $E_k$ of the lengths of all paths extended from node $v_k$ becomes greater than or equal to the preset value comprises:
calculating the no-meet probability of node $v_k$, written $D(k,k)$, according to formula five:

$$D(k,k) = 1 - \sum_{l=1}^{l(k)} \sum_{q=1}^{n} Z_l(k,q) - \frac{c^{l(k)}}{R(k)} \sum_{w=1}^{R(k)} I(w)$$

wherein $\frac{1}{R(k)}\sum_{w=1}^{R(k)} I(w)$ represents the probability, estimated from the generated walks, that two random walks starting from node $v_k$ meet after step $l(k)$; $c^{l(k)}$ represents the probability that neither random walk stops during the first $l(k)$ steps; $l(k)$ is the value of the variable $l$ when $E_k$ becomes greater than or equal to the preset value; $Z_l(k, q)$ is the probability that two random walks starting from node $v_k$ first meet at node $v_q$ at step $l$; and $I(w)$ is an indicator variable recording whether the random walks meet in the $w$-th generation, where $w \le R(k)$, with $I(w) = 1$ when the two random walks generated in the $w$-th repetition meet and $I(w) = 0$ otherwise.
10. The friend recommendation method of claim 1, wherein recommending the users corresponding to the t nodes to the target user as the result comprises:
mapping the t nodes found to the corresponding users in the social network;
and excluding users who already have a friend relationship with the target user, and recommending the remaining users to the target user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010536506.XA CN111506833B (en) | 2020-06-12 | 2020-06-12 | Friend recommendation method based on single-source SimRank accurate solution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010536506.XA CN111506833B (en) | 2020-06-12 | 2020-06-12 | Friend recommendation method based on single-source SimRank accurate solution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111506833A (en) | 2020-08-07 |
CN111506833B (en) | 2023-05-02 |
Family
ID=71878836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010536506.XA Active CN111506833B (en) | 2020-06-12 | 2020-06-12 | Friend recommendation method based on single-source SimRank accurate solution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111506833B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103345513A (en) * | 2013-07-09 | 2013-10-09 | 清华大学 | Friend recommendation method based on friend relationship spread in social network |
CN105512242A (en) * | 2015-11-30 | 2016-04-20 | 浙江工业大学 | Parallel recommend method based on social network structure |
CN107423308A (en) * | 2016-05-24 | 2017-12-01 | 华为技术有限公司 | subject recommending method and device |
US20190087884A1 (en) * | 2016-05-24 | 2019-03-21 | Huawei Technologies Co., Ltd. | Theme recommendation method and apparatus |
CN109726336A (en) * | 2018-12-21 | 2019-05-07 | 长安大学 | A kind of POI recommended method of combination trip interest and social preference |
CN110287424A (en) * | 2019-06-28 | 2019-09-27 | 中国人民大学 | Collaborative filtering recommending method based on single source SimRank |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984832A (en) * | 2020-08-21 | 2020-11-24 | 中国人民大学 | Friend recommendation method based on personalized Page ranking |
CN111984832B (en) * | 2020-08-21 | 2023-07-07 | 中国人民大学 | Friend recommendation method based on personalized petty ranking |
CN112507245A (en) * | 2020-12-03 | 2021-03-16 | 中国人民大学 | Social network friend recommendation method based on graph neural network |
CN112507245B (en) * | 2020-12-03 | 2023-07-18 | 中国人民大学 | Social network friend recommendation method based on graph neural network |
Also Published As
Publication number | Publication date |
---|---|
CN111506833B (en) | 2023-05-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||