CN111506833A - Friend recommendation method based on single-source SimRank accurate solution

Info

Publication number: CN111506833A
Application number: CN202010536506.XA
Authority: CN
Inventors: 魏哲巍; 王涵之; 袁野; 杜小勇; 文继荣
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2020-08-07
Anticipated expiration: 2040-06-12
Also published as: CN111506833B

Abstract

The invention discloses a friend recommendation method based on a single-source SimRank accurate solution, comprising the following steps: converting the target user, the users, and the relationships among the users into a graph structure G; computing the Personalized PageRank of the source node v_i with respect to all nodes on the graph and constructing an n-dimensional Personalized PageRank vector; calculating the non-meeting probability of every node on the graph structure G to form a non-meeting probability matrix; computing, from the Personalized PageRank vector and the non-meeting probability matrix, the single-source SimRank similarity with respect to the source node v_i to obtain an n-dimensional vector; repeating the SimRank similarity calculation for L rounds and updating the n-dimensional vector in each round; and finding the t dimensions of the n-dimensional vector with the largest values and recommending the corresponding users to the target user as the result. The method obtains an accurate solution of the single-source SimRank similarity on a large-scale user group within an effective time, improving the quality and effect of the friend recommendation function.

Description

Friend recommendation method based on single-source SimRank accurate solution

Technical Field

The invention relates to a recommendation method, in particular to a friend recommendation method based on a single-source SimRank accurate solution.

Background

With the accelerating penetration of the mobile Internet among the population of China, the number of mobile social users has grown massively. According to a special report on China's mobile social industry published in 2019 by a media consulting firm, the number of mobile social users in China reached 737 million in 2018, was expected to keep growing steadily over the following two years, and was expected to exceed 800 million in 2020. Such a huge user base represents a huge market and has also stimulated the emergence of a wide variety of social software. According to their functions, mainstream social applications can be roughly divided into instant-messaging applications represented by WeChat and QQ; media-oriented applications represented by Weibo; interest-oriented applications represented by Douban and Zhihu; and friend-making applications represented by Momo and Tantan.

The functions of various types of social software are different, but the social software generally supports a friend recommendation function, namely, the similarity between specified users is calculated according to the existing friend relationship network, and the users with high similarity but not friends are recommended to the specified users, so that the users are helped to find friends related to the interests of the users.

In the friend recommendation process, the accuracy of similarity calculation between users can directly influence the quality of the recommendation result. A reasonable similarity measurement mode and an accurate similarity calculation result are necessary conditions for accurate friend recommendation. The SimRank similarity is distinguished from a plurality of similarity calculation methods by the intuitive design idea, the actual recursive definition method and the high-quality calculation result, and gradually becomes a widely applied similarity measurement method. Therefore, in an application scenario of friend recommendation, people generally measure the similarity degree between users by means of a SimRank similarity result.

At present, the algorithm in the prior art can realize an accurate solution of the SimRank similarity on a small-scale user group, but cannot calculate an accurate solution on a large graph.

As discovered by the inventors of the present application, prior-art algorithms can realize friend recommendation based on SimRank similarity on small-scale user groups. As the user scale expands rapidly, the computation time of the SimRank similarity increases. For ultra-large-scale user groups (for example, social applications with tens of millions of users, such as Twitter and Weibo), existing methods cannot accurately calculate the SimRank similarity between the user to be recommended and all other users within an effective time; they can only sacrifice some accuracy to obtain an estimate of the SimRank similarity, which greatly affects the quality and effect of the friend recommendation function.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a friend recommendation method based on a single-source SimRank accurate solution, which can improve the quality and effect of a friend recommendation function.

To achieve this purpose, the invention provides a friend recommendation method based on a single-source SimRank accurate solution, comprising the following steps: converting a target user, the users, and the relationships among the users into a graph structure G, wherein G comprises nodes corresponding to the users and edges corresponding to the relationships among the users, the target user is the source node v_i of the graph structure, and G comprises n nodes; in the graph structure G, computing the Personalized PageRank of the source node v_i with respect to all nodes on the graph and constructing a Personalized PageRank vector, wherein the vector is n-dimensional, its k-th entry is the probability that a random walk starting from the source node v_i eventually stops at node v_k, v_k is any node in the graph structure, and at each step the random walk stops with probability 1 − √c and, with probability √c, moves to a randomly chosen neighbor node of the current node; calculating the non-meeting probability of every node on the graph structure G to form a non-meeting probability matrix, wherein the k-th element on the diagonal of the matrix stores the non-meeting probability of node v_k; computing, from the n-dimensional Personalized PageRank vector and the non-meeting probability matrix, the single-source SimRank similarity with respect to the source node v_i to obtain an n-dimensional vector, and updating it to obtain the n-dimensional vector after L rounds of updating, wherein L is determined by c and ε, c is the attenuation coefficient, c ∈ [0, 1], and ε is the absolute error of the SimRank calculation result; and finding the t dimensions of the n-dimensional vector with the largest values and recommending the corresponding users to the target user as the result.

In one possible implementation, computing, in the graph structure G, the Personalized PageRank of the source node v_i with respect to all nodes on the graph and constructing the n-dimensional vector comprises: calculating the Personalized PageRank of the source node v_i with respect to node v_k from the Personalized PageRank of the source node v_i with respect to the neighbor nodes of node v_k, wherein node v_k is any node in the graph structure G.

In one possible implementation, calculating the Personalized PageRank of the source node v_i with respect to node v_k from the Personalized PageRank of the source node v_i with respect to the neighbor nodes of node v_k comprises: obtaining a probability transition matrix P of the graph structure G, wherein P is an n × n matrix and the value recorded at the j-th row and i-th column is the probability of moving from node v_i to node v_j in one step along an incoming edge; updating the vectors according to Formula 1 and storing them, wherein l is an intermediate variable, l = 0, 1, ..., L, the vectors are n-dimensional, 0 ≤ i ≤ n − 1, and the initial values are set from the n-dimensional vector that is 0 in all dimensions except the i-th, which is 1; and repeating the update of Formula 1 L times to obtain the updated n-dimensional vector.

In one possible implementation, computing the single-source SimRank similarity from the n-dimensional vector and the non-meeting probability matrix and updating it to obtain the n-dimensional vector after L rounds of updating comprises: in the first round of calculation, calculating the SimRank similarity from the n-dimensional Personalized PageRank vector and the non-meeting probability matrix; in the second to L-th rounds of calculation, updating the n-dimensional vector according to Formula 2, and repeating the update until the L-th round to obtain the n-dimensional vector after L rounds of updating; wherein, in Formula 2, the matrix is the non-meeting probability matrix, l is an intermediate variable, l = 0, 1, ..., and the quantity being updated is an n-dimensional vector.

In one possible implementation, calculating the non-meeting probability of every node on the graph structure G to form the non-meeting probability matrix comprises: obtaining a node v_k in the graph structure; judging whether the in-degree of node v_k satisfies a preset condition, the preset condition being that the in-degree of node v_k is 0 or 1; if so, returning the non-meeting probability of node v_k according to the preset condition, the non-meeting probability being given directly for the case where the in-degree of v_k is 0 and for the case where the in-degree of v_k is 1; if not, calculating the probability Z_l(k, q) that two random walks starting from node v_k first meet at node v_q at step l, and repeating the calculation of Z_l(k, q) until E_k, the sum of the lengths of all paths extended from node v_k, is greater than or equal to a preset value, wherein the preset value is expressed through R(k), the required number of random walks generated from node v_k, and the value of the k-th dimension of the n-dimensional vector; obtaining l(k), the total number of levels traversed by the two random walks when E_k becomes greater than or equal to the preset value; calculating the probability that two random walks generated from node v_k meet after step l(k), and repeating this calculation R(k) times, wherein the two random walks start from node v_k, do not stop during the first l(k) steps, and from step l(k) + 1 onward stop at each step with probability 1 − √c and, with probability √c, move to a randomly chosen neighbor of the current node; calculating the non-meeting probability of node v_k from the results of the R(k) generated random walks and from the Z_l(k, q) values computed until E_k became greater than or equal to the preset value, and storing it as the k-th element on the diagonal of the non-meeting probability matrix; and updating the value of k and repeating the above calculation until node v_k has traversed all nodes in the graph structure.

In one possible implementation, calculating the probability Z_l(k, q) that two random walks starting from node v_k first meet at node v_q at step l, and repeating the calculation of Z_l(k, q) until E_k, the sum of the lengths of all paths extended from node v_k, is greater than or equal to the preset value, comprises: calculating Z_l(k, q) from (P^T)^l(k, q), the probability that node v_k moves to node v_q in l steps along incoming edges; for every node v_q' with (P^T)^(l−l')(k, q') > 0, updating (P^T)^(l'+1)(q', q), E_k and l' until l' = l, wherein (P^T)^(l−l')(k, q') is the probability that node v_k moves to node v_q' in l − l' steps and (P^T)^(l'+1)(q', q) is the probability that node v_q' moves to node v_q in l' + 1 steps; updating the value of l; and repeating the above steps until E_k is greater than or equal to the preset value.

In one possible implementation, calculating Z_l(k, q), the probability that two random walks starting from node v_k first meet at node v_q at step l, from (P^T)^l(k, q), the probability that node v_k moves to node v_q in l steps along incoming edges, comprises:

calculating Z_l(k, q) according to Formula 3, which is:

$$Z_l(k,q) = c^{l}\big((P^T)^{l}(k,q)\big)^2 - \sum_{l'=1}^{l-1}\sum_{q'} c^{l'}\big((P^T)^{l'}(q',q)\big)^2\, Z_{l-l'}(k,q')$$

wherein c^l((P^T)^l(k, q))^2 is the probability that two random walks starting from node v_k meet at node v_q at step l; the double sum is the probability that two random walks starting from node v_k meet at node v_q at step l after having already met before step l; (P^T)^l(k, q) is the probability that node v_k moves to node v_q in l steps along incoming edges; (P^T)^0(k, k) is initialized to 1; (P^T)^0(k, x) is initialized to 0 for the other nodes v_x; (P^T)^l(k, x) is initialized to 0; and l is an intermediate variable, initialized to 0.

In one possible implementation, updating (P^T)^(l'+1)(q', q), E_k and l' for every node v_q' with (P^T)^(l−l')(k, q') > 0 comprises:

updating (P^T)^(l'+1)(q', q), E_k and l' according to Formula 4 for every node v_q' with (P^T)^(l−l')(k, q') > 0, Formula 4 being:

$$(P^T)^{l'+1}(q',q) = (P^T)^{l'+1}(q',q) + \frac{(P^T)^{l'}(q',x)}{d_{in}(v_x)}$$

$$E_k = E_k + 1$$

$$l' = l' + 1$$

wherein node v_x ranges over all nodes with (P^T)^l'(q', x) > 0.

In one possible implementation, calculating the non-meeting probability of node v_k from the results of the R(k) generated random walks and from the Z_l(k, q) values computed until E_k became greater than or equal to the preset value comprises:

calculating the non-meeting probability of node v_k according to Formula 5, wherein, in Formula 5, the sampled term represents the probability that two random walks generated from node v_k meet after step l(k); c^(l(k)) represents the probability that neither of the two random walks stops during the first l(k) steps; l(k) is the value of the variable l when E_k becomes greater than or equal to the preset value; Z_l(k, q) is the probability that two random walks starting from node v_k first meet at node v_q at step l; and I(w) is an indicator variable recording whether the random walks meet in the w-th generation, w ≤ R(k), with I(w) = 1 when the two random walks produced in the w-th generation meet and I(w) = 0 otherwise.

In a possible implementation manner, the recommending users corresponding to the t nodes as a result to a target user includes: the found t nodes correspond to users in the social network; and eliminating users who have friend relations with the user to be recommended, and recommending the rest users to the target user.

Compared with the prior art, the friend recommendation method based on a single-source SimRank accurate solution provided by this embodiment repeatedly performs L rounds of single-source SimRank similarity calculation, updates the n-dimensional vector in each round, finds the t dimensions of the n-dimensional vector with the largest values, and recommends the corresponding users to the target user as the result. Estimates of the SimRank similarity between all users and the user to be recommended are obtained with time complexity O(log n/ε² + m log(1/ε)), and the absolute error between each estimate and the true value does not exceed ε. When ε is set to 10^-7 and the float variable type is used to store the SimRank result, an accurate solution of the single-source SimRank similarity can be obtained within an effective time, improving the quality and effect of the friend recommendation function.

Further, the friend recommendation method based on the single-source SimRank accurate solution provided by this embodiment does not need to perform preprocessing of a graph structure, and can accurately calculate the single-source SimRank similarity of a dynamically changing group (such as the appearance of a new user, the logout of an existing user, the change of a friend relationship, and the like), thereby implementing friend recommendation for the dynamically changing user group.

Drawings

Fig. 1 is a flowchart of a friend recommendation method based on a single-source SimRank accurate solution according to an embodiment of the present invention;

fig. 2 is a flowchart of one implementation of step S3 provided according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

As shown in fig. 1, which is a flowchart of a friend recommendation method based on a single-source SimRank accurate solution according to a preferred embodiment of the present invention, the method includes: steps S1-S5.

In step S1, the target user, the users, and the relationships among the users are converted into a graph structure G, wherein G includes nodes corresponding to the users and edges corresponding to the relationships among the users, the target user is the source node v_i of the graph structure, and G includes n nodes.

The users in this embodiment are all registered users on the platform, and the relationship among the users may specifically be a follow relationship, for example, all registered users on Facebook and their friend relationship network.

Specifically, for social networks with follow relationships, such as Weibo, Facebook and Instagram, the users correspond to nodes on the graph structure and the follow relationships among the users correspond to edges. If user A follows user B, a directed edge from the node of user B to the node of user A is established on the graph structure (for example, B -> A), where B is an in-neighbor node of A, A is an out-neighbor node of B, and this edge is an outgoing edge of node B and an incoming edge of node A. The number of outgoing edges of a node is called its "out-degree", and the number of incoming edges of a node is called its "in-degree".

For social networks with friend relationships, such as WeChat and QQ, the users correspond to graph nodes and the friend relationships correspond to edges on the graph structure. If user A and user B are friends (that is, A and B are friends of each other), a directed edge from the node of user A to the node of user B and a directed edge from the node of user B to the node of user A are both established on the graph structure.
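As a minimal sketch of step S1, the two edge conventions described above can be expressed in Python as follows. The function name `build_graph` and the input formats (lists of user pairs) are illustrative assumptions and are not part of the original disclosure.

```python
from collections import defaultdict

def build_graph(follows=(), friendships=()):
    """Build in-/out-neighbor sets from follow and friend relations.

    follows:     pairs (a, b) meaning "a follows b"  -> directed edge b -> a
    friendships: pairs (a, b) meaning mutual friends -> edges a -> b and b -> a
    """
    in_nbrs = defaultdict(set)
    out_nbrs = defaultdict(set)

    def add_edge(src, dst):
        out_nbrs[src].add(dst)   # edge is an outgoing edge of src
        in_nbrs[dst].add(src)    # and an incoming edge of dst

    for a, b in follows:
        add_edge(b, a)           # A follows B => edge B -> A
    for a, b in friendships:
        add_edge(a, b)
        add_edge(b, a)
    return in_nbrs, out_nbrs

in_nbrs, out_nbrs = build_graph(follows=[("A", "B")], friendships=[("C", "D")])
in_degree_of_a = len(in_nbrs["A"])
```

Here B becomes an in-neighbor of A, and the in-degree of a node is simply the size of its in-neighbor set.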

In step S2, in the graph structure G, the Personalized PageRank of the source node v_i with respect to all nodes on the graph is computed and a Personalized PageRank vector is constructed, wherein the vector is n-dimensional and, at each step, the random walk stops with probability 1 − √c and, with probability √c, moves to a randomly chosen neighbor node of the current node.

In step S3, the non-meeting probability of every node on the graph structure G is calculated to form a non-meeting probability matrix, wherein the k-th element on the diagonal of the matrix stores the non-meeting probability of node v_k.

In step S4, the single-source SimRank similarity with respect to the source node v_i is computed from the n-dimensional Personalized PageRank vector and the non-meeting probability matrix to obtain an n-dimensional vector, which is then updated to obtain the n-dimensional vector after L rounds of updating, wherein L is determined by c and ε, c is the attenuation coefficient, c ∈ [0, 1], and ε is the absolute error of the SimRank calculation result.

In step S5, the t dimensions of the n-dimensional vector with the largest values are found, and the corresponding users are recommended to the target user as the result.

It should be noted that the value stored in each dimension of the SimRank vector is the estimated SimRank similarity between one node on the graph structure and the source node v_i, which in turn gives the estimated SimRank similarity between a user in the social network and the target user.

Therefore, the friend recommendation method based on a single-source SimRank accurate solution provided by this embodiment repeatedly performs L rounds of SimRank similarity calculation, updates the n-dimensional vector in each round, finds the t dimensions of the n-dimensional vector with the largest values, and recommends the corresponding users to the target user as the result. Estimates of the SimRank similarity between all users and the user to be recommended are obtained with time complexity O(log n/ε² + m log(1/ε)), and the absolute error between each estimate and the true value does not exceed ε. When ε is set to 10^-7 and the float variable type is used to store the SimRank result, an accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within an effective time, improving the quality and effect of the friend recommendation function.

The recommendation method provided by this embodiment has complexity O(log n/ε² + m log(1/ε)). Compared with the complexity of most other algorithms, it avoids a term in which a large n appears in the numerator together with a small ε² in the denominator, which would make the complexity very large, so single-source SimRank results with ε = 10^-7 can be calculated within an effective time.

In addition, the recommendation method provided by the embodiment does not need to perform preprocessing of the graph structure, can accurately calculate the SimRank similarity of dynamically changing groups (such as appearance of new users, logout of existing users, change of friend relationships, and the like), and realizes friend recommendation for the dynamically changing user groups.

In one implementation, step S5 may include: the found t nodes correspond to users in the social network; and eliminating users who have friend relations with the user to be recommended, and recommending the rest users to the target user.
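A minimal Python sketch of this implementation of step S5 follows. The function name `recommend` and the dictionary-based score format are illustrative assumptions; dropping the target user itself (whose SimRank similarity with itself is 1) is likewise an assumption not stated above.

```python
import heapq

def recommend(scores, source, friends, t):
    """Pick the t largest dimensions, then drop the source and its existing friends."""
    top = heapq.nlargest(t, scores, key=scores.get)   # t users with the largest values
    return [u for u in top if u != source and u not in friends]

# Hypothetical SimRank estimates with respect to target user "u0".
scores = {"u0": 1.0, "u1": 0.30, "u2": 0.25, "u3": 0.10, "u4": 0.05}
result = recommend(scores, source="u0", friends={"u1"}, t=3)
```

Because existing friends are removed after the top-t selection, fewer than t users may remain; a caller wanting exactly t recommendations would need a larger t.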

In one implementation, step S2 may further include:

calculating the Personalized PageRank of the source node v_i with respect to node v_k from the Personalized PageRank of the source node v_i with respect to the neighbor nodes of node v_k, wherein node v_k is any node in the graph structure G.

Specifically, the method can be realized by the following steps:

obtaining a probability transition matrix P of the graph structure G, wherein P is an n × n matrix and the value recorded at the j-th row and i-th column is the probability of moving from node v_i to node v_j in one step along an incoming edge;

updating the vectors according to formula (1) and storing them, wherein l is an intermediate variable, l = 0, 1, ..., L, and the vectors are initialized before the first update;

repeating the update of formula (1) L times to obtain the updated n-dimensional vector.
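The iterative update can be illustrated with a short Python sketch. It is written under stated assumptions rather than as the formula (1) of this embodiment: the walk is assumed to stop with probability 1 − √c and to continue with probability √c at each step, the per-level vectors are assumed to start from (1 − √c) times the indicator vector of the source, and each update is assumed to multiply by √c·P. The function name `hop_ppr` and the list-of-lists graph format are hypothetical.

```python
import math

def hop_ppr(in_nbrs, n, source, c, L):
    """Return [pi^0, ..., pi^L], where pi^l[k] is the probability that a walk
    from `source` stops at node k after exactly l steps (assumed sqrt(c)-walk)."""
    sqrt_c = math.sqrt(c)
    cur = [0.0] * n
    cur[source] = 1.0                    # probability of being at each node, not yet stopped
    hops = []
    for _ in range(L + 1):
        hops.append([(1.0 - sqrt_c) * x for x in cur])   # stop here w.p. 1 - sqrt(c)
        nxt = [0.0] * n
        for u, mass in enumerate(cur):
            if mass > 0.0 and in_nbrs[u]:
                share = sqrt_c * mass / len(in_nbrs[u])  # continue w.p. sqrt(c)
                for v in in_nbrs[u]:                     # one step along an incoming edge
                    nxt[v] += share
        cur = nxt
    return hops

# Toy graph: node 0 has in-neighbors 1 and 2; nodes 1 and 2 have in-neighbor 0.
toy = [[1, 2], [0], [0]]
hops = hop_ppr(toy, n=3, source=0, c=0.6, L=50)
ppr = [sum(h[k] for h in hops) for k in range(3)]        # truncated PPR vector
```

Summing the per-level vectors gives the truncated Personalized PageRank vector; on this toy graph its entries sum to almost 1 because every node has at least one in-neighbor.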

As shown in fig. 2, which is a flowchart of an implementation manner of step S3 in this embodiment, the method includes: step S31-step S38.

In step S31, a node v_k in the graph structure is obtained.

In step S32, it is judged whether the in-degree of node v_k satisfies a preset condition, the preset condition being that the in-degree of node v_k is 0 or 1.

In step S33, if so, the non-meeting probability of node v_k is returned according to the preset condition, with one value given for d_in(v_k) = 0 and another for d_in(v_k) = 1, where d_in(v_k) denotes the in-degree of node v_k in the graph structure.
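The two special cases can be worked out by hand. The derivation below is a sketch under stated assumptions: D(k, k) is a hypothetical symbol for the non-meeting probability of node v_k, and each walk is assumed to continue with probability √c per step, consistent with c^(l(k)) being the probability that neither of two walks stops during the first l(k) steps. With in-degree 0 neither walk can move, so the walks never meet; with in-degree 1 the walks meet exactly when both continue at the first step, because they then share the only in-neighbor.

```latex
\begin{aligned}
d_{in}(v_k) = 0 &: \quad \Pr[\text{the two walks meet}] = 0
  &&\Rightarrow\quad D(k,k) = 1, \\
d_{in}(v_k) = 1 &: \quad \Pr[\text{the two walks meet}] = \sqrt{c}\cdot\sqrt{c} = c
  &&\Rightarrow\quad D(k,k) = 1 - c.
\end{aligned}
```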

If not, in step S34, the probability Z_l(k, q) that two random walks starting from node v_k first meet at node v_q at step l is calculated, and the calculation of Z_l(k, q) is repeated until E_k, the sum of the lengths of all paths extended from node v_k, is greater than or equal to a preset value, wherein the preset value uses the value of the k-th dimension of the n-dimensional vector.

In step S35, l(k), the total number of levels traversed by the two random walks when E_k becomes greater than or equal to the preset value, is obtained.

In step S36, the probability that two random walks generated from node v_k meet after step l(k) is calculated, and this calculation is repeated R(k) times, wherein the two random walks start from node v_k, do not stop during the first l(k) steps, and from step l(k) + 1 onward stop at each step with probability 1 − √c and, with probability √c, move to a randomly chosen neighbor of the current node.

In step S37, the non-meeting probability of node v_k is calculated from the results of the R(k) generated random walks and from the Z_l(k, q) values computed until E_k became greater than or equal to the preset value, and is stored as the k-th element on the diagonal of the non-meeting probability matrix.

The implementation of steps S34 to S37 may include: calculating the non-meeting probability of node v_k according to formula (5), wherein, in formula (5), the sampled term represents the probability that two random walks generated from node v_k meet after step l(k); c^(l(k)) represents the probability that neither of the two random walks stops during the first l(k) steps; l(k) is the value of the variable l when E_k becomes greater than or equal to the preset value; Z_l(k, q) is the probability that two random walks starting from node v_k first meet at node v_q at step l; and I(w) is an indicator variable, with I(w) = 1 when the two random walks generated at the w-th time meet and I(w) = 0 otherwise.

In step S38, the value of k is updated, and the above calculation is repeated until the node v_kAll nodes in the graph structure are traversed.

l(k) is the value of the variable l when E_k becomes greater than or equal to the preset value, and represents the number of levels over which the non-meeting probability is calculated deterministically. Once step S34 is completed, the value of l(k) is determined and can therefore be used in formula (5). As can be seen from formula (5), the non-meeting probability is split into two parts: the part for the first l(k) levels is calculated deterministically in step S34 (the Z_l(k, q) term of formula (5)), and the part for levels l(k) to L is obtained by generating random walks in step S36. The purpose of the split is to balance the strengths and weaknesses of the two approaches: through the choice of l(k), a balance point is found at which the running time is shortest and the result is most accurate.
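The sampling half of this split can be illustrated with a simplified Python sketch. It is not the hybrid procedure of steps S34 to S37: it estimates the whole non-meeting probability by sampling alone, with no deterministic prefix and no l(k). The walk model (stop with probability 1 − √c per step), the shortcut values 1 and 1 − c for in-degree 0 and 1, and the names `step` and `non_meeting_probability` are assumptions made for illustration.

```python
import math
import random

def step(in_nbrs, u, sqrt_c, rng):
    """One step of an assumed sqrt(c)-walk: next node, or None if the walk stops."""
    if not in_nbrs[u] or rng.random() >= sqrt_c:
        return None
    return rng.choice(in_nbrs[u])

def non_meeting_probability(in_nbrs, k, c, samples, rng):
    """Estimate the probability that two walks from node k never meet."""
    d_in = len(in_nbrs[k])
    if d_in == 0:
        return 1.0            # neither walk can move, so they never meet
    if d_in == 1:
        return 1.0 - c        # they meet iff both continue at the first step
    sqrt_c = math.sqrt(c)
    met = 0
    for _ in range(samples):
        a = b = k
        while True:
            a = step(in_nbrs, a, sqrt_c, rng)
            b = step(in_nbrs, b, sqrt_c, rng)
            if a is None or b is None:   # one walk stopped: no later meeting
                break
            if a == b:                   # same node at the same step
                met += 1
                break
    return 1.0 - met / samples

# Toy graph: node 0 has in-neighbors 1 and 2; nodes 1 and 2 have in-neighbor 0.
toy = [[1, 2], [0], [0]]
rng = random.Random(0)
d0 = non_meeting_probability(toy, 0, 0.6, 20000, rng)
d1 = non_meeting_probability(toy, 1, 0.6, 20000, rng)
```

On this toy graph the meeting probability of node 0 can be computed by hand as c/2 + c²/2 = 0.48 for c = 0.6, so the estimate `d0` should be close to 0.52. Pure sampling needs many walks to reach a small error, which is the motivation for computing the first levels deterministically.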

It should be noted that the above steps calculate the non-meeting probability of each node v_k on the graph and thereby obtain the non-meeting probability matrix. The matrix is a diagonal matrix, that is, only the elements on the diagonal are non-zero, and the value stored in the k-th element on the diagonal is the non-meeting probability of node v_k. The estimated value of the matrix is used in the calculation of the SimRank similarity.

In one implementation, step S34 may include: step S341-step S344.

Step S341: from (P^T)^l(k, q), the probability that node v_k moves to node v_q in l steps along incoming edges, calculate Z_l(k, q), the probability that two random walks starting from node v_k first meet at node v_q at step l.

Here P^T is the transpose of the matrix P, where P is an n × n matrix and the value recorded at the j-th row and i-th column is the probability of moving from node v_i to node v_j in one step along an incoming edge; P^T(i, j) = P(j, i), so the value recorded at the i-th row and j-th column of P^T is the probability of moving from node v_i to node v_j in one step along an incoming edge.

In this embodiment, a random walk from v_k to node v_q in l steps along incoming edges stops at each step with probability 1 − √c, whereas the transition in step S341 refers to the probability of going from node v_k to node v_q along incoming edges without stopping.

Z_l(k, q) is calculated according to formula (3), which is:

$$Z_l(k,q) = c^{l}\big((P^T)^{l}(k,q)\big)^2 - \sum_{l'=1}^{l-1}\sum_{q'} c^{l'}\big((P^T)^{l'}(q',q)\big)^2\, Z_{l-l'}(k,q')$$

wherein Z_l(k, q) denotes the probability that two random walks starting from node v_k first meet at node v_q at step l; Z_0(k, k) is initialized to 1 and Z_0(k, x) is initialized to 0 for the other nodes v_x; c^l((P^T)^l(k, q))^2 is the probability that two random walks starting from node v_k meet at node v_q at step l; and (P^T)^0(k, x) is initialized to 0 for the other nodes v_x.

Formula (3) indicates that, for the current l, Z_l(k, q) is calculated from the values of (P^T)^l(k, q), (P^T)^l'(q', q) and Z_(l−l')(k, q').

Step S342: for every node v_q' with (P^T)^(l−l')(k, q') > 0, update (P^T)^(l'+1)(q', q), E_k and l' until l' = l, where (P^T)^(l−l')(k, q') is the probability that node v_k moves to node v_q' in l − l' steps and (P^T)^(l'+1)(q', q) is the probability that node v_q' moves to node v_q in l' + 1 steps.

The update of (P^T)^(l'+1)(q', q), E_k and l' for every node v_q' with (P^T)^(l−l')(k, q') > 0 follows formula (4):

$$(P^T)^{l'+1}(q',q) = (P^T)^{l'+1}(q',q) + \frac{(P^T)^{l'}(q',x)}{d_{in}(v_x)}$$

Here (P^T)^(l−l')(k, q') is the probability that node v_k moves to node v_q' in l − l' steps along incoming edges, and (P^T)^l'(q', x) is the probability that node v_q' moves to node v_x in l' steps along incoming edges. The value of (P^T)^(l−l')(k, q') differs from node to node during the update, and the condition (P^T)^(l−l')(k, q') > 0 selects the nodes v_q' to be processed. (P^T)^(l'+1)(q', q) is the probability that node v_q' moves to node v_q in l' + 1 steps along incoming edges, and node v_x ranges over all nodes with (P^T)^l'(q', x) > 0.

Step S343: update the value of l, l = l + 1.

Step S344: repeat the above steps until E_k is greater than or equal to the preset value.
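The deterministic part of steps S341 to S344 can be sketched in Python on a small graph. This is an illustration under stated assumptions, not the procedure of this embodiment: it uses dense tables for all node pairs, runs for a fixed number of levels `max_l` instead of stopping on the budget E_k, and uses a first-meeting recurrence written from the three quantities named for formula (3). The names `transition_powers` and `first_meeting` are hypothetical.

```python
def transition_powers(in_nbrs, n, max_l):
    """T[l][a][b] = (P^T)^l(a, b): probability of moving a -> b in l steps
    along incoming edges (no stopping)."""
    T = [[[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]]
    for _ in range(max_l):
        prev = T[-1]
        cur = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for x in range(n):
                if prev[a][x] > 0.0 and in_nbrs[x]:
                    share = prev[a][x] / len(in_nbrs[x])  # push of formula (4)
                    for q in in_nbrs[x]:
                        cur[a][q] += share
        T.append(cur)
    return T

def first_meeting(in_nbrs, n, k, c, max_l):
    """Z[l][q] = probability that two walks from k first meet at q at step l."""
    T = transition_powers(in_nbrs, n, max_l)
    Z = [[0.0] * n]                      # index 0 is an unused placeholder
    for l in range(1, max_l + 1):
        row = []
        for q in range(n):
            z = c ** l * T[l][k][q] ** 2             # meet at q at step l
            for lp in range(1, l):                   # subtract earlier first meetings
                for qp in range(n):
                    z -= c ** lp * T[lp][qp][q] ** 2 * Z[l - lp][qp]
            row.append(z)
        Z.append(row)
    return Z

# Toy graph: node 0 has in-neighbors 1 and 2; nodes 1 and 2 have in-neighbor 0.
toy = [[1, 2], [0], [0]]
Z = first_meeting(toy, n=3, k=0, c=0.6, max_l=6)
meeting_prob = sum(sum(row) for row in Z[1:])
```

On this toy graph two walks from node 0 can first meet only at step 1 or step 2, so the total meeting probability is 0.3 + 0.18 = 0.48 and the corresponding non-meeting probability is 0.52.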

In one implementation, step S4 may include:

In the l-th round of calculation, the SimRank similarity is calculated according to the following formula:

Formula (2) is:

where

is the no-meeting probability matrix; l is an intermediate variable taking the values 0, 1, ..., L;

is an n-dimensional vector;

is the value stored during the calculation of step 2.

Formula (2) must be executed repeatedly for L rounds to ensure that the final SimRank vector is within the given absolute error; the SimRank vector obtained after l rounds of calculation is denoted as

From this, an estimate of the SimRank vector within the given absolute error can be obtained. The value stored in each dimension of the SimRank vector corresponds to the SimRank similarity between one node of the graph structure and the source node v_i, which in turn gives the estimated SimRank similarity between a given user and the target user in the social network.
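Once the SimRank vector is available, the recommendation step recited in the claims (select the t largest entries, then drop users who are already friends of the target user) could look roughly like the following Python sketch. The function name recommend_friends, the list and set inputs, and the exclusion of the source node itself from the candidates are assumptions made for illustration.

```python
import heapq


def recommend_friends(simrank, source, friends, t):
    """Return candidate node ids ordered by decreasing SimRank score.

    simrank[j] is the estimated similarity between node j and the source.
    friends is the set of nodes already connected to the source.
    """
    # Assumption: the source is skipped because its self-similarity
    # is always the largest entry.
    candidates = [(score, node) for node, score in enumerate(simrank)
                  if node != source]
    top = heapq.nlargest(t, candidates)
    # Existing friends are removed after the top-t selection, so fewer
    # than t users may remain.
    return [node for _, node in top if node not in friends]
```

Because filtering happens after selection, the sketch follows the order of operations in the claims. Filtering first and selecting afterwards would always return t users when enough candidates exist, but that is a different design.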

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A friend recommendation method based on a single-source SimRank accurate solution, characterized by comprising the following steps:

converting a target user, users and the relationships among the users into a graph structure G, wherein the graph structure G comprises nodes corresponding to the users and edges corresponding to the relationships among the users, the target user is the source node v_i of the graph structure, and the graph structure G comprises n nodes;

in the graph structure G, computing the personalized PageRank of the source node v_i with respect to all nodes on the graph, and constructing a personalized PageRank vector

wherein

is an n-dimensional vector,

is stopped with a probability of

the no-meeting probability matrix

according to the n-dimensional vector

and the no-meeting probability matrix

updating to obtain the n-dimensional vector after L rounds of updating

wherein

finding the n-dimensional vector

2. The friend recommendation method of claim 1, wherein, in the graph structure G, computing the personalized PageRank of the source node v_i with respect to all nodes on the graph and constructing an n-dimensional vector

comprises the following steps:

3. The friend recommendation method of claim 2, wherein calculating the personalized PageRank vector of the source node v_i with respect to node v_k according to the personalized PageRank vectors of the source node v_i with respect to the neighbor nodes of node v_k comprises:

updating, according to formula one, the vector

and storing the vector

wherein formula one is:

wherein l is an intermediate variable, l = 0, 1, ..., L, and

and

is initialized to

repeating L times the process of updating, according to formula one,

to obtain the updated n-dimensional vector

4. The friend recommendation method of claim 3, wherein, according to the n-dimensional vector

and the no-meeting probability matrix

updating to obtain the n-dimensional vector after L rounds of updating

comprises the following steps:

formula two is:

wherein

is the no-meeting probability matrix, l is an intermediate variable, l = 0, 1, ..., L,

is an n-dimensional vector.

5. The friend recommendation method of claim 2, wherein calculating the no-meeting probabilities of all nodes in the graph structure G to form the no-meeting probability matrix

comprises the following steps:

obtaining a node v_k in the graph structure;

judging whether the in-degree of node v_k satisfies a preset condition, wherein the preset condition comprises the in-degree of node v_k being 0 or 1;

if yes, returning the no-meeting probability of node v_k according to the preset condition: when the in-degree of v_k is 0, the no-meeting probability of node v_k

if not, calculating the probability Z_l(k, q) that two random walks starting from node v_k first meet at node v_q at step l, and repeating the calculation of Z_l(k, q) until the sum E_k of the lengths of all paths extended from node v_k is greater than or equal to a preset value, wherein the preset value is

which is, for the n-dimensional vector

the value of its k-th dimension;

obtaining the total number of levels l(k) reached by the two random walks when the sum E_k of the lengths of all paths extended from node v_k is greater than or equal to the preset value;

generating two random walks starting from v_k and calculating the probability that they meet after step l(k), and repeating this calculation R(k) times, wherein the two random walks generated from node v_k do not stop during the first l(k) steps and, from step l(k)+1 onward, at each step, with probability

stop, and with probability

move to a randomly chosen neighbor of the current node;

calculating the no-meeting probability of node v_k according to the results of the R(k) random-walk generations and the values of Z_l(k, q) calculated up to the point where the sum E_k of the lengths of all paths extended from node v_k is greater than or equal to the preset value

and storing the no-meeting probability of node v_k

in the no-meeting probability matrix

as the k-th element on the diagonal;

updating the value of k and repeating the above calculation until all nodes in the graph structure have been traversed as node v_k.

6. The friend recommendation method of claim 5, wherein calculating the probability Z_l(k, q) that two random walks starting from node v_k first meet at node v_q at step l, and repeating the calculation of Z_l(k, q) until the sum E_k of the lengths of all paths extended from node v_k is greater than or equal to the preset value, comprises the following steps:

according to the probability that node v_k transfers to node v_q along incoming edges in l steps

calculating the probability Z_l(k, q) that two random walks starting from v_k first meet at node v_q at step l;

for all nodes v_q′ satisfying

updating

E_k and l′ until l′ = l, wherein

denotes the probability that node v_k transfers to node v_q′ in l-l′ steps, and

denotes the probability that node v_q′ transfers to node v_q in l′+1 steps;

updating the value of l;

repeating the above steps until E_k is greater than or equal to the preset value.

7. The friend recommendation method of claim 6, wherein calculating, according to the probability that node v_k transfers to node v_q along incoming edges in l steps

the probability Z_l(k, q) that two random walks starting from v_k first meet at node v_q at step l comprises:

calculating Z_l(k, q) according to formula three, wherein formula three is:

wherein

is the probability that two random walks starting from node v_k meet at node v_q at step l,

is the probability that two random walks starting from node v_k meet at node v_q at step l and have already met before reaching node v_q,

denotes the probability that node v_k transfers to node v_q along incoming edges in l steps,

is initialized to 1; for

is initialized to 0; for

is initialized to 0; l is an intermediate variable, initialized to 0.

8. The friend recommendation method of claim 6, wherein, for all nodes v_q′ satisfying

updating

E_k and l′ comprises:

according to formula four, for all nodes v_q′ satisfying

updating

E_k and l′, wherein formula four is:

E_k = E_k + 1

l′ = l′ + 1

wherein node v_x denotes every node for which

holds.

9. The friend recommendation method of claim 5, wherein calculating the no-meeting probability of node v_k according to the results of the R(k) random-walk generations and the values of Z_l(k, q) calculated up to the point where the sum E_k of the lengths of all paths extended from node v_k is greater than or equal to the preset value

comprises the following steps:

formula five is:

wherein

denotes the probability that two random walks generated from node v_k meet after step l(k); c^{l(k)} denotes the probability that neither of the two random walks stops during the first l(k) steps; l(k) is the value of the variable l when E_k is greater than or equal to the preset value; Z_l(k, q) is the probability that two random walks starting from node v_k first meet at node v_q at step l; I(w) is an indicator variable recording whether the two random walks meet in the w-th generation, where w ≤ R(k): I(w) = 1 when the two random walks generated the w-th time meet, and I(w) = 0 otherwise.

10. The friend recommendation method of claim 1, wherein recommending the users corresponding to the t nodes to the target user as a result comprises:

mapping the t nodes found to the corresponding users in the social network;

eliminating the users who already have a friend relationship with the target user, and recommending the remaining users to the target user.