CN116226525A - Personalized webpage ranking method and system based on linear algebra - Google Patents

Info

Publication number
CN116226525A
Authority
CN
China
Prior art keywords
graph
node
webpage
algebraic
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310194387.8A
Other languages
Chinese (zh)
Inventor
崔博远
乔鹏鹏
张志威
袁野
王国仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Institute of Technology BIT
Priority to CN202310194387.8A
Publication of CN116226525A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized webpage ranking method and system based on linear algebra, in which graphs are represented as adjacency matrices and processed by algebraic computation, achieving good parallelism and computing personalized webpage rankings efficiently. The method comprises the following steps. The original graph data corresponding to a group of webpages is taken as the input webpage set, a starting webpage set is a subset of the input webpage set, and a graph adjacency matrix is constructed from the original graph data. An algebraic Trim-1 method finds the strongly connected components containing exactly one node on the graph adjacency matrix, and the adjacency matrix is reconstructed to obtain the graph represented by the reconstructed adjacency matrix. An algebraic FW-BW algorithm based on algebraic breadth-first search finds the largest strongly connected component in that graph. An algebraic label propagation algorithm based on matrix multiplication finds the strongly connected components in the graph represented by the adjacency matrix. Finally, the PM algorithm computes the personalized webpage ranking on the reconstructed directed acyclic graph; the ranking indicates how closely each webpage is associated with the starting webpage set.

Description

Personalized webpage ranking method and system based on linear algebra
Technical Field
The invention relates to the technical field of network analysis, in particular to a personalized webpage ranking method and system based on linear algebra.
Background
In the field of network analysis, personalized webpage ranking is of great importance. Personalized PageRank (PPR) is a special application of PageRank (PR) that measures the relevance of each node in a graph to a given set of starting nodes. PPR has a wide range of applications. For example, on a social network a user may want to know which friends matter most to them; on a business network, merchants want to know which items are most relevant to the items they sell so that they can improve their business strategy.
The classical personalized webpage ranking algorithm is the iterative PM (power method) algorithm. In each iteration it computes an approximate PPR vector, feeds that vector into the next iteration, and computes a new approximate PPR vector from the edge relations in the graph. It repeats until convergence and returns the final PPR vector as the result.
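The iteration described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `ppr_power_method`, the restart parameter `alpha`, the tolerance `tol`, and the handling of nodes without out-links (their mass is returned to the start distribution) are assumptions, since the source does not specify them.

```python
def ppr_power_method(edges, num_nodes, start, alpha=0.15, tol=1e-10, max_iter=1000):
    """Approximate a personalized PageRank vector by power iteration.

    edges: list of directed links (u, v); start: dict node -> start probability.
    alpha (restart probability) and tol are illustrative assumptions.
    """
    out_neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        out_neighbors[u].append(v)
    s = [start.get(v, 0.0) for v in range(num_nodes)]
    pi = s[:]  # initial approximate PPR vector
    for _ in range(max_iter):
        nxt = [alpha * s_v for s_v in s]  # restart term
        for u in range(num_nodes):
            if pi[u] == 0.0:
                continue
            if out_neighbors[u]:
                # Spread the walking mass of u evenly over its out-links.
                share = (1 - alpha) * pi[u] / len(out_neighbors[u])
                for v in out_neighbors[u]:
                    nxt[v] += share
            else:
                # Assumption: a node without out-links returns its mass to the start set.
                for v in range(num_nodes):
                    nxt[v] += (1 - alpha) * pi[u] * s[v]
        diff = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt  # the new approximation is the input of the next iteration
        if diff < tol:
            break
    return pi


pi = ppr_power_method([(0, 1), (1, 2), (2, 0), (3, 0)], 4, {0: 1.0})
```

In this toy graph node 3 cannot be reached from the starting node 0, so its score stays at zero, which is the observation the reachable-subgraph strategy builds on.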
Although the PM algorithm achieves high accuracy, it has high complexity and computational cost. One improvement strategy is to first compute the strongly connected components, then use them to compute the subgraph reachable from the starting node set, and finally run the PM algorithm on the reachable subgraph, whose data scale is smaller. However, efficient parallel computation of strongly connected components remains an open problem. In the prior art, strongly connected components are mostly computed on adjacency-list data; the irregularity of graph data causes problems such as load imbalance in parallel computation, so parallelism is poor and the efficiency of computing strongly connected components suffers.
Therefore, current personalized webpage ranking algorithms based on strongly connected components have difficulty achieving high concurrency on graph data.
Disclosure of Invention
In view of this, the invention provides a personalized webpage ranking method and system based on linear algebra. By representing the graph as an adjacency matrix and performing algebraic computation, it can make full use of mature parallel matrix computing methods to achieve good parallelism and obtain strongly connected components quickly, thereby computing the personalized webpage ranking efficiently.
To achieve the above purpose, the technical scheme of the invention comprises the following steps:
Step one: take the original graph data corresponding to a group of webpages as the input webpage set, with a starting webpage set that is a subset of the input webpage set, and construct a graph adjacency matrix from the original graph data.
Step two: use the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix, and reconstruct the adjacency matrix to obtain the graph represented by the reconstructed adjacency matrix.
Step three: use the algebraic FW-BW algorithm based on algebraic breadth-first search to find the largest strongly connected component in the graph represented by the reconstructed adjacency matrix.
Step four: use the algebraic label propagation algorithm based on matrix multiplication to find the strongly connected components in the graph represented by the adjacency matrix.
Step five: use the PM algorithm to compute the personalized webpage ranking on the reconstructed directed acyclic graph; the magnitude of each webpage's ranking value indicates how closely that webpage is associated with the starting webpage set.
Further, the graph adjacency matrix is constructed from the original graph data by the following specific steps:
S101: Input the webpage set V, the webpage link relations, the starting webpage set V_0, and the click counts of the webpages in V_0, L = {(v, l_v) | v ∈ V_0}, where l_v is the number of clicks for webpage v. The starting webpage set V_0 contains n webpages.
S102: Build the graph G_0 from the input webpages and the webpage link relations, and mark the nodes corresponding to V_0.
S103: Compute the start probabilities P = {(v, p_v) | v ∈ V_0} of the starting webpages from their click counts:
$$p_v = \frac{l_v}{\sum_{u \in V_0} l_u}$$
where l_u is the number of clicks for webpage u and p_v is the start probability of webpage v.
S104: From the graph G_0, generate the sparse matrix FW, which is the graph adjacency matrix; BW is the transpose of FW. Store FW and BW in a compressed sparse matrix format.
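Steps S103 and S104 can be illustrated with a short sketch. The source only says "a sparse compression matrix format", so the CSR layout (row pointer plus column indices) used here is an assumption, as are the helper names `start_probabilities` and `to_csr` and the toy edge list. The sketch also assumes the edge list has no duplicate links.

```python
def start_probabilities(clicks):
    """S103 sketch: normalize click counts l_v over V_0 into start probabilities p_v."""
    total = sum(clicks.values())
    if total == 0:
        raise ValueError("click counts must not all be zero")
    return {v: c / total for v, c in clicks.items()}


def to_csr(num_nodes, edges):
    """S104 sketch: store a 0/1 matrix with a 1 at every (row, col) in CSR form.

    Returns (indptr, indices); the entries of row r are indices[indptr[r]:indptr[r + 1]].
    """
    indptr = [0] * (num_nodes + 1)
    for r, _ in edges:
        indptr[r + 1] += 1
    for r in range(num_nodes):
        indptr[r + 1] += indptr[r]  # prefix sum of row lengths
    indices = [0] * len(edges)
    fill = indptr[:-1]  # next free slot per row (slicing makes a copy)
    for r, c in sorted(edges):
        indices[fill[r]] = c
        fill[r] += 1
    return indptr, indices


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]       # hypothetical link relations
FW = to_csr(4, edges)                          # forward adjacency matrix
BW = to_csr(4, [(v, u) for u, v in edges])     # transpose of FW
P = start_probabilities({0: 30, 2: 10})        # hypothetical click counts for V_0
```

Keeping both FW and BW avoids transposing on the fly: the later steps traverse links forward through FW and backward through BW.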
Further, the algebraic Trim-1 method finds the strongly connected components containing exactly one node on the graph adjacency matrix as follows:
S201: Multiply the all-ones vector by FW and by BW using sparse matrix multiplication; the results are the in-degree f(v) and the out-degree b(v), respectively, of every node v in the input webpage set V. Record in Boolean vectors whether f(v) and b(v) are 0 (an entry is 1 when the degree is nonzero).
S202: Combine f(v) and b(v) with an element-wise AND built on algebraic primitives, i.e. m(v) = f(v) & b(v). m(v) is a marker vector, and every node recorded as 0 in m(v) is a strongly connected component containing exactly one node.
S203: Multiply m(v) by FW and by BW, and recompute, for the remaining nodes with m(v) = 1, whether the in-degree f(v) and out-degree b(v) are 0, again recording the result in Boolean vectors.
S204: Repeat S202 and S203 until m(v) no longer changes.
S205: For each node recorded as 0 in m(v), record its own node serial number as its strongly connected component ID.
S206: Traverse all edges (i, j) in FW and BW, where i and j are nodes, and keep only the edges satisfying m(i) = 1 and m(j) = 1. This eliminates the single-node strongly connected components from the graph and yields the graph with those components removed.
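The Trim-1 loop in S201 to S206 can be sketched in plain Python as below. For brevity the matrices are held as edge lists and the vector-matrix product is a hypothetical helper `vec_mat`; a real implementation would presumably call a parallel sparse matrix library, which the source does not name. The convention that FW has a 1 at (i, j) for a link from i to j is also an assumption.

```python
def vec_mat(x, edges, n):
    """Row vector times a 0/1 matrix given by its nonzero (row, col) positions."""
    y = [0] * n
    for r, c in edges:
        y[c] += x[r]
    return y


def trim1(n, fw_edges):
    """Repeatedly mark nodes whose live in-degree or live out-degree is zero."""
    bw_edges = [(v, u) for u, v in fw_edges]  # BW is the transpose of FW
    m = [1] * n                               # marker vector, 1 = still in the graph
    while True:
        f = vec_mat(m, fw_edges, n)           # in-degree counted over live nodes
        b = vec_mat(m, bw_edges, n)           # out-degree counted over live nodes
        new_m = [int(m[v] and f[v] > 0 and b[v] > 0) for v in range(n)]
        if new_m == m:                        # S204: stop when m no longer changes
            break
        m = new_m
    # S205: a trimmed node is its own component, identified by its serial number.
    scc_id = [None if m[v] else v for v in range(n)]
    # S206: keep only edges whose two endpoints both survive.
    kept = [(i, j) for i, j in fw_edges if m[i] and m[j]]
    return m, scc_id, kept


# Hypothetical graph: a 3-cycle 0 -> 1 -> 2 -> 0 with a tail 2 -> 3 -> 4.
m, scc_id, kept = trim1(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
```

In this example node 4 is trimmed in the first pass because it has no out-links, and node 3 in the second pass because its only out-link led to node 4, which shows why the step has to iterate.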
Further, the algebraic FW-BW algorithm based on algebraic breadth-first search finds the largest strongly connected component in the graph represented by the reconstructed adjacency matrix as follows:
S301: Compute the in-degree and out-degree of every node v in the input webpage set V and multiply them to obtain the vector n(v).
S302: Sort n(v) to find the node v_0 with the maximum value n(v_0).
S303: Starting from v_0, run an algebraic breadth-first search on the adjacency matrix FW to obtain the BFS tree B.
S304: Starting from v_0, run an algebraic breadth-first search on the adjacency matrix BW to obtain the BFS tree F.
S305: The intersection S of B and F is the strongly connected component containing v_0; remove the nodes in S from FW and BW.
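A compact sketch of S301 to S305 follows. Algebraic breadth-first search is written here as a repeated frontier-times-matrix product followed by masking out visited nodes. The helper names, the edge-list storage, and the use of `max` in place of a full sort are assumptions made for brevity.

```python
def vec_mat(x, edges, n):
    """Row vector times a 0/1 matrix given by its nonzero (row, col) positions."""
    y = [0] * n
    for r, c in edges:
        y[c] += x[r]
    return y


def algebraic_bfs(source, edges, n):
    """Return a Boolean vector of the nodes reachable from source."""
    visited = [False] * n
    frontier = [False] * n
    frontier[source] = True
    while any(frontier):
        visited = [a or b for a, b in zip(visited, frontier)]
        reached = [k > 0 for k in vec_mat(frontier, edges, n)]     # one BFS level
        frontier = [r and not v for r, v in zip(reached, visited)]  # mask visited nodes
    return visited


def fwbw_pivot_scc(n, fw_edges):
    bw_edges = [(v, u) for u, v in fw_edges]
    ones = [1] * n
    indeg = vec_mat(ones, fw_edges, n)
    outdeg = vec_mat(ones, bw_edges, n)
    score = [i * o for i, o in zip(indeg, outdeg)]            # S301: n(v)
    pivot = max(range(n), key=score.__getitem__)              # S302: v_0
    forward = algebraic_bfs(pivot, fw_edges, n)               # S303: search on FW
    backward = algebraic_bfs(pivot, bw_edges, n)              # S304: search on BW
    scc = {v for v in range(n) if forward[v] and backward[v]}  # S305: intersection
    remaining = [(i, j) for i, j in fw_edges if i not in scc and j not in scc]
    return pivot, scc, remaining


# Hypothetical graph: cycle 0 -> 1 -> 2 -> 0, link 2 -> 3, and cycle 3 <-> 4.
pivot, scc, remaining = fwbw_pivot_scc(
    5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
)
```

Choosing the pivot by the product of in-degree and out-degree is a heuristic: a node with many links in both directions is likely to sit inside a large component, so removing its component shrinks the remaining graph the most.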
Further, the algebraic label propagation algorithm based on matrix multiplication finds the strongly connected components in the graph represented by the adjacency matrix as follows:
S401: Assign minf(v) = v to every node v and use matrix multiplication to send minf(v) to the neighbor nodes that v can reach; each node updates its minf(v) to the minimum value it receives. Repeat this process until minf(v) is no longer updated.
S402: Assign minb(v) = v to every node v that satisfies minf(v) = v and use matrix multiplication to send minb(v) to the neighbor nodes that can reach v; each node updates its minb(v) to the minimum value it receives. Repeat this process until minb(v) is no longer updated.
S403: Label each group of nodes satisfying minf(v) = minb(v) = u with the strongly connected component ID u and remove the group from FW and BW.
S404: Repeat S401, S402, and S403 until FW and BW are empty.
Further, the PM algorithm computes the personalized webpage ranking on the reconstructed directed acyclic graph as follows:
S501: Reconstruct the graph G_0: compress the nodes belonging to the same strongly connected component into one super node and generate new edges from the edge relations among the strongly connected components, obtaining a directed acyclic graph G_1 that is stored as an adjacency matrix.
S502: Mark the strongly connected components containing the nodes of V_0 as starting components, obtaining the corresponding starting component set V_1 in G_1.
S503: Perform an algebraic breadth-first search from each node in V_1, then merge all the resulting breadth-first search trees to obtain the component reachable subgraph G_2.
S504: Using the correspondence between the nodes of G_2 and those of G_0, reconstruct G_0 to obtain the reachable subgraph G_3.
S505: On G_3, use the PM algorithm to compute the personalized PageRank vector of V_0 on G_3.
S506: Obtain the personalized PageRank vector of V_0 by concatenating the vector from S505 with a zero vector 0, whose dimension is the number of nodes in G_0 \ G_3, the subgraph unreachable from V_0.
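Steps S501 to S506 can be sketched end to end as below, given a component ID per node such as the earlier steps produce. The function names, the restart parameter `alpha`, the fixed iteration count, and the treatment of nodes without out-links are assumptions. The sketch also writes the scores back by node ID instead of literally appending a zero vector, which gives the same vector up to node ordering.

```python
def reachable_subgraph(n, edges, scc_id, start_nodes):
    # S501: edges of the condensed DAG G_1 (one super node per component).
    dag_edges = {(scc_id[u], scc_id[v]) for u, v in edges if scc_id[u] != scc_id[v]}
    # S502: starting components V_1.
    frontier = {scc_id[v] for v in start_nodes}
    reached = set()
    # S503: breadth-first search over the DAG, merged over all starting components.
    while frontier:
        reached |= frontier
        frontier = {d for s, d in dag_edges if s in frontier} - reached
    # S504: map the reached components back to the nodes of G_0, giving G_3.
    return [v for v in range(n) if scc_id[v] in reached]


def ppr_on_subgraph(n, edges, nodes, start, alpha=0.15, iters=200):
    local = {v: i for i, v in enumerate(nodes)}
    sub_edges = [(local[u], local[v]) for u, v in edges if u in local and v in local]
    k = len(nodes)
    outdeg = [0] * k
    for u, _ in sub_edges:
        outdeg[u] += 1
    s = [start.get(v, 0.0) for v in nodes]
    pi = s[:]
    for _ in range(iters):  # S505: power iteration restricted to G_3
        nxt = [alpha * x for x in s]
        dangling = sum(pi[u] for u in range(k) if outdeg[u] == 0)
        for u, v in sub_edges:
            nxt[v] += (1 - alpha) * pi[u] / outdeg[u]
        # Assumption: mass on nodes without out-links returns to the start set.
        pi = [x + (1 - alpha) * dangling * sv for x, sv in zip(nxt, s)]
    full = [0.0] * n        # S506: unreachable nodes keep a score of zero
    for v, i in local.items():
        full[v] = pi[i]
    return full


# Hypothetical graph with components {0, 1, 2}, {3, 4}, {5}; the start set is {3}.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (5, 0)]
scc_id = [0, 0, 0, 3, 3, 5]
nodes = reachable_subgraph(6, edges, scc_id, [3])
ranks = ppr_on_subgraph(6, edges, nodes, {3: 1.0})
```

Because every out-link of a reachable node leads to another reachable node, restricting the iteration to G_3 leaves the out-degrees unchanged, so the scores on G_3 match those of a run on the full graph while touching fewer nodes.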
To achieve the above purpose, the invention also provides a personalized webpage ranking system based on linear algebra, which comprises the following modules:
The graph adjacency matrix construction module takes the original graph data corresponding to a group of webpages as the input webpage set, where the starting webpage set is a subset of the input webpage set, and constructs the graph adjacency matrix from the original graph data.
The reconstruction module uses the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix and reconstructs the adjacency matrix to obtain the graph represented by the reconstructed adjacency matrix.
The algebraic FW-BW algorithm module uses the algebraic FW-BW algorithm based on algebraic breadth-first search to find the largest strongly connected component in the graph represented by the reconstructed adjacency matrix.
The algebraic label propagation algorithm module uses the algebraic label propagation algorithm based on matrix multiplication to find the strongly connected components in the graph represented by the adjacency matrix.
The webpage ranking module uses the PM algorithm to compute the personalized webpage ranking on the reconstructed directed acyclic graph; the magnitude of each webpage's ranking value indicates how closely that webpage is associated with the starting webpage set.
Further, the graph adjacency matrix construction module implements the following steps:
S101: Input the webpage set V, the webpage link relations, the starting webpage set V_0, and the click counts of the webpages in V_0, L = {(v, l_v) | v ∈ V_0}, where l_v is the number of clicks for webpage v. The starting webpage set V_0 contains n webpages.
S102: Build the graph G_0 from the input webpages and the webpage link relations, and mark the nodes corresponding to V_0.
S103: Compute the start probabilities P = {(v, p_v) | v ∈ V_0} of the starting webpages from their click counts:
$$p_v = \frac{l_v}{\sum_{u \in V_0} l_u}$$
where l_u is the number of clicks for webpage u and p_v is the start probability of webpage v.
S104: From the graph G_0, generate the sparse matrix FW, which is the graph adjacency matrix; BW is the transpose of FW. Store FW and BW in a compressed sparse matrix format.
Further, the reconstruction module implements the following steps:
S201: Multiply the all-ones vector by FW and by BW using sparse matrix multiplication; the results are the in-degree f(v) and the out-degree b(v), respectively, of every node v in the input webpage set V. Record in Boolean vectors whether f(v) and b(v) are 0 (an entry is 1 when the degree is nonzero).
S202: Combine f(v) and b(v) with an element-wise AND built on algebraic primitives, i.e. m(v) = f(v) & b(v). m(v) is a marker vector, and every node recorded as 0 in m(v) is a strongly connected component containing exactly one node.
S203: Multiply m(v) by FW and by BW, and recompute, for the remaining nodes with m(v) = 1, whether the in-degree f(v) and out-degree b(v) are 0, again recording the result in Boolean vectors.
S204: Repeat S202 and S203 until m(v) no longer changes.
S205: For each node recorded as 0 in m(v), record its own node serial number as its strongly connected component ID.
S206: Traverse all edges (i, j) in FW and BW, where i and j are nodes, and keep only the edges satisfying m(i) = 1 and m(j) = 1. This eliminates the single-node strongly connected components from the graph and yields the graph with those components removed.
Further, the algebraic FW-BW algorithm module implements the following steps:
S301: Compute the in-degree and out-degree of every node v in the input webpage set V and multiply them to obtain the vector n(v).
S302: Sort n(v) to find the node v_0 with the maximum value n(v_0).
S303: Starting from v_0, run an algebraic breadth-first search on the adjacency matrix FW to obtain the BFS tree B.
S304: Starting from v_0, run an algebraic breadth-first search on the adjacency matrix BW to obtain the BFS tree F.
S305: The intersection S of B and F is the strongly connected component containing v_0; remove the nodes in S from FW and BW.
Further, the algebraic label propagation algorithm module implements the following steps:
S401: Assign minf(v) = v to every node v and use matrix multiplication to send minf(v) to the neighbor nodes that v can reach; each node updates its minf(v) to the minimum value it receives. Repeat this process until minf(v) is no longer updated.
S402: Assign minb(v) = v to every node v that satisfies minf(v) = v and use matrix multiplication to send minb(v) to the neighbor nodes that can reach v; each node updates its minb(v) to the minimum value it receives. Repeat this process until minb(v) is no longer updated.
S403: Label each group of nodes satisfying minf(v) = minb(v) = u with the strongly connected component ID u and remove the group from FW and BW.
S404: Repeat S401, S402, and S403 until FW and BW are empty.
The webpage ranking module implements the following steps:
S501: Reconstruct the graph G_0: compress the nodes belonging to the same strongly connected component into one super node and generate new edges from the edge relations among the strongly connected components, obtaining a directed acyclic graph G_1 that is stored as an adjacency matrix.
S502: Mark the strongly connected components containing the nodes of V_0 as starting components, obtaining the corresponding starting component set V_1 in G_1.
S503: Perform an algebraic breadth-first search from each node in V_1, then merge all the resulting breadth-first search trees to obtain the component reachable subgraph G_2.
S504: Using the correspondence between the nodes of G_2 and those of G_0, reconstruct G_0 to obtain the reachable subgraph G_3.
S505: On G_3, use the PM algorithm to compute the personalized PageRank vector of V_0 on G_3.
S506: Obtain the personalized PageRank vector of V_0 by concatenating the vector from S505 with a zero vector 0, whose dimension is the number of nodes in G_0 \ G_3, the subgraph unreachable from V_0.
The beneficial effects are that:
The invention provides a personalized webpage ranking method and system based on linear algebra. By representing the graph as an adjacency matrix and performing algebraic computation, it can make full use of mature parallel matrix computing methods to achieve good parallelism and obtain strongly connected components quickly, thereby computing the personalized webpage ranking efficiently. The invention addresses the difficulty that personalized webpage ranking algorithms based on strongly connected components have in achieving high concurrency on graph data: it uses the adjacency matrix as the storage format of the graph data and computes with linear algebra methods, so that the high concurrency of matrix operations provides data-level concurrency.
Drawings
FIG. 1 is a flow chart of the Trim method of the present invention;
FIG. 2 is a flow chart of the FW-BW method of the present invention;
FIG. 3 is a flow chart of the label propagation algorithm of the present invention;
FIG. 4 is a general flow chart of a personalized web page ranking method based on linear algebra.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides a personalized webpage ranking method based on linear algebra, whose general flow is shown in FIG. 4. The process is as follows: construct a graph adjacency matrix from the original graph data; use the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix and reconstruct the adjacency matrix; use the algebraic FW-BW algorithm based on algebraic breadth-first search to find the largest strongly connected component in the graph represented by the adjacency matrix; use the algebraic label propagation algorithm based on matrix multiplication to find the strongly connected components in the graph represented by the adjacency matrix; and use the PM (power method) algorithm to compute the personalized webpage ranking on the reconstructed directed acyclic graph.
The invention computes the personalized webpage ranking with linear algebra methods; a specific embodiment is described below:
Step one: take the original graph data corresponding to a group of webpages as the input webpage set, with a starting webpage set that is a subset of the input webpage set, and construct a graph adjacency matrix from the original graph data. In the embodiment of the present invention, step one specifically includes the following steps:
S101: Input the webpage set V, the webpage link relations, the starting webpage set V_0, and the click counts of the webpages in V_0, L = {(v, l_v) | v ∈ V_0}, where l_v is the number of clicks for webpage v. The starting webpage set V_0 contains n webpages.
S102: Build the graph G_0 from the input webpages and the webpage link relations, and mark the nodes corresponding to V_0.
S103: Compute the start probabilities P = {(v, p_v) | v ∈ V_0} of the starting webpages from their click counts:
$$p_v = \frac{l_v}{\sum_{u \in V_0} l_u}$$
where l_u is the number of clicks for webpage u and p_v is the start probability of webpage v.
S104: From the graph G_0, generate the sparse matrix FW, which is the graph adjacency matrix; BW is the transpose of FW. Store FW and BW in a compressed sparse matrix format.
Step two: use the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix, and reconstruct the adjacency matrix to obtain the graph represented by the reconstructed adjacency matrix. The flow of step two is shown in FIG. 1, specifically:
S201: Multiply the all-ones vector by FW and by BW using sparse matrix multiplication; the results are the in-degree f(v) and the out-degree b(v), respectively, of every node v in the input webpage set V. Record in Boolean vectors whether f(v) and b(v) are 0 (an entry is 1 when the degree is nonzero).
S202: Combine f(v) and b(v) with an element-wise AND built on algebraic primitives, i.e. m(v) = f(v) & b(v). m(v) is a marker vector, and every node recorded as 0 in m(v) is a strongly connected component containing exactly one node.
S203: Multiply m(v) by FW and by BW, and recompute, for the remaining nodes with m(v) = 1, whether the in-degree f(v) and out-degree b(v) are 0, again recording the result in Boolean vectors.
S204: Repeat S202 and S203 until m(v) no longer changes.
S205: For each node recorded as 0 in m(v), record its own node serial number as its strongly connected component ID.
S206: Traverse all edges (i, j) in FW and BW, where i and j are nodes, and keep only the edges satisfying m(i) = 1 and m(j) = 1. This eliminates the single-node strongly connected components from the graph and yields the graph with those components removed.
Step three: use the algebraic FW-BW algorithm based on algebraic breadth-first search to find the largest strongly connected component in the graph represented by the reconstructed adjacency matrix. The flow of step three is shown in FIG. 2, specifically:
S301: Compute the in-degree and out-degree of every node v in the input webpage set V and multiply them to obtain the vector n(v).
S302: Sort n(v) to find the node v_0 with the maximum value n(v_0).
S303: Starting from v_0, run an algebraic breadth-first search on the adjacency matrix FW to obtain the BFS tree B.
S304: Starting from v_0, run an algebraic breadth-first search on the adjacency matrix BW to obtain the BFS tree F.
S305: The intersection S of B and F is the strongly connected component containing v_0; remove the nodes in S from FW and BW.
Step four: use the algebraic label propagation algorithm based on matrix multiplication to find the strongly connected components in the graph represented by the adjacency matrix. The flow of step four is shown in FIG. 3, specifically:
S401: Assign minf(v) = v to every node v and use matrix multiplication to send minf(v) to the neighbor nodes that v can reach; each node updates its minf(v) to the minimum value it receives. Repeat this process until minf(v) is no longer updated.
S402: Assign minb(v) = v to every node v that satisfies minf(v) = v and use matrix multiplication to send minb(v) to the neighbor nodes that can reach v; each node updates its minb(v) to the minimum value it receives. Repeat this process until minb(v) is no longer updated.
S403: Label each group of nodes satisfying minf(v) = minb(v) = u with the strongly connected component ID u and remove the group from FW and BW.
S404: Repeat S401, S402, and S403 until FW and BW are empty.
Step five: use the PM algorithm to compute the personalized webpage ranking on the reconstructed directed acyclic graph; the magnitude of each webpage's ranking value indicates how closely that webpage is associated with the starting webpage set. Specifically:
S501: Reconstruct the graph G_0: compress the nodes belonging to the same strongly connected component into one super node and generate new edges from the edge relations among the strongly connected components, obtaining a directed acyclic graph G_1 that is stored as an adjacency matrix.
S502: Mark the strongly connected components containing the nodes of V_0 as starting components, obtaining the corresponding starting component set V_1 in G_1.
S503: Perform an algebraic breadth-first search from each node in V_1, then merge all the resulting breadth-first search trees to obtain the component reachable subgraph G_2.
S504: Using the correspondence between the nodes of G_2 and those of G_0, reconstruct G_0 to obtain the reachable subgraph G_3.
S505: On G_3, use the PM algorithm to compute the personalized PageRank vector of V_0 on G_3.
S506: Obtain the personalized PageRank vector of V_0 by concatenating the vector from S505 with a zero vector 0, whose dimension is the number of nodes in G_0 \ G_3, the subgraph unreachable from V_0.
The invention further provides a personalized webpage ranking system based on linear algebra, which comprises the following modules:
The graph adjacency matrix construction module takes the original graph data corresponding to a group of webpages as the input webpage set, where the starting webpage set is a subset of the input webpage set, and constructs the graph adjacency matrix from the original graph data.
The reconstruction module uses the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix and reconstructs the adjacency matrix to obtain the graph represented by the reconstructed adjacency matrix.
The algebraic FW-BW algorithm module uses the algebraic FW-BW algorithm based on algebraic breadth-first search to find the largest strongly connected component in the graph represented by the reconstructed adjacency matrix.
The algebraic label propagation algorithm module uses the algebraic label propagation algorithm based on matrix multiplication to find the strongly connected components in the graph represented by the adjacency matrix.
The webpage ranking module uses the PM algorithm to compute the personalized webpage ranking on the reconstructed directed acyclic graph; the magnitude of each webpage's ranking value indicates how closely that webpage is associated with the starting webpage set.
The graph adjacency matrix construction module specifically implements the following steps:
S101: Input the webpage set V, the webpage link relations, the initial webpage set V_0, and the click counts of the webpages in V_0, L = {(v, l_v) | v ∈ V_0}, where l_v is the number of clicks on webpage v. The initial webpage set V_0 contains n webpages.
S102: Build the graph G_0 from the input webpages and webpage link relations, and mark the nodes corresponding to V_0.
S103: Calculate the initial probabilities of the initial webpages from their click counts, P = {(v, p_v) | v ∈ V_0}, with
p_v = l_v / Σ_{u ∈ V_0} l_u
where l_u is the number of clicks on webpage u and p_v is the initial probability of webpage v.
S104: From the graph G_0, generate the sparse matrix FW, i.e. the graph adjacency matrix, and its transpose BW; store FW and BW in a compressed sparse matrix format.
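Steps S103 and S104 can be sketched as follows. This is only an illustration under stated assumptions: the initial probability of a webpage is taken to be its share of the total clicks in V_0, which is how the definitions of l_u and p_v read; compressed sparse row (CSR) is chosen as one possible compressed sparse format; and the names `initial_probabilities` and `to_csr` are hypothetical.

```python
def initial_probabilities(clicks):
    """S103 (assumed form): p_v = l_v / sum of l_u over the initial webpages."""
    total = sum(clicks.values())
    return {v: c / total for v, c in clicks.items()}


def to_csr(num_nodes, edges):
    """Build a CSR pattern (row_ptr, col_idx) with one entry (u, v) per edge u -> v.

    All stored values are implicitly 1, so only the pattern is kept.
    """
    row_ptr = [0] * (num_nodes + 1)
    for u, _ in edges:
        row_ptr[u + 1] += 1            # count entries per row
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]   # prefix sum gives row start offsets
    col_idx = [0] * len(edges)
    cursor = list(row_ptr[:-1])        # next free slot in each row
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
    return row_ptr, col_idx


# Example graph G_0 with four webpages and initial set V_0 = {0, 2}.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
FW = to_csr(4, edges)                           # adjacency matrix
BW = to_csr(4, [(v, u) for u, v in edges])      # its transpose
P = initial_probabilities({0: 30, 2: 10})
```

Storing both FW and BW trades memory for speed: forward and backward traversals each read rows of a CSR structure instead of scanning columns.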
The reconstruction module specifically implements the following steps:
S201: Multiply the all-ones vector with FW and with BW by sparse matrix multiplication; the results are the in-degree f(v) and the out-degree b(v) of every node v in the input webpage set V. Record in Boolean vectors whether f(v) and b(v) are 0.
S202: Combine f(v) and b(v) element-wise using an algebraic primitive, i.e. m(v) = f(v) & b(v). m(v) is the marker vector; a node recorded as 0 in m(v) is a strongly connected component containing exactly one node.
S203: Multiply m(v) with FW and with BW to compute whether the in-degree f(v) and out-degree b(v) of the remaining nodes, those with m(v) = 1, are 0, and record the result in Boolean vectors.
S204: Repeat S202 and S203 until m(v) no longer changes.
S205: For each node recorded as 0 in m(v), record its own node serial number as its strongly connected component ID.
S206: Traverse all edges (i, j) in FW and BW, where i and j are nodes, and keep only the edges satisfying m(i) = 1 and m(j) = 1. This removes the single-node strongly connected components from the graph and yields the graph with those components removed.
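The trimming loop of S201 to S206 can be illustrated with the sketch below. It is an assumption-laden simplification: the sparse matrix-vector products are replaced by a plain pass over an edge list that yields the same Boolean vectors, and the function name `trim1` and its return values are hypothetical.

```python
def trim1(num_nodes, edges):
    """Iteratively remove nodes with no incoming or no outgoing live edge.

    Returns (m, scc_id, kept_edges), where m[v] is the marker of S202,
    scc_id maps each trimmed node to its own serial number (S205), and
    kept_edges are the edges between surviving nodes (S206).
    """
    m = [True] * num_nodes
    changed = True
    while changed:                        # S204: repeat until m is stable
        has_in = [False] * num_nodes      # Boolean form of f(v) != 0
        has_out = [False] * num_nodes     # Boolean form of b(v) != 0
        for u, v in edges:
            if m[u] and m[v]:             # S203: only edges among live nodes
                has_out[u] = True
                has_in[v] = True
        changed = False
        for v in range(num_nodes):
            if m[v] and not (has_in[v] and has_out[v]):   # S202: f & b
                m[v] = False
                changed = True
    scc_id = {v: v for v in range(num_nodes) if not m[v]}
    kept_edges = [(u, v) for u, v in edges if m[u] and m[v]]
    return m, scc_id, kept_edges
```

A node with no incoming or no outgoing edge among the live nodes cannot lie on a cycle, so it must be a strongly connected component by itself; removing it can expose further such nodes, which is why the loop repeats.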
The algebraic FW-BW algorithm module specifically implements the following steps:
S301: Compute the in-degree and out-degree of every node v in the input webpage set V and multiply them to obtain the vector n(v).
S302: Sort n(v) to obtain its maximum, n(v_0).
S303: Starting from v_0, perform an algebraic breadth-first search based on the adjacency matrix FW to obtain the BFS tree B.
S304: Starting from v_0, perform an algebraic breadth-first search based on the adjacency matrix BW to obtain the BFS tree F.
S305: The intersection S of B and F is the strongly connected component containing v_0; remove the nodes in S from FW and BW.
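One pass of S301 to S305 might look like the following sketch. It assumes adjacency lists in place of the sparse matrices FW and BW and a queue-based search in place of the algebraic breadth-first search; the names `bfs` and `fw_bw_step` are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Return the set of nodes reachable from `source` in adjacency lists `adj`."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def fw_bw_step(nodes, edges):
    """Find the strongly connected component of the pivot and remove it."""
    fw, bw = {}, {}
    for u, v in edges:
        fw.setdefault(u, []).append(v)    # forward lists (rows of FW)
        bw.setdefault(v, []).append(u)    # backward lists (rows of BW)
    # S301/S302: pivot v_0 maximizes in-degree * out-degree.
    pivot = max(nodes, key=lambda v: len(fw.get(v, ())) * len(bw.get(v, ())))
    # S303-S305: nodes reachable from the pivot AND able to reach it.
    scc = bfs(fw, pivot) & bfs(bw, pivot)
    rest_nodes = [v for v in nodes if v not in scc]
    rest_edges = [(u, v) for u, v in edges if u not in scc and v not in scc]
    return pivot, scc, rest_nodes, rest_edges
```

Choosing the pivot by the product of in-degree and out-degree is a heuristic: a node with many links in both directions is likely to sit inside the largest strongly connected component.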
The algebraic label propagation algorithm module specifically implements the following steps:
S401: Assign minf(v) = v to every node v and use matrix multiplication to propagate minf(v) to the neighbor nodes reachable from v; each node updates minf(v) to the minimum value it receives. Repeat this process until minf(v) is no longer updated.
S402: Assign minb(v) = v to every node v satisfying minf(v) = v and use matrix multiplication to propagate minb(v) to the neighbor nodes that can reach v; each node updates minb(v) to the minimum value it receives. Repeat this process until minb(v) is no longer updated.
S403: Label the group of nodes with minf(v) = minb(v) = u with strongly connected component ID u, and remove them from FW and BW.
S404: Repeat S401, S402, and S403 until FW and BW are empty.
The webpage ranking module specifically implements the following steps:
S501: Reconstruct the graph G_0: compress the nodes belonging to the same strongly connected component into one super node and generate new edges from the edge relations between the strongly connected components, obtaining the directed acyclic graph G_1, which is stored as an adjacency matrix.
S502: Mark the strongly connected components containing the nodes of V_0 as initial components, obtaining the corresponding initial component set V_1 in G_1.
S503: Perform an algebraic breadth-first search from each node in V_1, then merge all resulting breadth-first search trees to obtain the component-reachable subgraph G_2.
S504: Using the correspondence between the nodes of G_2 and the nodes of G_0, reconstruct G_0 to obtain the reachable subgraph G_3.
S505: On G_3, use the PM algorithm to compute the personalized webpage ranking vector of V_0 on G_3:
Figure BDA0004106672740000121
S506: Using
Figure BDA0004106672740000122
compute the personalized webpage ranking of V_0:
Figure BDA0004106672740000123
where 0 is a zero vector whose dimension is the number of nodes in G_0 \ G_3, the subgraph unreachable from V_0.
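Steps S501 to S504 reduce the ranking problem to the part of the graph that the initial webpages can reach. A minimal sketch follows, assuming the strongly connected component IDs are already available as a list `scc_id` indexed by node, and using a single multi-source breadth-first search, which visits the same set as merging one search per initial component. The function names are hypothetical.

```python
from collections import deque


def condense(edges, scc_id):
    """S501: one super node per component; keep only edges between components."""
    return sorted({(scc_id[u], scc_id[v]) for u, v in edges if scc_id[u] != scc_id[v]})


def reachable_components(dag_edges, start_components):
    """S503: components reachable from any initial component in the DAG G_1."""
    adj = {}
    for a, b in dag_edges:
        adj.setdefault(a, []).append(b)
    seen = set(start_components)
    queue = deque(seen)
    while queue:
        a = queue.popleft()
        for b in adj.get(a, ()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def reachable_subgraph(edges, scc_id, start_nodes):
    """S501-S504: return (G_1 edges, G_3 nodes, G_3 edges)."""
    g1_edges = condense(edges, scc_id)
    v1 = {scc_id[v] for v in start_nodes}                # S502
    g2 = reachable_components(g1_edges, v1)              # S503
    # S504: map reachable components back to original nodes and edges.
    g3_nodes = [v for v in range(len(scc_id)) if scc_id[v] in g2]
    g3_edges = [(u, v) for u, v in edges if scc_id[u] in g2 and scc_id[v] in g2]
    return g1_edges, g3_nodes, g3_edges
```

Searching on the condensed graph rather than on G_0 means each strongly connected component is visited once, however many webpages it contains.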
The personalized webpage ranking system based on linear algebra can be implemented on a device comprising a memory, a processor, and a computer program that is stored in the memory and can run on the processor. In the processor, the functional modules are divided by function into four modules: an attribute graph construction module, a graph connectivity irrelevant node selection module, a graph dense module degree relevant node selection module, and a friend recommendation module. When the processor executes the program, the personalized webpage ranking based on linear algebra is realized.
In addition to being implemented by data processing programs, the method steps of the present application may be implemented in hardware, such as logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Hardware that can implement the methods of the present application may also constitute the present application.
The flowcharts and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In view of the foregoing, those skilled in the art will appreciate that the features described in the various embodiments of the disclosure and/or in the claims may be combined in various ways even if such combinations are not explicitly described in this application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined in various ways without departing from the spirit and teachings of the application, and all such combinations fall within the scope of the disclosure.

Claims (10)

1. A personalized webpage ranking method based on linear algebra, characterized by comprising the following steps:
Step one: taking the original graph data corresponding to a group of webpages as an input webpage set, an initial webpage set being a subset of the input webpage set, and constructing a graph adjacency matrix from the original graph data;
Step two: using the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix, and reconstructing the adjacency matrix to obtain the graph represented by the reconstructed adjacency matrix;
Step three: using the algebraic FW-BW algorithm, based on algebraic breadth-first search, to find the largest strongly connected component in the graph represented by the reconstructed adjacency matrix;
Step four: using an algebraic label propagation algorithm, based on matrix multiplication, to find the strongly connected components in the graph represented by the adjacency matrix;
Step five: calculating the personalized webpage ranking on the reconstructed directed acyclic graph using the PM algorithm, wherein the magnitude of each webpage's ranking value indicates how closely that webpage is associated with the initial webpage set.
2. The personalized webpage ranking method based on linear algebra of claim 1, wherein constructing the graph adjacency matrix from the original graph data comprises the following specific steps:
S101: inputting the webpage set V, the webpage link relations, the initial webpage set V_0, and the click counts of the webpages in V_0, L = {(v, l_v) | v ∈ V_0}, where l_v is the number of clicks on webpage v; the initial webpage set V_0 contains n webpages;
S102: constructing the graph G_0 from the input webpages and webpage link relations, and marking the nodes corresponding to V_0;
S103: calculating the initial probabilities of the initial webpages from their click counts, P = {(v, p_v) | v ∈ V_0}, with
p_v = l_v / Σ_{u ∈ V_0} l_u
where l_u is the number of clicks on webpage u and p_v is the initial probability of webpage v;
S104: from the graph G_0, generating the sparse matrix FW, i.e. the graph adjacency matrix, and its transpose BW, and storing FW and BW in a compressed sparse matrix format.
3. The personalized webpage ranking method based on linear algebra of claim 1, wherein using the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix comprises:
S201: multiplying the all-ones vector with FW and with BW by sparse matrix multiplication, the results being the in-degree f(v) and the out-degree b(v) of every node v in the input webpage set V, and recording in Boolean vectors whether f(v) and b(v) are 0;
S202: combining f(v) and b(v) element-wise using an algebraic primitive, i.e. m(v) = f(v) & b(v), where m(v) is the marker vector and a node recorded as 0 in m(v) is a strongly connected component containing exactly one node;
S203: multiplying m(v) with FW and with BW to compute whether the in-degree f(v) and out-degree b(v) of the remaining nodes, those with m(v) = 1, are 0, and recording the result in Boolean vectors;
S204: repeating S202 and S203 until m(v) no longer changes;
S205: for each node recorded as 0 in m(v), recording its own node serial number as its strongly connected component ID;
S206: traversing all edges (i, j) in FW and BW, where i and j are nodes, and keeping only the edges satisfying m(i) = 1 and m(j) = 1, thereby removing the single-node strongly connected components from the graph and obtaining the graph with those components removed.
4. The personalized webpage ranking method based on linear algebra of claim 1, wherein using the algebraic FW-BW algorithm, based on algebraic breadth-first search, to find the strongly connected components in the graph represented by the reconstructed adjacency matrix specifically comprises:
S301: computing the in-degree and out-degree of every node v in the input webpage set V and multiplying them to obtain the vector n(v);
S302: sorting n(v) to obtain its maximum, n(v_0);
S303: starting from v_0, performing an algebraic breadth-first search based on the adjacency matrix FW to obtain the BFS tree B;
S304: starting from v_0, performing an algebraic breadth-first search based on the adjacency matrix BW to obtain the BFS tree F;
S305: the intersection S of B and F being the strongly connected component containing v_0, removing the nodes in S from FW and BW.
5. The personalized webpage ranking method based on linear algebra of claim 1, wherein using the algebraic label propagation algorithm, based on matrix multiplication, to find the strongly connected components in the graph represented by the adjacency matrix specifically comprises:
S401: assigning minf(v) = v to every node v and using matrix multiplication to propagate minf(v) to the neighbor nodes reachable from v, each node updating minf(v) to the minimum value it receives, and repeating this process until minf(v) is no longer updated;
S402: assigning minb(v) = v to every node v satisfying minf(v) = v and using matrix multiplication to propagate minb(v) to the neighbor nodes that can reach v, each node updating minb(v) to the minimum value it receives, and repeating this process until minb(v) is no longer updated;
S403: labeling the group of nodes with minf(v) = minb(v) = u with strongly connected component ID u, and removing them from FW and BW;
S404: repeating S401, S402, and S403 until FW and BW are empty.
6. The personalized webpage ranking method based on linear algebra of claim 1, wherein calculating the personalized webpage ranking on the reconstructed directed acyclic graph using the PM algorithm comprises:
S501: reconstructing the graph G_0 by compressing the nodes belonging to the same strongly connected component into one super node and generating new edges from the edge relations between the strongly connected components, obtaining the directed acyclic graph G_1, which is stored as an adjacency matrix;
S502: marking the strongly connected components containing the nodes of V_0 as initial components, obtaining the corresponding initial component set V_1 in G_1;
S503: performing an algebraic breadth-first search from each node in V_1, then merging all resulting breadth-first search trees to obtain the component-reachable subgraph G_2;
S504: using the correspondence between the nodes of G_2 and the nodes of G_0, reconstructing G_0 to obtain the reachable subgraph G_3;
S505: on G_3, using the PM algorithm to compute the personalized webpage ranking vector of V_0 on G_3:
Figure FDA0004106672720000031
S506: using
Figure FDA0004106672720000032
computing the personalized webpage ranking of V_0:
Figure FDA0004106672720000033
where 0 is a zero vector whose dimension is the number of nodes in G_0 \ G_3, the subgraph unreachable from V_0.
7. A personalized webpage ranking system based on linear algebra, comprising the following modules:
a graph adjacency matrix construction module, which takes the original graph data corresponding to a group of webpages as an input webpage set, an initial webpage set being a subset of the input webpage set, and constructs a graph adjacency matrix from the original graph data;
a reconstruction module, which uses the algebraic Trim-1 method to find the strongly connected components containing exactly one node on the graph adjacency matrix, and reconstructs the adjacency matrix to obtain the graph represented by the reconstructed adjacency matrix;
an algebraic FW-BW algorithm module, which uses the algebraic FW-BW algorithm, based on algebraic breadth-first search, to find the largest strongly connected component in the graph represented by the reconstructed adjacency matrix;
an algebraic label propagation algorithm module, which uses an algebraic label propagation algorithm, based on matrix multiplication, to find the strongly connected components in the graph represented by the adjacency matrix;
a webpage ranking module, which uses the PM algorithm to calculate the personalized webpage ranking on the reconstructed directed acyclic graph, wherein the magnitude of each webpage's ranking value indicates how closely that webpage is associated with the initial webpage set.
8. The personalized webpage ranking system based on linear algebra of claim 7, wherein the graph adjacency matrix construction module specifically implements the following steps:
S101: inputting the webpage set V, the webpage link relations, the initial webpage set V_0, and the click counts of the webpages in V_0, L = {(v, l_v) | v ∈ V_0}, where l_v is the number of clicks on webpage v; the initial webpage set V_0 contains n webpages;
S102: constructing the graph G_0 from the input webpages and webpage link relations, and marking the nodes corresponding to V_0;
S103: calculating the initial probabilities of the initial webpages from their click counts, P = {(v, p_v) | v ∈ V_0}, with
p_v = l_v / Σ_{u ∈ V_0} l_u
where l_u is the number of clicks on webpage u and p_v is the initial probability of webpage v;
S104: from the graph G_0, generating the sparse matrix FW, i.e. the graph adjacency matrix, and its transpose BW, and storing FW and BW in a compressed sparse matrix format.
9. The personalized webpage ranking system based on linear algebra of claim 8, wherein the reconstruction module specifically implements the following steps:
S201: multiplying the all-ones vector with FW and with BW by sparse matrix multiplication, the results being the in-degree f(v) and the out-degree b(v) of every node v in the input webpage set V, and recording in Boolean vectors whether f(v) and b(v) are 0;
S202: combining f(v) and b(v) element-wise using an algebraic primitive, i.e. m(v) = f(v) & b(v), where m(v) is the marker vector and a node recorded as 0 in m(v) is a strongly connected component containing exactly one node;
S203: multiplying m(v) with FW and with BW to compute whether the in-degree f(v) and out-degree b(v) of the remaining nodes, those with m(v) = 1, are 0, and recording the result in Boolean vectors;
S204: repeating S202 and S203 until m(v) no longer changes;
S205: for each node recorded as 0 in m(v), recording its own node serial number as its strongly connected component ID;
S206: traversing all edges (i, j) in FW and BW, where i and j are nodes, and keeping only the edges satisfying m(i) = 1 and m(j) = 1, thereby removing the single-node strongly connected components from the graph and obtaining the graph with those components removed.
10. The personalized webpage ranking system based on linear algebra of claim 9, wherein the algebraic FW-BW algorithm module specifically implements the following steps:
S301: computing the in-degree and out-degree of every node v in the input webpage set V and multiplying them to obtain the vector n(v);
S302: sorting n(v) to obtain its maximum, n(v_0);
S303: starting from v_0, performing an algebraic breadth-first search based on the adjacency matrix FW to obtain the BFS tree B;
S304: starting from v_0, performing an algebraic breadth-first search based on the adjacency matrix BW to obtain the BFS tree F;
S305: the intersection S of B and F being the strongly connected component containing v_0, removing the nodes in S from FW and BW;
the algebraic label propagation algorithm module specifically implements the following steps:
S401: assigning minf(v) = v to every node v and using matrix multiplication to propagate minf(v) to the neighbor nodes reachable from v, each node updating minf(v) to the minimum value it receives, and repeating this process until minf(v) is no longer updated;
S402: assigning minb(v) = v to every node v satisfying minf(v) = v and using matrix multiplication to propagate minb(v) to the neighbor nodes that can reach v, each node updating minb(v) to the minimum value it receives, and repeating this process until minb(v) is no longer updated;
S403: labeling the group of nodes with minf(v) = minb(v) = u with strongly connected component ID u, and removing them from FW and BW;
S404: repeating S401, S402, and S403 until FW and BW are empty;
the webpage ranking module specifically implements the following steps:
S501: reconstructing the graph G_0 by compressing the nodes belonging to the same strongly connected component into one super node and generating new edges from the edge relations between the strongly connected components, obtaining the directed acyclic graph G_1, which is stored as an adjacency matrix;
S502: marking the strongly connected components containing the nodes of V_0 as initial components, obtaining the corresponding initial component set V_1 in G_1;
S503: performing an algebraic breadth-first search from each node in V_1, then merging all resulting breadth-first search trees to obtain the component-reachable subgraph G_2;
S504: using the correspondence between the nodes of G_2 and the nodes of G_0, reconstructing G_0 to obtain the reachable subgraph G_3;
S505: on G_3, using the PM algorithm to compute the personalized webpage ranking vector of V_0 on G_3:
Figure FDA0004106672720000061
S506: using
Figure FDA0004106672720000062
computing the personalized webpage ranking of V_0:
Figure FDA0004106672720000063
where 0 is a zero vector whose dimension is the number of nodes in G_0 \ G_3, the subgraph unreachable from V_0.
CN202310194387.8A 2023-03-02 2023-03-02 Personalized webpage ranking method and system based on linear algebra Pending CN116226525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310194387.8A CN116226525A (en) 2023-03-02 2023-03-02 Personalized webpage ranking method and system based on linear algebra

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310194387.8A CN116226525A (en) 2023-03-02 2023-03-02 Personalized webpage ranking method and system based on linear algebra

Publications (1)

Publication Number Publication Date
CN116226525A true CN116226525A (en) 2023-06-06

Family

ID=86569226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310194387.8A Pending CN116226525A (en) 2023-03-02 2023-03-02 Personalized webpage ranking method and system based on linear algebra

Country Status (1)

Country Link
CN (1) CN116226525A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination