CN115858145A - Ultra-large-scale e-commerce network partition optimization method and device based on graph node feature pre-calculation - Google Patents

Info

Publication number
CN115858145A
Authority
CN
China
Prior art keywords
partition
node
nodes
distributed server
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211381008.8A
Other languages
Chinese (zh)
Inventor
金路
鲍迪恩
蒋炜
汪陈笑
马顺华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bangrui Technology Co ltd
Zhejiang Bangsheng Technology Co ltd
Original Assignee
Hangzhou Bangrui Technology Co ltd
Zhejiang Bangsheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Bangrui Technology Co ltd and Zhejiang Bangsheng Technology Co ltd
Priority to CN202211381008.8A
Publication of CN115858145A

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a partition optimization method and device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation. The method maintains a set of central nodes, of varying size, in each distributed server partition. For each newly added node, it computes a partition quality score from two homogeneous-feature similarities and a node-degree similarity between the new node and the central nodes of each partition, together with the load of each partition, and assigns the node to the distributed server partition with the highest score, thereby improving the quality of every partition. In the dynamic update stage, the invention uses the designed partition quality scoring function to partition the incoming stream data and updates the central nodes of all partitions of the distributed server in real time, thereby maintaining the ultra-large-scale dynamic e-commerce network and improving the computational efficiency of the distributed database storage engine.

Description

Ultra-large-scale e-commerce network partition optimization method and device based on graph node feature pre-calculation
Technical Field
The invention relates to the field of graph computation for ultra-large-scale e-commerce networks, and in particular to a partition optimization method and device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation. It is a distributed graph database storage optimization technique for ultra-large-scale e-commerce networks formed by a multi-dimensional multi-scale cube.
Background
With the rapid development of internet technology, every link of traditional business activity is gradually becoming electronic and networked, and many applications built on e-commerce networks have emerged in daily life, such as Juhuasuan, Taobao, and JD.com. An ultra-large-scale e-commerce network typically contains rich data, and mining that data can generate substantial social and economic benefit. Data mining on e-commerce networks has therefore become a major research focus.
In research, these e-commerce networks are usually represented as graphs: nodes represent entities in the e-commerce network, node feature vectors represent the multi-dimensional information of the entities, and edges represent relationships between entities. Over time, the topology and node representations of the network change continuously and the scale of the network grows exponentially, so a distributed system is required to store the node information and to dynamically partition and update newly arriving stream data.
Because many graph computation algorithms iterate over edges, the number of edges between subgraphs must be as small as possible to reduce the communication cost between different nodes. Moreover, to improve the computational efficiency of the distributed graph database storage engine, the number of nodes in each partition of the distributed server must be kept relatively balanced. Most existing node-centric graph partitioning techniques suit only static networks or small dynamic networks. They have clear shortcomings on ultra-large-scale e-commerce networks with pronounced long-tail characteristics, because the high-degree nodes of a power-law graph significantly increase the communication cost with the nodes of every partition of the distributed server. In addition, for an ultra-large-scale dynamic e-commerce network with rich node information, these methods do not make full use of node features.
Disclosure of Invention
To address the shortcomings of the prior art and the need for improvement, the invention provides a partition optimization method and device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation. On the premise of improving equipment reuse by means of a distributed database storage engine, it introduces a node partitioning method for ultra-large-scale dynamic e-commerce networks formed by a multi-dimensional multi-scale cube. The method makes full use of the multi-dimensional information of nodes so that homogeneous nodes are assigned to the same distributed server partition as far as possible; it computes a node-degree similarity score so that high-degree nodes are assigned to the same distributed server partition as far as possible; and it computes the load of each distributed server partition so that the partitions stay balanced. Together these improve the partition quality of each partition of the distributed server and the computational efficiency of the ultra-large-scale e-commerce network in subsequent graph computation tasks.
The technical solution adopted by the invention is as follows. In a first aspect, the invention provides a partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, comprising the following steps:
(1) Acquire the e-commerce network graph data G(V, E) at the initial moment, where V = (v_1, v_2, …, v_n) is the set of entities in the network, v_n denotes the n-th entity, E = (e_1, e_2, …, e_m) is the set of connection states between entities, and e_m denotes the m-th state. The i-th entity to be partitioned is the i-th node in the graph,
v_i = {id, enum_1, …, enum_{l_1}, scalar_1, …, scalar_{l_2}},
where id is the unique code of the entity; the enum features are the enumerated features of the entity, including nationality, gender, and occupation, l_1 in number; and the scalar features are the scalar features of the entity, including age and income, l_2 in number;
(2) Acquire the maximum load C = (c_1, c_2, …, c_k) of each partition of the distributed server, where k is the number of partitions of the distributed server;
(3) Initialize the load P = (p_1, p_2, …, p_k) of each partition of the distributed server to 0 and the central node set S = (s_1, s_2, …, s_k) of each partition to empty;
(4) Obtain the adjacency matrix A, which represents the relationships between nodes, from the e-commerce network graph data G(V, E) at the initial moment;
(5) Based on the adjacency matrix A, compute the modularity of the l_1 enumerated features in the e-commerce network,
Q = (q_1, q_2, …, q_{l_1}),
and record the enumerated feature with the largest modularity as ENUM;
(6) Based on the adjacency matrix A, compute the assortativity coefficients of the l_2 scalar features in the e-commerce network,
R = (r_1, r_2, …, r_{l_2}),
and record the scalar feature with the largest assortativity coefficient as SCALAR;
(7) For a node v_i in the e-commerce network, perform the following operations:
(7.1) Arbitrarily select a distributed server partition p_j. If |p_j| < c_j, compute the partition quality score of node v_i at that partition; otherwise, select another partition. Repeat step (7.1) until the partition quality scores of node v_i at all not-fully-loaded distributed server partitions have been obtained;
(7.2) If the partition quality scores over all not-fully-loaded distributed server partitions have a unique maximum, assign node v_i to the highest-scoring target distributed server partition p_des; otherwise, randomly select one of the highest-scoring partitions as the target distributed server partition p_des of node v_i;
(7.3) If the number of nodes loaded in the target distributed server partition p_des is less than 3, go to step (8); otherwise, update the central node set s_des of p_des;
(8) Repeat step (7) until all nodes in the e-commerce network have been partitioned;
(9) For any newly added edge stream data, execute step (7) on the nodes in the new data, thereby maintaining and updating the e-commerce network.
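Steps (7) and (8) amount to a greedy streaming assignment. The following Python sketch illustrates that loop only; the function name `assign_nodes`, the signature of the `score` callback, and the toy score in the example are hypothetical, and the central node update of step (7.3) is deliberately left out.

```python
import random
from typing import Callable, Hashable, Iterable, List, Sequence

def assign_nodes(
    nodes: Iterable[Hashable],
    capacities: Sequence[int],
    score: Callable[[Hashable, int, List[List[Hashable]]], float],
    rng: random.Random,
) -> List[List[Hashable]]:
    """Place each node in the highest-scoring partition that still has room."""
    partitions: List[List[Hashable]] = [[] for _ in capacities]
    for v in nodes:
        # Step (7.1): only partitions with |p_j| < c_j are scored.
        open_ids = [j for j, c in enumerate(capacities) if len(partitions[j]) < c]
        if not open_ids:
            raise ValueError("every partition is at maximum load")
        scores = {j: score(v, j, partitions) for j in open_ids}
        best = max(scores.values())
        # Step (7.2): ties between top-scoring partitions are broken at random.
        winners = [j for j in open_ids if scores[j] == best]
        partitions[rng.choice(winners)].append(v)
    return partitions

# Toy score (an assumption for illustration): prefer the emptiest partition.
toy_score = lambda v, j, parts: -len(parts[j])
result = assign_nodes(range(6), [2, 2, 2], toy_score, random.Random(0))
```

With this toy score the six nodes spread evenly over the three partitions; a real score would combine the similarity and load terms of step (7.1).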
Further, the e-commerce network is a hundred-million-scale ultra-large-scale e-commerce network formed by a multi-dimensional multi-scale cube.
Further, in step (5), the modularity of any enumerated feature is calculated as follows:
$$q_h = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - \frac{d_i d_j}{2m}\right)\delta(\mu_i, \mu_j)$$
where q_h measures the degree of connection between same-type nodes of the ultra-large-scale e-commerce network on the h-th enumerated feature; m is the number of edges in E; A_ij is the entry of the adjacency matrix A in row i and column j; d_i is the degree of node v_i in the adjacency matrix A; μ_i is the type of node v_i on the h-th enumerated feature, represented by an integer from 1 to t_1, where t_1 is the total number of types of the h-th enumerated feature; and δ(μ_i, μ_j) is the Kronecker function, whose value is
$$\delta(\mu_i, \mu_j) = \begin{cases} 1, & \mu_i = \mu_j \\ 0, & \mu_i \neq \mu_j \end{cases}$$
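A minimal NumPy sketch of step (5), assuming the standard Newman modularity with the value of one enumerated feature used as the community label. The function names `feature_modularity` and `pick_enum_feature` and the four-node toy graph are hypothetical, and the graph is assumed to be undirected with at least one edge.

```python
import numpy as np

def feature_modularity(A, labels) -> float:
    """Modularity of the grouping induced by one enumerated feature."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    d = A.sum(axis=1)                           # node degrees d_i
    two_m = d.sum()                             # 2m for an undirected graph
    same = labels[:, None] == labels[None, :]   # Kronecker delta on types
    B = A - np.outer(d, d) / two_m              # A_ij - d_i d_j / 2m
    return float((B * same).sum() / two_m)

def pick_enum_feature(A, features: dict) -> str:
    """Name of the enumerated feature with the largest modularity (ENUM)."""
    return max(features, key=lambda name: feature_modularity(A, features[name]))

# Toy graph: two disjoint edges, 0-1 and 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
features = {"occupation": [0, 0, 1, 1], "gender": [0, 1, 0, 1]}
best_enum = pick_enum_feature(A, features)
```

In the toy graph, "occupation" matches the edge structure and "gender" cuts across it, so "occupation" is selected. The dense matrix form is for readability only; a graph of the scale described here would need a sparse or streaming computation.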
Further, in step (6), the assortativity coefficient of any scalar feature is calculated as follows:
Figure BDA0003928272910000033
where r_b measures the degree of connection between similar nodes of the ultra-large-scale e-commerce network on the b-th scalar feature; d_i is the degree of node v_i in the adjacency matrix A; x_i is the value of node v_i on the b-th scalar feature; and δ(x_i, x_j) is the Kronecker function, whose value is
$$\delta(x_i, x_j) = \begin{cases} 1, & x_i = x_j \\ 0, & x_i \neq x_j \end{cases}$$
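A minimal NumPy sketch of step (6), assuming Newman's assortativity coefficient for scalar node attributes; whether it matches the formula of this method term by term is an assumption. The names `scalar_assortativity` and `pick_scalar_feature` and the toy data are hypothetical, and the feature is assumed not to be constant across nodes (otherwise the denominator is zero).

```python
import numpy as np

def scalar_assortativity(A, x) -> float:
    """Assortativity of an undirected graph with respect to scalar values x."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    d = A.sum(axis=1)                      # node degrees d_i
    expected = np.outer(d, d) / d.sum()    # d_i d_j / 2m
    num = x @ (A - expected) @ x
    den = x @ (np.diag(d) - expected) @ x
    return float(num / den)

def pick_scalar_feature(A, features: dict) -> str:
    """Name of the scalar feature with the largest coefficient (SCALAR)."""
    return max(features, key=lambda name: scalar_assortativity(A, features[name]))

# Toy graph: two disjoint edges, 0-1 and 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
scalars = {"age": [20, 20, 60, 60], "income": [1, 5, 1, 5]}
best_scalar = pick_scalar_feature(A, scalars)
```

Here connected nodes share the same age but have opposite incomes, so "age" is the more assortative feature.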
Further, in step (7.1), the partition quality score is calculated as follows:
$$\mathrm{Score}(v_i, p) = \max\{\mathrm{sim}_{ENUM}(v_i, v_j) + \mathrm{sim}_{SCALAR}(v_i, v_j) - \mathrm{sim}_{DEGREE}(v_i, v_j)\} + \alpha\gamma|p|^{\gamma-1}$$
where p denotes a partition of the distributed server, and sim_ENUM(v_i, v_j) and sim_SCALAR(v_i, v_j) denote the similarity of two nodes on the enumerated feature ENUM and the scalar feature SCALAR, respectively, calculated as follows:
Figure BDA0003928272910000035
where F_i denotes the vector representation of node v_i on feature F, and F stands for the enumerated feature ENUM or the scalar feature SCALAR;
In the partition quality score, sim_DEGREE(v_i, v_j) denotes the similarity of two nodes in terms of node degree, calculated as follows:
Figure BDA0003928272910000036
where d denotes the number of neighbor nodes randomly selected for nodes v_i and v_j, A_id denotes the entry of the adjacency matrix A in row i and column d, and
Figure BDA0003928272910000037
represents the mean of the elements in the i-th row of the adjacency matrix:
Figure BDA0003928272910000038
In the partition quality score,
Figure BDA0003928272910000039
where γ is the coefficient that adjusts the influence of partition load on the partition quality score.
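The way the terms of the score combine can be sketched in Python as follows. The three similarity functions are passed in as callables, and the toy similarities, the `(ENUM value, SCALAR value, degree)` node layout, and the values of α and γ are all hypothetical stand-ins chosen for illustration. The sketch also assumes γ ≥ 1 so that the load term is defined for an empty partition.

```python
from typing import Callable, Sequence, Tuple

Node = Tuple[str, float, int]   # hypothetical layout: (ENUM value, SCALAR value, degree)
Sim = Callable[[Node, Node], float]

def load_term(load: int, alpha: float, gamma: float) -> float:
    """alpha * gamma * |p|^(gamma - 1), the load-dependent part of the score."""
    return alpha * gamma * load ** (gamma - 1)

def partition_score(v: Node, refs: Sequence[Node], load: int,
                    sim_enum: Sim, sim_scalar: Sim, sim_degree: Sim,
                    alpha: float, gamma: float) -> float:
    """max over reference nodes of (sim_enum + sim_scalar - sim_degree) + load term."""
    pair_terms = [sim_enum(v, u) + sim_scalar(v, u) - sim_degree(v, u) for u in refs]
    best = max(pair_terms) if pair_terms else 0.0   # no reference nodes: load term only
    return best + load_term(load, alpha, gamma)

# Toy similarities (assumptions, not the similarity measures of the method).
same_enum = lambda a, b: 1.0 if a[0] == b[0] else 0.0
close_scalar = lambda a, b: 1.0 / (1.0 + abs(a[1] - b[1]))
close_degree = lambda a, b: 1.0 / (1.0 + abs(a[2] - b[2]))

v = ("teacher", 30.0, 2)
refs = [("teacher", 30.0, 2), ("driver", 10.0, 9)]
score = partition_score(v, refs, 4, same_enum, close_scalar, close_degree,
                        alpha=0.5, gamma=1.5)
```

For the first reference node the pairwise term is 1 + 1 − 1 = 1, which is the maximum; the load term for a partition of 4 nodes is 0.5 × 1.5 × 4^0.5 = 1.5, giving a score of 2.5.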
Further, in step (7.1): if the load of the current distributed server partition is 0, the partition quality score of node v_i at that partition contains only the load cost score αγ|p|^{γ-1};
if the number of nodes loaded in the current distributed server partition is greater than 0 and less than 3, the partition quality score of node v_i at that partition is the mean of its partition quality scores with respect to all nodes already assigned to the partition;
if the number of nodes loaded in the current distributed server partition is greater than or equal to 3, the partition quality score of node v_i at that partition is the mean of its partition quality scores with respect to all central nodes of the partition.
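One reading of these three cases is sketched below in Python. The function name `score_by_load`, the `pair_score` callback (standing for the similarity part of the score between two nodes), and the parity-based toy similarity are hypothetical; γ ≥ 1 is assumed so that an empty partition has a defined load cost.

```python
from typing import Callable, Hashable, Sequence

def score_by_load(v: Hashable,
                  members: Sequence[Hashable],
                  centers: Sequence[Hashable],
                  pair_score: Callable[[Hashable, Hashable], float],
                  alpha: float, gamma: float) -> float:
    """Score of node v at one partition, choosing reference nodes by partition load."""
    load_cost = alpha * gamma * len(members) ** (gamma - 1)
    if len(members) == 0:
        return load_cost                      # empty partition: load cost only
    # Fewer than 3 members: compare against all of them; otherwise only the centers.
    refs = members if len(members) < 3 else centers
    if not refs:
        return load_cost                      # no central node yet (an edge case assumed here)
    return sum(pair_score(v, u) for u in refs) / len(refs) + load_cost

# Toy pairwise similarity (an assumption): 1 if both ids have the same parity.
parity = lambda a, b: 1.0 if a % 2 == b % 2 else 0.0

empty_case = score_by_load(4, [], [], parity, 0.5, 1.5)
small_case = score_by_load(4, [2, 3], [], parity, 0.5, 1.5)
large_case = score_by_load(4, [1, 2, 3, 5], [2], parity, 0.5, 1.5)
```

Because the load cost is the same for every reference node of a partition, averaging the full scores and averaging only the similarity parts give the same result.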
Further, in step (7.3), the central node set s_des of partition p_des is updated with the DBSCAN algorithm, as follows:
a) Set the value of MinPts in the algorithm to
Figure BDA0003928272910000041
and the radius to 1;
b) For any node in the partition, if the number of its neighbor nodes is greater than or equal to
Figure BDA0003928272910000042
add the node to the central node set of the current partition;
c) Repeat step b) until all nodes in the current partition have been traversed.
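Steps a) to c) correspond to the core-point test of DBSCAN with a radius of one hop. The Python sketch below keeps MinPts as a parameter `min_pts`; counting only neighbors that lie inside the same partition is an assumption, and the function name `central_nodes` and the adjacency-dictionary layout are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

def central_nodes(adj: Dict[Hashable, Set[Hashable]],
                  members: Iterable[Hashable],
                  min_pts: int) -> Set[Hashable]:
    """Nodes of a partition with at least min_pts direct neighbors in that partition."""
    member_set = set(members)
    return {
        v for v in member_set
        # Radius 1: only direct neighbors count, restricted to the partition.
        if len(adj.get(v, set()) & member_set) >= min_pts
    }

# Toy partition {1, 2, 3, 4}; node 5 belongs to another partition.
adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5}, 5: {4}}
centers = central_nodes(adj, [1, 2, 3, 4], min_pts=2)
```

Only node 1 has two or more neighbors inside the partition, so it is the single central node; node 4 does not qualify because its second neighbor lies outside.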
In a second aspect, the invention further provides a partition optimization device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation. The device comprises a memory and one or more processors; executable code is stored in the memory, and when the processors execute the code, the steps of the above partition optimization method are implemented.
In a third aspect, the invention further provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the steps of the above partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation.
The beneficial effects of the invention are as follows:
Using a distributed graph database storage engine for the ultra-large-scale dynamic e-commerce network saves server resources, reduces latency, and improves computing throughput. A new quality scoring function, designed around the rich node information of the ultra-large-scale dynamic e-commerce network, safeguards the partition quality of each partition of the distributed server and improves the computational efficiency of the distributed database storage engine. Compared with traditional single-machine database storage, the method based on a distributed database storage engine resolves the memory bottleneck of ultra-large-scale e-commerce networks and improves computing throughput and database security. Compared with traditional node-based graph partitioning, the method makes full use of the node information of the ultra-large-scale dynamic e-commerce network, improving partitioning accuracy and reducing computational complexity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of the partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation provided by the present invention.
Fig. 2 is a structural diagram of the partition optimization device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation provided by the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings and examples, which are provided for illustration of the present invention and are not intended to limit the scope of the present invention.
As shown in fig. 1, the specific implementation process of the partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation according to the embodiment of the present invention is as follows:
(1) Obtain the data of the ultra-large-scale e-commerce network and the maximum load of each partition of the distributed server, as follows:
Acquire the e-commerce network graph data G(V, E) at the initial moment, where V = (v_1, v_2, …, v_{10000}) is the set of entities in the network and E = (e_1, e_2, …, e_{500000}) is the set of connection states between entities. Each entity to be partitioned is represented as a node v_i = {id, enum_1, enum_2, enum_3, scalar_1, scalar_2}, where id is the unique code of the entity; enum_1, enum_2, and enum_3 are the enumerated features of the entity (nationality, gender, and occupation, respectively); and scalar_1 and scalar_2 are the scalar features of the entity (age and income, respectively);
Acquire the maximum load C = (c_1, c_2, …, c_8) of each partition of the distributed server; that is, the distributed server has 8 partitions;
(2) Data initialization:
Initialize the load P = (p_1, p_2, …, p_8) of each partition of the distributed server to 0 and the central node set S = (s_1, s_2, …, s_8) of each partition to empty;
(3) In the pre-partitioning stage, extract the homogeneous enumerated feature and the homogeneous scalar feature of the network from the e-commerce network graph data at the initial moment, as follows:
(3.1) Obtain the adjacency matrix A of order 10000, which represents the relationships between nodes, from the e-commerce network graph data G(V, E) at the initial moment;
Figure BDA0003928272910000051
(3.2) Compute the modularity Q = (q_1, q_2, q_3) of the 3 enumerated features in the e-commerce network; the modularity of any enumerated feature is calculated as follows:
$$q_h = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - \frac{d_i d_j}{2m}\right)\delta(\mu_i, \mu_j)$$
where q_h measures the degree of connection between same-type nodes of the ultra-large-scale e-commerce network on the h-th enumerated feature; m is the number of edges in E; A_ij is the entry of the adjacency matrix A in row i and column j; d_i is the degree of node v_i in the adjacency matrix A; μ_i is the type of node v_i on the h-th enumerated feature, represented by an integer from 1 to t_1, where t_1 is the total number of types of the h-th enumerated feature; and δ(μ_i, μ_j) is the Kronecker function, whose value is
$$\delta(\mu_i, \mu_j) = \begin{cases} 1, & \mu_i = \mu_j \\ 0, & \mu_i \neq \mu_j \end{cases}$$
The calculation shows that the enumerated feature with the largest modularity is "occupation";
(3.3) Compute the assortativity coefficients R = (r_1, r_2) of the 2 scalar features in the e-commerce network; the assortativity coefficient of any scalar feature is calculated as follows:
Figure BDA0003928272910000062
where r_b measures the degree of connection between similar nodes of the ultra-large-scale e-commerce network on the b-th scalar feature; d_i is the degree of node v_i in the adjacency matrix A; x_i is the value of node v_i on the b-th scalar feature; and δ(x_i, x_j) is the Kronecker function, whose value is
$$\delta(x_i, x_j) = \begin{cases} 1, & x_i = x_j \\ 0, & x_i \neq x_j \end{cases}$$
The calculation shows that the scalar feature with the largest assortativity coefficient is "age";
(4) Perform the following operations on any node in the e-commerce network:
(4.1) For each not-fully-loaded distributed server partition, compute the partition quality score of node v_i at that partition as follows:
$$\mathrm{Score}(v_i, p) = \max\{\mathrm{sim}_{ENUM}(v_i, v_j) + \mathrm{sim}_{SCALAR}(v_i, v_j) - \mathrm{sim}_{DEGREE}(v_i, v_j)\} + \alpha\gamma|p|^{\gamma-1}$$
where p denotes a partition of the distributed server, and sim_ENUM(v_i, v_j) and sim_SCALAR(v_i, v_j) denote the similarity of two nodes on the enumerated feature "occupation" and the scalar feature "age", respectively, calculated as follows:
Figure BDA0003928272910000064
where F_i denotes the vector representation of node v_i on feature F, and F stands for the enumerated feature ENUM or the scalar feature SCALAR;
In the partition quality score, sim_DEGREE(v_i, v_j) denotes the similarity of two nodes in terms of node degree, calculated as follows:
Figure BDA0003928272910000065
where d denotes the number of neighbor nodes randomly selected for nodes v_i and v_j, A_id denotes the entry of the adjacency matrix A in row i and column d, and
Figure BDA0003928272910000066
represents the mean of the elements in the i-th row of the adjacency matrix:
Figure BDA0003928272910000067
When computing the partition quality score of node v_i at a distributed server partition, the following rules apply:
if the load of the current distributed server partition is 0, the partition quality score of node v_i at that partition contains only the load cost score αγ|p|^{γ-1};
if the number of nodes loaded in the current distributed server partition is greater than 0 and less than 3, the partition quality score of node v_i at that partition is the mean of its partition quality scores with respect to all nodes already assigned to the partition;
if the number of nodes loaded in the current distributed server partition is greater than or equal to 3, the partition quality score of node v_i at that partition is the mean of its partition quality scores with respect to all central nodes of the partition;
(4.2) If the partition quality scores of the node over all not-fully-loaded distributed server partitions have a unique maximum, assign the node to the highest-scoring distributed server partition p_des; otherwise, randomly select one of the highest-scoring partitions as the target distributed server partition p_des of the node;
(4.3) If the number of nodes loaded in the distributed server partition p_des is less than 3, go to step (4.4); otherwise, update the central node set s_des of p_des using the DBSCAN algorithm, as follows:
(a) Set the value of MinPts in the algorithm to
Figure BDA0003928272910000071
and the radius to 1;
(b) For any node in the distributed server partition, if the number of its neighbor nodes is greater than or equal to
Figure BDA0003928272910000072
add the node to the central node set of the current distributed server partition;
(c) Repeat step (b) until all nodes in the current distributed server partition have been traversed;
(4.4) Repeat step (4) until all nodes in the network have been partitioned;
(5) In the dynamic maintenance stage, for the newly added hundred-million-scale edge stream data, execute step (4) on the ten-million-scale nodes in the data, completing the maintenance and update of the hundred-million-scale ultra-large-scale e-commerce network.
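The dynamic maintenance stage can be pictured as folding each batch of streamed edges into the stored graph and collecting the endpoints that have no partition yet, so that step (4) can be run on them. The Python sketch below covers only that bookkeeping; the function name `ingest_edges`, the adjacency-dictionary layout, and the `assigned` set are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

def ingest_edges(edges: Iterable[Tuple[Hashable, Hashable]],
                 adj: Dict[Hashable, Set[Hashable]],
                 assigned: Set[Hashable]) -> List[Hashable]:
    """Add streamed edges to adj and return the not-yet-partitioned endpoints in arrival order."""
    fresh: List[Hashable] = []
    seen: Set[Hashable] = set()
    for u, v in edges:
        # Keep the undirected adjacency structure up to date.
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        for node in (u, v):
            if node not in assigned and node not in seen:
                seen.add(node)
                fresh.append(node)
    return fresh

# Toy state: nodes 1 and 3 are already partitioned and connected.
adj = {1: {3}, 3: {1}}
assigned = {1, 3}
to_place = ingest_edges([(1, 2), (2, 3), (3, 4), (4, 1)], adj, assigned)
```

Each node in `to_place` would then be scored against the not-fully-loaded partitions and assigned as in step (4).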
Entities in an ultra-large-scale e-commerce network carry rich information. Representing them with a multi-dimensional multi-scale cube captures network information effectively and improves the accuracy of subsequent data mining results. Representing entities with this structure also determines a node-centric distributed storage structure. Current node-centric graph partitioning techniques have clear shortcomings on ultra-large-scale e-commerce networks with pronounced long-tail characteristics, because the high-degree nodes of a power-law graph significantly increase the communication cost of partition nodes. The invention therefore designs a new partition quality scoring function with three parts. By computing the similarity scores of two homogeneous features between a newly added node and the central nodes of a distributed server partition, it assigns homogeneous nodes to the same partition as far as possible. By taking the negative of the node-degree similarity score, it places high-degree nodes in the same partition as their neighbors as far as possible. By computing a load cost, it balances partition load. Partition computation efficiency is thereby improved while communication cost is reduced.
In an ultra-large-scale e-commerce network, the network grows exponentially and the initial network is relatively small. Computing the most significant enumerated feature and scalar feature on the initial network therefore greatly reduces the later cost of computing feature matching for nodes. In addition, once the nodes of a distributed server partition reach a certain scale, the central node set of the partition is used in place of all its nodes, which reduces the cost of computing node matching.
Corresponding to the above embodiment of the partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, the invention also provides an embodiment of a partition optimization device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation.
Referring to fig. 2, the partition optimization device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation provided in the embodiment of the present invention includes a memory and one or more processors. The memory stores executable code, and when the processors execute the code, they implement the partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation of the above embodiment.
The embodiment of the partition optimization device for an ultra-large-scale e-commerce network based on graph node feature pre-calculation can be applied to any equipment with data processing capability, such as a computer. The device embodiment may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the device in a logical sense is formed when the processor of the equipment on which it resides reads the corresponding computer program instructions from nonvolatile memory into memory and runs them. In terms of hardware, fig. 2 shows the hardware structure of the equipment with data processing capability on which the partition optimization device resides. In addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 2, that equipment may include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement without inventive effort.
An embodiment of the invention also provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation of the above embodiment.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (9)

1. A partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, characterized by comprising the following steps:
(1) Acquire the e-commerce network graph data G(V, E) at an initial moment, where V = (v_1, v_2, …, v_n) is the set of entities in the e-commerce network, v_n denotes the n-th entity, E = (e_1, e_2, …, e_m) is the set of connection states between entities, and e_m denotes the m-th state. The i-th entity to be partitioned is the i-th node in the e-commerce network graph data,
v_i = {id, enum_1, …, enum_{l_1}, scalar_1, …, scalar_{l_2}},
where id is the unique code of the entity; the enum features are the enumerated features of the entity, including nationality, gender, and occupation, l_1 in number; and the scalar features are the scalar features of the entity, including age and income, l_2 in number;
(2) Acquire the maximum load C = (c_1, c_2, …, c_k) of each partition of the distributed server, where k is the number of partitions of the distributed server;
(3) Initialize the load P = (p_1, p_2, …, p_k) of each partition of the distributed server to 0 and the central node set S = (s_1, s_2, …, s_k) of each partition to empty;
(4) Obtain the adjacency matrix A, which represents the relationships between nodes, from the e-commerce network graph data G(V, E) at the initial moment;
(5) Calculating l in the E-business network based on the adjacency matrix A 1 Modularity of individual enumerated features
Figure FDA0003928272900000012
and recording the enumerated feature with the maximum modularity as ENUM;
(6) Calculating, based on the adjacency matrix A, the assortativity coefficients of the l_2 scalar features in the e-commerce network
Figure FDA0003928272900000013
and recording the scalar feature with the maximum assortativity coefficient as SCALAR;
(7) For a node v_i in the e-commerce network graph data, performing the following operations:
(7.1) arbitrarily selecting a distributed server partition p_j; if |p_j| < c_j, computing the partition quality score of node v_i at that distributed server partition; otherwise, executing step (7.1) again, until the partition quality scores of node v_i at all underloaded distributed server partitions have been obtained;
(7.2) if the partition quality scores of all the underloaded distributed server partitions contain only one maximum, assigning node v_i to the highest-scoring target distributed server partition p_des; otherwise, randomly selecting one of the highest-scoring distributed server partitions as the target distributed server partition p_des of node v_i;
(7.3) if the number of nodes loaded in the target distributed server partition p_des is less than 3, proceeding to step (8); otherwise, updating the central node set s_des of p_des;
(8) Repeating step (7) until all the nodes in the e-commerce network have been partitioned;
(9) For any newly added edge stream data, executing step (7) on the nodes in the newly added data, thereby maintaining and updating the e-commerce network.
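As a reading aid for steps (7) and (8), the following is a minimal Python sketch of the streaming assignment loop, not the claimed implementation. The function name assign_stream, the score callback signature, and the use of plain lists for partitions are all assumptions made for illustration; the central node update of step (7.3) is deliberately left out.

```python
import random
from typing import Callable, Hashable, List, Sequence


def assign_stream(
    nodes: Sequence[Hashable],
    capacities: Sequence[int],
    score: Callable[[Hashable, int, List[List[Hashable]]], float],
    rng: random.Random,
) -> List[List[Hashable]]:
    """Place each node in the underloaded partition with the best score.

    `score(v, j, partitions)` is a hypothetical callback standing in for
    the partition quality score of step (7.1).
    """
    partitions: List[List[Hashable]] = [[] for _ in capacities]
    for v in nodes:
        # Step (7.1): only partitions whose load is below capacity are scored.
        open_ids = [j for j, c in enumerate(capacities) if len(partitions[j]) < c]
        if not open_ids:
            raise ValueError("all partitions are full")
        scores = {j: score(v, j, partitions) for j in open_ids}
        best = max(scores.values())
        # Step (7.2): a unique maximum wins; ties are broken at random.
        winners = [j for j in open_ids if scores[j] == best]
        partitions[rng.choice(winners)].append(v)
        # Step (7.3), the central node update, is omitted in this sketch.
    return partitions
```

Under these assumptions, step (9) amounts to calling the same loop body for each node that arrives in newly added edge stream data, with the partitions carried over rather than re-initialized.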
2. The partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, characterized in that the e-commerce network is an ultra-large-scale e-commerce network at the hundred-million scale formed by a multi-dimensional multi-scale cube.
3. The partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, characterized in that, in step (5), the modularity of any enumerated feature is calculated as follows:
Figure FDA0003928272900000021
wherein q_h represents the degree of connectivity between nodes of the same type on the h-th enumerated feature of the e-commerce network; d_i represents the degree of node v_i in the adjacency matrix A; μ_i represents the type of node v_i on the h-th enumerated feature, the types being denoted by the integer values 1 through t_1, where t_1 is the total number of types of the h-th enumerated feature; and δ(μ_i, μ_j) is a Kronecker function whose value is
Figure FDA0003928272900000022
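The formula of this claim is carried by a figure that is not reproduced in this text. The Python sketch below therefore assumes the standard attribute-based modularity, q = (1 / 2m) * sum over i, j of (A_ij - d_i * d_j / 2m) * δ(μ_i, μ_j), with 2m equal to the sum of all degrees; that choice, and the function name feature_modularity, are assumptions and may differ from the claimed expression.

```python
from typing import Sequence


def feature_modularity(adj: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """Modularity of one enumerated feature on an undirected graph.

    adj    : symmetric 0/1 adjacency matrix A
    labels : labels[i] is the type (1..t_1) of node i on this feature
    Assumes the standard formula; the patent figure is not available here.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # d_i
    two_m = sum(degree)                 # 2m, twice the edge count
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            # Kronecker delta: only pairs with the same type contribute.
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

Step (5) would then evaluate this for each of the l_1 enumerated features and keep the one with the largest value as ENUM. The double loop is quadratic in the number of nodes and is meant only to mirror the formula, not to scale.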
4. The partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, characterized in that, in step (6), the assortativity coefficient of any scalar feature is calculated as follows:
Figure FDA0003928272900000023
wherein r_b represents the degree of connectivity between nodes of the same kind on the b-th scalar feature of the e-commerce network; d_i represents the degree of node v_i in the adjacency matrix A; x_i represents the value of node v_i on the b-th scalar feature; and δ(x_i, x_j) is a Kronecker function whose value is
Figure FDA0003928272900000024
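The expression for r_b is likewise contained in a figure that is not reproduced here. The sketch below reads the coefficient of step (6) as the standard scalar assortativity coefficient, r = [sum_ij A_ij x_i x_j - (sum_i d_i x_i)^2 / 2m] / [sum_i d_i x_i^2 - (sum_i d_i x_i)^2 / 2m]. This reading, the function name scalar_assortativity, and the convention of returning 0 when the denominator vanishes are assumptions.

```python
from typing import Sequence


def scalar_assortativity(adj: Sequence[Sequence[int]], x: Sequence[float]) -> float:
    """Assortativity of one scalar feature on an undirected graph.

    adj : symmetric 0/1 adjacency matrix A
    x   : x[i] is the value of node i on this scalar feature
    Assumes the standard scalar assortativity formula.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # d_i
    two_m = sum(degree)
    if two_m == 0:
        return 0.0
    # Shared correction term (sum_i d_i x_i)^2 / 2m.
    mean_term = sum(d * xi for d, xi in zip(degree, x)) ** 2 / two_m
    edge_term = sum(adj[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    var_term = sum(d * xi * xi for d, xi in zip(degree, x))
    denom = var_term - mean_term
    if denom == 0:
        return 0.0  # every edge endpoint has the same value; r is undefined
    return (edge_term - mean_term) / denom
```

Step (6) would evaluate this for each of the l_2 scalar features and keep the one with the largest value as SCALAR.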
5. The partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, characterized in that, in step (7.1), the partition quality score is calculated as follows:
Score(v_i, p) = max{ sim_ENUM(v_i, v_j) + sim_SCALAR(v_i, v_j) - sim_DEGREE(v_i, v_j) } + αγ|p|^(γ-1)
where p denotes a partition of the distributed server, and sim_ENUM(v_i, v_j) and sim_SCALAR(v_i, v_j) respectively represent the similarity of the two nodes on the enumerated feature ENUM and on the scalar feature SCALAR, calculated as follows:
Figure FDA0003928272900000025
wherein F_i represents the vector representation of node v_i on feature F, and feature F denotes the enumerated feature ENUM or the scalar feature SCALAR;
sim_DEGREE(v_i, v_j) in the partition quality score represents the similarity of the two nodes in terms of node degree, calculated as follows:
Figure FDA0003928272900000026
wherein d represents the number of neighbor nodes randomly selected for nodes v_i and v_j; A_id represents the value of the adjacency matrix A at row i and column d;
Figure FDA0003928272900000027
represents the mean value of the elements in the i-th row of the adjacency matrix
Figure FDA0003928272900000028
In the partition quality score,
Figure FDA0003928272900000031
where γ is the coefficient that adjusts the influence of the partition load on the partition quality score.
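To make the structure of the score concrete, the following Python sketch combines the three similarity terms and the load term αγ|p|^(γ-1). The similarity formulas themselves sit in figures not reproduced here, so cosine similarity is assumed for sim_ENUM and sim_SCALAR, and sim_DEGREE is passed in as a caller-supplied function. The names Node, cosine, load_term and partition_score are hypothetical, and the sketch assumes γ ≥ 1 so that an empty partition does not raise a division error.

```python
import math
from typing import Callable, NamedTuple, Sequence, Tuple


class Node(NamedTuple):
    enum: Tuple[float, ...]    # vector for feature ENUM, e.g. a one-hot code
    scalar: Tuple[float, ...]  # vector for feature SCALAR


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the figure's formula."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def load_term(size: int, alpha: float, gamma: float) -> float:
    """Load cost alpha * gamma * |p|^(gamma - 1); assumes gamma >= 1."""
    return alpha * gamma * size ** (gamma - 1)


def partition_score(
    v: Node,
    refs: Sequence[Node],
    size: int,
    alpha: float,
    gamma: float,
    sim_degree: Callable[[Node, Node], float],
) -> float:
    """Score(v, p): best pairwise term over reference nodes plus load term."""
    best = max(
        cosine(v.enum, u.enum) + cosine(v.scalar, u.scalar) - sim_degree(v, u)
        for u in refs
    )
    return best + load_term(size, alpha, gamma)
```

The load term is added with the sign printed in the claim; whether it acts as a reward or a penalty depends on the definition of α, which is given in a figure not available in this text.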
6. The partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation, characterized in that, in step (7.1):
if the load of the current distributed server partition is 0, the partition quality score of node v_i at the distributed server partition contains only the load cost score αγ|p_i|^(γ-1);
if the number of nodes loaded in the current distributed server partition is greater than 0 and less than 3, the partition quality score of node v_i at the distributed server partition is the mean of the partition quality scores of node v_i with respect to all nodes v_j already partitioned to that distributed server partition;
if the number of nodes loaded in the current distributed server partition is greater than or equal to 3, the partition quality score of node v_i at the distributed server partition is the mean of the partition quality scores of node v_i with respect to all central nodes v_j of that distributed server partition.
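The three load cases of this claim can be sketched as a small dispatch function. This is an illustrative Python sketch only: the name score_at_partition and the pair_score callback are hypothetical, and falling back to all members when a partition of three or more nodes has no central node yet is an assumption that the claim does not address.

```python
from typing import Callable, Hashable, Sequence


def score_at_partition(
    v: Hashable,
    members: Sequence[Hashable],
    centers: Sequence[Hashable],
    pair_score: Callable[[Hashable, Hashable], float],
    load_cost: float,
) -> float:
    """Partition quality score of node v at one partition, by load case."""
    if not members:
        # Load 0: the score consists of the load cost term alone.
        return load_cost
    if len(members) < 3:
        # Load 1 or 2: average over every node already in the partition.
        refs = members
    else:
        # Load >= 3: average over the central nodes only.
        # Assumption: if no central node exists yet, use all members.
        refs = centers if centers else members
    return sum(pair_score(v, u) for u in refs) / len(refs)
```

Restricting the comparison to central nodes once a partition holds three or more nodes keeps the number of pairwise evaluations per incoming node bounded by the size of the central node set rather than by the partition size.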
7. The partition optimization method for an ultra-large-scale e-commerce network based on graph node feature pre-calculation according to claim 1, characterized in that, in step (7.3), the central node set s_des of partition p_des is updated using the DBSCAN algorithm, the specific process being as follows:
(a) Setting the value of MinPts in the DBSCAN algorithm to
Figure FDA0003928272900000032
and the radius to 1;
(b) For any node in the partition, if the number of neighbor nodes of the node is greater than or equal to
Figure FDA0003928272900000033
adding the node to the central node set of the current partition;
(c) Repeating step (b) until all the nodes in the current partition have been traversed.
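Steps (a) to (c) correspond to the core-point test of DBSCAN and can be sketched in Python as follows. The MinPts value is defined in a figure that is not reproduced here, so it is passed in as the parameter min_pts. Reading "radius 1" as direct adjacency in the graph, and counting only neighbors that lie in the same partition, are assumptions; the function name core_nodes is hypothetical.

```python
from typing import Hashable, Mapping, Set


def core_nodes(
    partition: Set[Hashable],
    neighbors: Mapping[Hashable, Set[Hashable]],
    min_pts: int,
) -> Set[Hashable]:
    """Return the nodes of a partition that qualify as central nodes.

    A node qualifies when it has at least `min_pts` direct neighbors
    (radius 1, read as one hop) inside the same partition.
    """
    return {
        v
        for v in partition
        if len(neighbors.get(v, set()) & partition) >= min_pts
    }
```

For example, with partition = {1, 2, 3, 4}, neighbors = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5}} and min_pts = 2, only node 1 has two or more neighbors inside the partition, so the central node set is {1}.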
8. An ultra-large-scale e-commerce network partition optimization device based on graph node feature pre-calculation, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the steps of the ultra-large-scale e-commerce network partition optimization method based on graph node feature pre-calculation according to any one of claims 1 to 7.
9. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, carries out the steps of the ultra-large-scale e-commerce network partition optimization method based on graph node feature pre-calculation according to any one of claims 1 to 7.
CN202211381008.8A 2022-11-05 2022-11-05 Ultra-large-scale e-commerce network partition optimization method and device based on graph node feature pre-calculation Pending CN115858145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211381008.8A CN115858145A (en) 2022-11-05 2022-11-05 Ultra-large-scale e-commerce network partition optimization method and device based on graph node feature pre-calculation

Publications (1)

Publication Number Publication Date
CN115858145A true CN115858145A (en) 2023-03-28

Family

ID=85662590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211381008.8A Pending CN115858145A (en) 2022-11-05 2022-11-05 Ultra-large-scale e-commerce network partition optimization method and device based on graph node feature pre-calculation

Country Status (1)

Country Link
CN (1) CN115858145A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination