CN110032605B - Method and system for acquiring connection relationship characteristics among users in a social network

Publication number
CN110032605B
Authority
CN
China
Prior art keywords
reservoir
edge
basic
users
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910233454.6A
Other languages
Chinese (zh)
Other versions
CN110032605A (en)
Inventor
王芳
冯丹
张玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201910233454.6A
Publication of CN110032605A
Application granted
Publication of CN110032605B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a system for acquiring connection relationship characteristics among users in a social network, belonging to the field of big data processing. The method comprises the following steps: establishing a streaming graph according to the relationships among users in the social network to be processed; establishing two data structures for storing edges, called the basic reservoir and the increasing reservoir respectively, where the number of edges the basic reservoir can store is fixed at c and the number of edges the increasing reservoir can store grows dynamically; traversing the streaming graph so that the 1st through c-th edges enter the basic reservoir with probability 1, and the (c+1)-th through N-th edges enter the basic reservoir and the increasing reservoir with probabilities $p_i^{b}$ and $p_i^{g}$ respectively, where an edge in the basic reservoir is replaced with equal probability when a replacement occurs and edges in the increasing reservoir are never replaced; and traversing the union of the edge sets in the basic reservoir and the increasing reservoir to obtain the degree of each vertex in the sample edge set, thereby obtaining the connection relationship characteristics among the users in the social network. The invention can obtain the connection relationship characteristics among the entities in a relationship network.

Description

Method and system for acquiring connection relation characteristics among users in social network
Technical Field
The invention belongs to the field of big data processing, and particularly relates to a method and a system for acquiring connection relation characteristics between entities in a relation network.
Background
With the advent of the big data era, complex relationships exist among people, among objects, and between people and objects, forming a variety of complex relationship networks. For example, the structure diagram formed by the relationships between the molecules that make up a substance, the social network formed by the relationships between users, and the computer network formed by the communication relationships between computers are all relationship networks.
The graph is an important data structure that conveniently expresses the connection relationships between entities (people, things, and so on) in a relationship network. As the amount of data in an application grows, practical applications generally use a streaming graph to store and process the data. The edges of the streaming graph are stored in the computer, and each edge consists of two connected entities (vertices) in the corresponding relationship network. The edges of a streaming graph are not isolated, however; edges are connected to one another. For example, in a social network graph, user a and user b are friends, so (a, b) is an edge in the streaming graph; user a and user c are also friends, so (a, c) is another edge; edges (a, b) and (a, c) share the vertex a, that is, the two edges are connected. By establishing a streaming graph from the relationship network and mining the connection relationship characteristics of its edges, the connection relationship characteristics between the entities in the relationship network can be obtained, providing more useful information to related applications. For example, in a social network, the connection relationship characteristics of the edges in the streaming graph can be used to decide whether the social network is suitable for promoting a certain product, or to set the corresponding advertising fee. Specifically, if the edges in the streaming graph are tightly connected, the users of the social network are tightly connected, so an advertisement published in that social network reaches a larger audience and brings better advertising revenue. Conversely, if the connections between the users of a social network are sparse, advertisements placed on it are unlikely to achieve the expected benefit.
Because relationship networks are structurally complex, a streaming graph established from one contains a very large number of edges, and processing the whole streaming graph strains computing and storage resources. To address this, after obtaining the streaming graph of the relationship network, existing methods typically use reservoir-based single-pass sampling to obtain the characteristics of the streaming graph: each edge is processed only once, and the sampled edge set is stored in a data structure called a reservoir. Because this method ignores how edges are connected, the sampled edges are often isolated, so a sample edge set that preserves connection relationships cannot be obtained. In addition, because the reservoir has a fixed capacity, sampled edges that are connected may later be replaced by edges that are not. Consequently, existing streaming-graph methods that rely on reservoir-based single-pass sampling cannot acquire the connection relationship characteristics between entities in the relationship network.
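For reference, the reservoir-based single-pass sampling described above can be sketched as follows. This is a minimal sketch of the textbook reservoir sampling algorithm (often called Algorithm R), assumed here to be representative of the existing methods; the function name `reservoir_sample` and the representation of an edge as a `(u, v)` tuple are illustrative choices, not taken from the source.

```python
import random


def reservoir_sample(edges, c, rng=None):
    """Single-pass uniform sample of at most c edges from a stream (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, edge in enumerate(edges, start=1):
        if i <= c:
            # The first c edges always enter the reservoir.
            reservoir.append(edge)
        elif rng.random() < c / i:
            # A later edge enters with probability c/i and evicts
            # one stored edge chosen uniformly at random.
            reservoir[rng.randrange(c)] = edge
    return reservoir
```

Note that the eviction step never looks at which vertices the stored edges share, which is why a connected pair of sampled edges can be broken up by a later, unrelated edge.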
Disclosure of Invention
Aiming at the defects and the improvement requirements of the prior art, the invention provides a method and a system for acquiring the connection relationship characteristics between entities in a relationship network, and aims to acquire the connection relationship characteristics between the entities in the relationship network.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a method for obtaining connection relationship characteristics between entities in a relationship network, including:
establishing a streaming graph according to the relationships between entities in the relationship network to be processed;
establishing two data structures for storing the edges of the streaming graph, called the basic reservoir and the increasing reservoir respectively; the number of edges the basic reservoir can store is fixed at c, and the number of edges the increasing reservoir can store grows dynamically;
traversing the streaming graph so that the 1st through c-th edges enter the basic reservoir with probability 1, and the (c+1)-th through N-th edges enter the basic reservoir and the increasing reservoir with probabilities $p_i^{b}$ and $p_i^{g}$ respectively; once the basic reservoir has reached its maximum capacity, an edge already in it is replaced by a newly entering edge with equal probability, and an edge that has entered the increasing reservoir is never replaced;
taking the union of the edge sets in the basic reservoir and the increasing reservoir as the sample edge set, and traversing the sample edge set to obtain the degree of each vertex in it, thereby obtaining the connection relationship characteristics between the entity corresponding to each vertex and the other entities in the relationship network;
wherein N denotes the total number of edges in the streaming graph, and N > c; i denotes the edge sequence number in the streaming graph, i ∈ {c+1, c+2, ..., N}; $p_i^{b}$ and $p_i^{g}$ denote the probabilities that the i-th edge enters the basic reservoir and the increasing reservoir, respectively; [formula image missing], and the larger the edge sequence number, the smaller the probability $p_i^{b}$; if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then [formula image missing]; otherwise, [formula image missing].
According to the method for acquiring the connection relationship characteristics between entities in a relationship network, after the streaming graph is established from the relationship network, its edges are stored in two reservoir structures to complete the sampling of the streaming graph: the basic reservoir stores the edge set obtained by reservoir sampling, and the increasing reservoir stores edges that are connected to edges in the basic reservoir.
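The two-reservoir sampling pass summarized above can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: the source gives the entry probabilities only as formula images, so the basic-reservoir probability is assumed to be the standard `c / i`, and the increasing-reservoir probability is left as a caller-supplied function `p_grow(i, t)` that defaults to 1. "Forms a triangle" is interpreted as the new edge closing a triangle with two edges currently in the basic reservoir. All names are hypothetical.

```python
import random
from collections import defaultdict


def closes_triangle(edge, sample_edges):
    """True if `edge` plus two edges of `sample_edges` form a triangle."""
    u, v = edge
    nbrs = defaultdict(set)
    for a, b in sample_edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    # A common neighbour w of u and v gives the triangle (u, v, w).
    return bool((nbrs[u] & nbrs[v]) - {u, v})


def dual_reservoir_sample(edges, c, p_grow=lambda i, t: 1.0, rng=None):
    """One pass over the edge stream; returns (basic, increasing) edge lists."""
    rng = rng or random.Random()
    basic, increasing = [], []
    t = 0  # qualifying edges seen so far (missed the basic reservoir, close a triangle)
    for i, edge in enumerate(edges, start=1):
        if i <= c:
            basic.append(edge)                 # probability 1 for the first c edges
        elif rng.random() < c / i:             # ASSUMPTION: standard reservoir probability
            basic[rng.randrange(c)] = edge     # equal-probability replacement
        elif closes_triangle(edge, basic):
            t += 1
            if rng.random() < p_grow(i, t):    # ASSUMPTION: caller-supplied probability
                increasing.append(edge)        # never replaced afterwards
    return basic, increasing
```

Rebuilding the adjacency sets for every edge keeps the sketch short; a practical implementation would presumably maintain them incrementally as the basic reservoir changes.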
Further, the method for acquiring the connection relationship characteristics between entities in a relationship network provided by the present invention further includes:
estimating the total number of triangles in the streaming graph according to the edge set in the basic reservoir, the total number of triangles being used to reflect the connection relationship characteristics between the entities in the relationship network in quantitative terms.
Further, [formula image missing];
further, if the i-th edge enters the basic reservoir, it replaces one edge that has already entered the basic reservoir;
wherein the probability that each edge in the basic reservoir is replaced is [formula image missing].
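A short worked derivation may clarify why equal-probability replacement matters. The source gives the entry and replacement probabilities only as formula images, so the values below are an assumption: the standard reservoir-sampling choice in which the k-th edge (k > c) enters the basic reservoir with probability c/k and evicts one of the c stored edges uniformly at random. Here $e_j$ denotes the j-th edge and $B_i$ the content of the basic reservoir after the i-th edge has been processed; both symbols are introduced for illustration.

```latex
\begin{aligned}
\Pr[e_j \text{ survives step } k] &= 1 - \frac{c}{k}\cdot\frac{1}{c} = \frac{k-1}{k},
  && k > \max(j, c),\\
\Pr[e_j \in B_i] &= \frac{c}{j}\prod_{k=j+1}^{i}\frac{k-1}{k}
  = \frac{c}{j}\cdot\frac{j}{i} = \frac{c}{i},
  && c < j \le i,\\
\Pr[e_j \in B_i] &= \prod_{k=c+1}^{i}\frac{k-1}{k} = \frac{c}{i},
  && j \le c.
\end{aligned}
```

Under this assumption every one of the first i edges is in the basic reservoir with the same probability c/i, which is the uniformity property that reservoir sampling is designed to provide.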
Further, if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then the more of the first i edges of the streaming graph that did not enter the basic reservoir and could form a triangle with edges in the basic reservoir, the smaller the probability $p_i^{g}$ that the i-th edge enters the increasing reservoir;
different relationship networks differ in complexity, so their streaming graphs have different characteristics. If the streaming graph of a relationship network is dense, a large number of edges satisfy the condition for entering the increasing reservoir; conversely, if it is sparse, only a few edges satisfy that condition. By controlling the entry probability in this way, edges in a dense streaming graph enter the increasing reservoir with a lower probability and edges in a sparse streaming graph enter it with a higher probability. This avoids consuming large amounts of computing and storage resources because the increasing reservoir holds too many edges, and it also avoids failing to capture the connection relationships of the edges accurately because the increasing reservoir holds too few.
As a further preference, if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, the probability that the i-th edge enters the increasing reservoir is [formula image missing], where $t_i$ denotes the number of edges among the first i edges of the streaming graph that did not enter the basic reservoir and can form a triangle with edges in the basic reservoir.
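To illustrate how a probability that shrinks with the count of qualifying edges limits the growth of the increasing reservoir, here is a minimal sketch. The preferred formula is not recoverable from the source, so the decay `min(1, scale / t)` below is purely hypothetical, as are the names `grow_probability`, `expected_growth`, and the `scale` parameter; the only property taken from the text is that the probability does not increase as the count grows.

```python
def grow_probability(t, scale=1.0):
    """HYPOTHETICAL entry probability for the t-th qualifying edge (t >= 1).

    Non-increasing in t, as the text requires; the actual formula in the
    source is not recoverable.
    """
    if t < 1:
        raise ValueError("t counts qualifying edges and starts at 1")
    return min(1.0, scale / t)


def expected_growth(total_qualifying, scale=1.0):
    """Expected number of edges kept after `total_qualifying` qualifying edges."""
    return sum(grow_probability(t, scale) for t in range(1, total_qualifying + 1))
```

With this particular decay the expected size grows roughly logarithmically in the number of qualifying edges, so a dense graph with many qualifying edges adds only modestly more edges than a sparse one. Other non-increasing functions would give different trade-offs.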
According to a second aspect of the present invention, there is provided a system for acquiring connection relationship characteristics between entities in a relationship network, comprising: a streaming graph establishing module, a reservoir establishing module, a sampling module, and a characteristic acquiring module;
the streaming graph establishing module is used for establishing a streaming graph according to the relationships between the entities in the relationship network to be processed;
the reservoir establishing module is used for establishing two data structures for storing the edges of the streaming graph, called the basic reservoir and the increasing reservoir respectively; the number of edges the basic reservoir can store is fixed at c, and the number of edges the increasing reservoir can store grows dynamically;
the sampling module is used for traversing the streaming graph so that the 1st through c-th edges enter the basic reservoir with probability 1, and the (c+1)-th through N-th edges enter the basic reservoir and the increasing reservoir with probabilities $p_i^{b}$ and $p_i^{g}$ respectively; once the basic reservoir has reached its maximum capacity, an edge already in it is replaced by a newly entering edge with equal probability, and an edge that has entered the increasing reservoir is never replaced;
the characteristic acquiring module is used for taking the union of the edge sets in the basic reservoir and the increasing reservoir as the sample edge set, and traversing the sample edge set to obtain the degree of each vertex in it, thereby obtaining the connection relationship characteristics between the entity corresponding to each vertex and the other entities in the relationship network;
wherein N denotes the total number of edges in the streaming graph, and N > c; i denotes the edge sequence number in the streaming graph, i ∈ {c+1, c+2, ..., N}; $p_i^{b}$ and $p_i^{g}$ denote the probabilities that the i-th edge enters the basic reservoir and the increasing reservoir, respectively; [formula image missing], and the larger the edge sequence number, the smaller the probability $p_i^{b}$; if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then [formula image missing]; otherwise, [formula image missing].
In general, the technical solution conceived by the present invention achieves the following beneficial effects:
(1) According to the method for acquiring the connection relationship characteristics between entities in a relationship network, after the streaming graph is established from the relationship network, its edges are stored in two reservoir structures to complete the sampling of the streaming graph: the basic reservoir stores the edge set obtained by reservoir sampling, and the increasing reservoir stores edges that are connected to edges in the basic reservoir.
(2) The method uses the basic reservoir to store the edge set obtained by reservoir sampling, and on that basis the total number of triangles in the streaming graph can be estimated, so that the connection relationship characteristics between the entities in the corresponding relationship network can be obtained in quantitative terms.
(3) The method controls the probability that an edge of the streaming graph enters the increasing reservoir according to the characteristics of the streaming graph, so that edges in a dense streaming graph enter the increasing reservoir with a lower probability and edges in a sparse streaming graph enter it with a higher probability. This avoids consuming large amounts of computing and storage resources because the increasing reservoir holds too many edges, and it also avoids failing to capture the connection relationships between the edges accurately because the increasing reservoir holds too few.
Drawings
Fig. 1 is a schematic diagram illustrating a method for obtaining connection relationship characteristics between entities in a relationship network according to an embodiment of the present invention;
Fig. 2(a) is a schematic diagram of the total number of triangles in streaming graph 1, obtained by using the method for acquiring connection relationship characteristics between entities in a relationship network provided by the present invention;
Fig. 2(b) is a schematic diagram of the vertex degree characteristics in streaming graph 1, obtained by using the method provided by the present invention;
Fig. 2(c) is a schematic diagram of the total number of triangles in streaming graph 2, obtained by using the method provided by the present invention;
Fig. 2(d) is a schematic diagram of the vertex degree characteristics in streaming graph 2, obtained by using the method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In a first embodiment of the present invention, a method for obtaining a connection relationship characteristic between entities in a relationship network, as shown in fig. 1, includes:
establishing a streaming graph according to the relationships between entities in the relationship network to be processed;
establishing two data structures for storing the edges of the streaming graph, called the basic reservoir and the increasing reservoir respectively; the number of edges the basic reservoir can store is fixed at c, and the number of edges the increasing reservoir can store grows dynamically; specifically, the basic reservoir may be a static array of length c, and the increasing reservoir may be a variable-length dynamic array, which may be implemented with a map data structure in C++, a dynamic linked list, or another data structure; in Fig. 1, $e_1, e_2, \ldots, e_c, \ldots, e_i$ each denote an edge of the streaming graph, the subscript being the edge's sequence number;
traversing the streaming graph so that the 1st through c-th edges enter the basic reservoir with probability 1, and the (c+1)-th through N-th edges enter the basic reservoir and the increasing reservoir with probabilities $p_i^{b}$ and $p_i^{g}$ respectively; once the basic reservoir has reached its maximum capacity, an edge already in it is replaced by a newly entering edge with equal probability, and an edge that has entered the increasing reservoir is never replaced; in this embodiment, preferably, [formula image missing]; if the i-th edge enters the basic reservoir, it replaces one edge that has already entered the basic reservoir; the probability that each edge in the basic reservoir is replaced is [formula image missing];
taking the union of the edge sets in the basic reservoir and the increasing reservoir as the sample edge set, and traversing the sample edge set to obtain the degree of each vertex in it, thereby obtaining the connection relationship characteristics between the entity corresponding to each vertex and the other entities in the relationship network;
wherein N denotes the total number of edges in the streaming graph, and N > c; i denotes the edge sequence number in the streaming graph, i ∈ {c+1, c+2, ..., N}; $p_i^{b}$ and $p_i^{g}$ denote the probabilities that the i-th edge enters the basic reservoir and the increasing reservoir, respectively; [formula image missing], and the larger the edge sequence number, the smaller the probability $p_i^{b}$; if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then [formula image missing]; otherwise, [formula image missing].
According to the method for acquiring the connection relationship characteristics between entities in a relationship network, after the streaming graph is established from the relationship network, its edges are stored in two reservoir structures to complete the sampling of the streaming graph: the basic reservoir stores the edge set obtained by reservoir sampling, and the increasing reservoir stores edges that are connected to edges in the basic reservoir.
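The final step of the embodiment, taking the union of the two edge sets and reading off vertex degrees, can be sketched as follows. The function name `sample_degrees` is illustrative, and the sketch assumes undirected edges given as `(u, v)` tuples with comparable vertex identifiers, so that `(u, v)` and `(v, u)` count as the same edge; the source does not state how edges are represented.

```python
from collections import Counter


def sample_degrees(basic, increasing):
    """Degree of every vertex in the union of the two reservoirs' edge sets."""
    # ASSUMPTION: edges are undirected, so normalise the endpoint order
    # before taking the union to avoid counting an edge twice.
    sample = {tuple(sorted(e)) for e in basic} | {tuple(sorted(e)) for e in increasing}
    degree = Counter()
    for u, v in sample:
        degree[u] += 1
        degree[v] += 1
    return degree
```

For example, `sample_degrees([(1, 2), (2, 3)], [(3, 1), (2, 1)])` sees three distinct edges, and each of the vertices 1, 2, and 3 gets degree 2.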
In an alternative embodiment, if the i-th edge (i ∈ {c+1, c+2, ..., N}) does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then the more of the first i edges of the streaming graph that did not enter the basic reservoir and could form a triangle with edges in the basic reservoir, the smaller the probability $p_i^{g}$ that the i-th edge enters the increasing reservoir; in this embodiment, if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, the probability that the i-th edge enters the increasing reservoir is [formula image missing], where $t_i$ denotes the number of edges among the first i edges of the streaming graph that did not enter the basic reservoir and can form a triangle with edges in the basic reservoir;
different relationship networks differ in complexity, so their streaming graphs have different characteristics. If the streaming graph of a relationship network is dense, a large number of edges satisfy the condition for entering the increasing reservoir; conversely, if it is sparse, only a few edges satisfy that condition. By controlling the entry probability in this way, edges in a dense streaming graph enter the increasing reservoir with a lower probability and edges in a sparse streaming graph enter it with a higher probability. This avoids consuming large amounts of computing and storage resources because the increasing reservoir holds too many edges, and it also avoids failing to capture the connection relationships of the edges accurately because the increasing reservoir holds too few.
The second embodiment of the present invention provides a method for acquiring connection relationship characteristics between entities in a relationship network that is similar to the first embodiment, except that, in addition to the above steps, it further includes:
estimating the total number of triangles in the streaming graph according to the edge set in the basic reservoir, the total number of triangles being used to describe the connection relationship characteristics between the entities in the relationship network in quantitative terms;
the estimation may use the existing TRIEST algorithm, the GSH (graph sample and hold) algorithm, or another method.
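As an illustration of the "other methods" category, the sketch below estimates the triangle count by simple inverse-probability scaling. It is not an implementation of TRIEST or GSH. It assumes the basic reservoir is a uniform sample of c of the N stream edges without replacement (which holds for standard reservoir sampling), in which case all three edges of a given triangle are sampled together with probability c(c-1)(c-2) / (N(N-1)(N-2)). The function names are hypothetical.

```python
from collections import defaultdict


def count_triangles(edges):
    """Exact number of triangles among the given undirected edges."""
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    nbrs = defaultdict(set)
    for u, v in unique:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(nbrs[u] & nbrs[v]) for u, v in unique) // 3


def estimate_triangles(basic, n_edges):
    """Scale the sample's triangle count up to the whole stream of n_edges edges."""
    c = len(basic)
    if c < 3:
        return 0.0
    found = count_triangles(basic)
    if n_edges <= c:
        return float(found)  # the sample is the whole stream
    # ASSUMPTION: uniform sample without replacement of c out of n_edges edges.
    scale = (n_edges * (n_edges - 1) * (n_edges - 2)) / (c * (c - 1) * (c - 2))
    return found * scale
```

The estimate is unbiased under the stated sampling assumption, but its variance can be large when c is small relative to N.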
The invention also provides a system for acquiring connection relationship characteristics between entities in a relationship network, comprising: a streaming graph establishing module, a reservoir establishing module, a sampling module, and a characteristic acquiring module;
the streaming graph establishing module is used for establishing a streaming graph according to the relationships between the entities in the relationship network to be processed;
the reservoir establishing module is used for establishing two data structures for storing the edges of the streaming graph, called the basic reservoir and the increasing reservoir respectively; the number of edges the basic reservoir can store is fixed at c, and the number of edges the increasing reservoir can store grows dynamically;
the sampling module is used for traversing the streaming graph so that the 1st through c-th edges enter the basic reservoir with probability 1, and the (c+1)-th through N-th edges enter the basic reservoir and the increasing reservoir with probabilities $p_i^{b}$ and $p_i^{g}$ respectively; once the basic reservoir has reached its maximum capacity, an edge already in it is replaced by a newly entering edge with equal probability, and an edge that has entered the increasing reservoir is never replaced;
the characteristic acquiring module is used for taking the union of the edge sets in the basic reservoir and the increasing reservoir as the sample edge set, and traversing the sample edge set to obtain the degree of each vertex in it, thereby obtaining the connection relationship characteristics between the entity corresponding to each vertex and the other entities in the relationship network;
wherein N denotes the total number of edges in the streaming graph, and N > c; i denotes the edge sequence number in the streaming graph, i ∈ {c+1, c+2, ..., N}; $p_i^{b}$ and $p_i^{g}$ denote the probabilities that the i-th edge enters the basic reservoir and the increasing reservoir, respectively; [formula image missing], and the larger the edge sequence number, the smaller the probability $p_i^{b}$; if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then [formula image missing]; otherwise, [formula image missing].
In this embodiment, for the detailed implementation of each module, refer to the description in the method embodiment above, which is not repeated here.
For two existing relationship networks, the method and the system provided by the invention were used to establish the corresponding streaming graphs (streaming graph 1 and streaming graph 2) and to obtain the total number of triangles and the vertex degree characteristics of each, and thereby the connection relationship characteristics between the entities in the corresponding relationship networks. For streaming graph 1, the total number of triangles and the vertex degree characteristics obtained are shown in Fig. 2(a) and Fig. 2(b), respectively; for streaming graph 2, they are shown in Fig. 2(c) and Fig. 2(d), respectively. The results in Fig. 2 show that streaming graphs with different characteristics may have equal total numbers of triangles. The conventional reservoir sampling method can obtain only the total number of triangles in the streaming graph, and this information alone is not enough to obtain the connection relationship characteristics between the entities in the corresponding relationship network accurately.
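A toy example makes the point concrete that a triangle count alone does not distinguish graphs. The two six-edge graphs below are illustrative only and are not the data behind Fig. 2; the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def triangles(edges):
    """Brute-force triangle count for a small undirected graph."""
    es = {frozenset(e) for e in edges}
    verts = sorted({v for e in es for v in e})
    return sum(
        1
        for a, b, c in combinations(verts, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= es
    )


def degree_sequence(edges):
    """Vertex degrees in descending order."""
    deg = Counter(v for e in {frozenset(e) for e in edges} for v in e)
    return sorted(deg.values(), reverse=True)


# Two disjoint triangles versus two triangles sharing vertex 3.
separate = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
bowtie = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]

print(triangles(separate), degree_sequence(separate))  # 2 [2, 2, 2, 2, 2, 2]
print(triangles(bowtie), degree_sequence(bowtie))      # 2 [4, 2, 2, 2, 2]
```

Both graphs contain two triangles, yet only the second has a hub vertex of degree 4, which is the kind of difference the vertex degree characteristics are meant to expose.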
Application example 1:
By using the method and the system for acquiring connection relationship characteristics between entities in a relationship network, the connection relationship characteristics between the molecules in a molecular structure diagram can be acquired. In a streaming graph established from a molecular structure diagram, each molecule corresponds to a vertex, and a connection between molecules that make up the same substance corresponds to an edge. The acquired connection relationship characteristics between the molecules reveal the specific composition of the substance in terms of how its molecules are connected.
Application example 2:
By using the method and system for acquiring connection relation characteristics between entities in a relational network, the connection relation characteristics between users in a social network can be acquired. In a streaming graph established from a social network, each user corresponds to a vertex of the streaming graph, and each connection between users who are friends corresponds to one edge of the streaming graph. The acquired connection relation characteristics between the users indicate how close the relations between users in the social network are, which provides an effective basis for decisions on product promotion and advertisement placement.
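As a minimal sketch of how a social network could be turned into an edge stream, the function below maps friendship records to undirected edges. The function name `friendship_edge_stream`, the record format (pairs of user identifiers), and the choice to drop self-pairs and repeated friendships are assumptions made for this illustration; the text above only states that users are vertices and friendships are edges.

```python
def friendship_edge_stream(friend_pairs):
    """Yield each undirected friendship once, in order of first arrival.

    Assumption: (a, b) and (b, a) describe the same friendship, and a pair
    of a user with themselves is not a valid edge.
    """
    seen = set()
    for a, b in friend_pairs:
        if a == b:
            continue  # skip self-pairs
        edge = (a, b) if a <= b else (b, a)  # canonical order for undirected edges
        if edge not in seen:
            seen.add(edge)
            yield edge


# Hypothetical friendship records.
records = [("alice", "bob"), ("bob", "alice"), ("bob", "carol"), ("carol", "carol")]
edges = list(friendship_edge_stream(records))
print(edges)
```

Normalizing each pair to a canonical order keeps one friendship from being counted as two edges when vertex degrees are computed later.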
Application example 3:
By using the method and system for acquiring connection relation characteristics between entities in a relational network, the communication relation characteristics between computers in a computer network can be acquired. In a streaming graph established from a computer network, each node corresponds to a vertex of the streaming graph, and each connection between two communicating nodes corresponds to one edge of the streaming graph. The acquired communication relation characteristics between the computers can be used to control traffic in the computer network effectively; for example, if edges exist between a certain vertex and other vertices, an existing network traffic scheduling strategy can be used to allocate a longer bandwidth occupation time to the corresponding node.
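One way such a degree feature might feed a scheduling decision is sketched below: nodes are ranked by their degree in a sampled edge set, and the highest-ranked nodes become candidates for a longer bandwidth allocation. The function name `busiest_nodes`, the `top_k` parameter, and the host names are hypothetical; the scheduling strategy itself is outside the scope of this sketch.

```python
from collections import Counter


def busiest_nodes(sample_edges, top_k):
    """Return the top_k (node, degree) pairs, highest degree first."""
    degrees = Counter()
    for u, v in sample_edges:
        degrees[u] += 1
        degrees[v] += 1
    # Sort by descending degree, then by node name so ties are deterministic.
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical sampled communication edges between hosts.
sample = [("h1", "h2"), ("h1", "h3"), ("h1", "h4"), ("h2", "h3")]
print(busiest_nodes(sample, top_k=2))
```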
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method for acquiring characteristics of connection relations among users in a social network is characterized by comprising the following steps:
establishing a streaming graph according to the relations between users in the social network to be processed, wherein the users in the social network correspond to the vertices of the streaming graph, and each connection between users who are friends in the social network corresponds to one edge of the streaming graph;
establishing two data structures for storing edges of the streaming graph, namely a basic reservoir and an increasing reservoir, wherein the number of edges that the basic reservoir can store is fixed at c, and the number of edges that the increasing reservoir can store grows dynamically;
traversing the streaming graph so that the 1st to c-th edges enter the basic reservoir with probability 1, and the (c+1)-th to N-th edges enter the basic reservoir and the increasing reservoir with probabilities [Figure FDA0002904944270000011] and [Figure FDA0002904944270000012], respectively; once the basic reservoir has reached its maximum storage capacity, the edges already in the basic reservoir are replaced by a newly entering edge with equal probability, and an edge that has entered the increasing reservoir is never replaced;
taking a union of the edge sets in the basic reservoir and the increasing reservoir as a sample edge set, and traversing the sample edge set to obtain the degree of each vertex in the sample edge set, so as to obtain the connection relation characteristics between the user corresponding to each vertex and other users in the social network;
wherein N represents the total number of edges in the streaming graph, and N > c; i represents the sequence number of an edge in the streaming graph, and i ∈ {c+1, c+2, ..., N};
[Figure FDA0002904944270000013] and [Figure FDA0002904944270000014] respectively represent the probabilities that the i-th edge enters the basic reservoir and the increasing reservoir;
[Figure FDA0002904944270000015], and the greater the edge sequence number, the smaller the probability [Figure FDA0002904944270000016]; if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then [Figure FDA0002904944270000017]; otherwise, [Figure FDA0002904944270000018].
2. The method for obtaining characteristics of connection relationships between users in a social network according to claim 1, further comprising:
estimating the total number of triangles in the streaming graph according to the edge set in the basic reservoir.
3. The method for obtaining characteristics of connection relationships between users in a social network according to claim 1 or 2, characterized in that [Figure FDA0002904944270000021].
4. The method for acquiring characteristics of connection relationships between users in a social network according to claim 1 or 2, wherein if the i-th edge enters the basic reservoir, it replaces one edge that is already in the basic reservoir;
wherein the probability that each edge in the basic reservoir is replaced is [Figure FDA0002904944270000022].
5. The method for acquiring characteristics of connection relationships between users in a social network according to claim 1 or 2, wherein if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then the more edges among the first i edges of the streaming graph that do not enter the basic reservoir and can form a triangle with edges in the basic reservoir, the smaller the probability [Figure FDA0002904944270000023] that the i-th edge enters the increasing reservoir.
6. The method for obtaining characteristics of connection relationships between users in a social network according to claim 4, wherein if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, the probability that the i-th edge enters the increasing reservoir is [Figure FDA0002904944270000024],
wherein t_i denotes the number of edges, among the first i edges of the streaming graph, that do not enter the basic reservoir and can form a triangle with edges in the basic reservoir.
7. A system for acquiring characteristics of connection relationships among users in a social network, characterized by comprising: a streaming graph establishing module, a reservoir establishing module, a sampling module, and a characteristic acquisition module;
the streaming graph establishing module is used for establishing a streaming graph according to the relations between users in the social network to be processed, wherein the users in the social network correspond to the vertices of the streaming graph, and each connection between users who are friends in the social network corresponds to one edge of the streaming graph;
the reservoir establishing module is used for establishing two data structures for storing edges of the streaming graph, namely a basic reservoir and an increasing reservoir, wherein the number of edges that the basic reservoir can store is fixed at c, and the number of edges that the increasing reservoir can store grows dynamically;
the sampling module is used for traversing the streaming graph so that the 1st to c-th edges enter the basic reservoir with probability 1, and the (c+1)-th to N-th edges enter the basic reservoir and the increasing reservoir with probabilities [Figure FDA0002904944270000031] and [Figure FDA0002904944270000032], respectively; once the basic reservoir has reached its maximum storage capacity, the edges already in the basic reservoir are replaced by a newly entering edge with equal probability, and an edge that has entered the increasing reservoir is never replaced;
the characteristic acquisition module is used for taking a union of edge sets in the basic reservoir and the increasing reservoir as a sample edge set, traversing the sample edge set to obtain the degree of each vertex in the sample edge set, and acquiring the connection relation characteristics between the user corresponding to each vertex and other users in the social network;
wherein N represents the total number of edges in the streaming graph, and N > c; i represents the sequence number of an edge in the streaming graph, and i ∈ {c+1, c+2, ..., N};
[Figure FDA0002904944270000033] and [Figure FDA0002904944270000034] respectively represent the probabilities that the i-th edge enters the basic reservoir and the increasing reservoir;
[Figure FDA0002904944270000035], and the greater the edge sequence number, the smaller the probability [Figure FDA0002904944270000036]; if the i-th edge does not enter the basic reservoir and can form a triangle with edges in the basic reservoir, then [Figure FDA0002904944270000037]; otherwise, [Figure FDA0002904944270000038].
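The sampling procedure recited in the claims can be sketched as follows. The probability formulas themselves are not legible in this text (they appear only as figure references), so the sketch substitutes assumed values: entry into the basic reservoir with probability c/i and uniform eviction with probability 1/c, as in classical reservoir sampling; entry into the increasing reservoir with probability 1/t_i, chosen only because it decreases as t_i grows; and probability 0 otherwise. The triangle estimator is likewise the standard scaling for a uniform edge sample, not necessarily the one used in the claimed method. All function names are hypothetical.

```python
import random
from collections import Counter


def adjacency(edges):
    """Build a vertex -> neighbour-set map from an edge list."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


def closes_triangle(edge, reservoir_edges):
    """True if `edge` forms a triangle with two edges of the reservoir."""
    u, v = edge
    neighbours = adjacency(reservoir_edges)  # rebuilt per call: simple, not fast
    return bool(neighbours.get(u, set()) & neighbours.get(v, set()))


def sample_stream(edges, c, rng=None):
    """Return (basic, increasing) reservoirs after one pass over the stream."""
    rng = rng or random.Random()
    basic, increasing = [], []
    t = 0  # edges so far that missed the basic reservoir but close a triangle
    for i, edge in enumerate(edges, start=1):
        if i <= c:
            basic.append(edge)  # the first c edges enter with probability 1
        elif rng.random() < c / i:  # ASSUMED basic-reservoir probability c/i
            basic[rng.randrange(c)] = edge  # evict uniformly (1/c each, assumed)
        elif closes_triangle(edge, basic):
            t += 1
            if rng.random() < 1 / t:  # ASSUMED increasing-reservoir probability 1/t_i
                increasing.append(edge)  # never replaced afterwards
        # any other edge is discarded (ASSUMED probability 0)
    return basic, increasing


def sampled_degrees(basic, increasing):
    """Degree of each vertex over the union of both reservoirs."""
    degrees = Counter()
    for u, v in set(basic) | set(increasing):
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def estimate_triangles(basic, n_edges):
    """Scale the triangles found in a uniform c-edge sample up to n_edges."""
    c = len(basic)
    if c < 3:
        return 0.0
    neighbours = adjacency(basic)
    # Each triangle is seen once from each of its three edges.
    in_sample = sum(len(neighbours[a] & neighbours[b]) for a, b in basic) // 3
    scale = (n_edges * (n_edges - 1) * (n_edges - 2)) / (c * (c - 1) * (c - 2))
    return in_sample * max(1.0, scale)
```

A caller would pass the edge stream and the capacity c to `sample_stream`, then pass both reservoirs to `sampled_degrees` and the basic reservoir to `estimate_triangles`. Keeping the increasing reservoir out of the triangle estimate preserves the uniform-sample assumption that the scaling factor relies on.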
CN201910233454.6A 2019-03-26 2019-03-26 Method and system for acquiring connection relation characteristics among users in social network Active CN110032605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910233454.6A CN110032605B (en) 2019-03-26 2019-03-26 Method and system for acquiring connection relation characteristics among users in social network

Publications (2)

Publication Number Publication Date
CN110032605A CN110032605A (en) 2019-07-19
CN110032605B true CN110032605B (en) 2021-04-06

Family

ID=67236677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910233454.6A Active CN110032605B (en) 2019-03-26 2019-03-26 Method and system for acquiring connection relation characteristics among users in social network

Country Status (1)

Country Link
CN (1) CN110032605B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114389961B (en) * 2022-01-14 2024-03-08 北京中科通量科技有限公司 Graph flow triangle counting method and device based on node heat sampling

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2488872B2 (en) * 2013-01-29 2015-02-23 Universidad De Sevilla Device for the evaluation of the maturity of grape seeds through digitalization of images
CN105005586A (en) * 2015-06-24 2015-10-28 华中科技大学 Degree feature replacement policy based stream type graph sampling method
CN106100921A (en) * 2016-06-08 2016-11-09 华中科技大学 The dynamic streaming figure parallel samples method synchronized based on dot information
CN107786388A (en) * 2017-09-26 2018-03-09 西安交通大学 A kind of abnormality detection system based on large scale network flow data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"在线社交网络用户影响力关系模型构建";何磊明;《中国优秀硕士学位论文全文数据库》;20160831;I141-69 *

Also Published As

Publication number Publication date
CN110032605A (en) 2019-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant