CN114722242A - Binary counting type summarization method and device based on graph data stream and computer equipment - Google Patents

Publication number
CN114722242A
Authority
CN
China
Prior art keywords
bucket
dimensional array
directed edge
weighted
updated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210248361.2A
Other languages
Chinese (zh)
Inventor
符永铨
陈磊
葛可适
苏华友
姜晶菲
黄震
李东升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202210248361.2A
Publication of CN114722242A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables

Abstract

The application relates to a binary counting type summarization method and device based on a graph data stream, computer equipment and a storage medium. The method comprises the following steps: constructing a two-dimensional array abstract structure; inserting a weighted directed edge into the two-dimensional array abstract structure, applying two hash functions to the source vertex and the destination vertex of the weighted directed edge, respectively, to obtain the row index and the column index of the bucket of the weighted directed edge, then finding the corresponding bucket in the two-dimensional array abstract structure, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure; and calculating weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, and updating the updated two-dimensional array abstract structure by using the obtained weight predicted values of the directed edges to obtain a binary counting abstract. With this method, graph data streams that carry binary relations can be stored and queried.

Description

Binary counting type summarization method and device based on graph data stream and computer equipment
Technical Field
The present application relates to the field of network communication technologies, and in particular, to a binary counting type summarization method and apparatus based on graph data streams, a computer device, and a storage medium.
Background
With the development of network communication technology, graph data streams have emerged. A graph data stream is a general data stream model that represents an unbounded sequence of continuously arriving data records, where each data record corresponds to a weighted directed edge in a graph. Many data streams in the field of network communications can be classified as graph data streams; for example, a network flow in a computer network corresponds to one data transmission from a source host to a specific destination host, and a message in a social network corresponds to one information interaction between two online accounts. Because the storage and processing capabilities of computer systems are limited, computing a fixed-size digest of a graph data stream in a small amount of storage space has become an important means of improving the scalability of graph data stream processing.
The traditional counting type abstract consists of one-dimensional arrays and can only store and query a unary relation: for a key-value pair, a hash function maps the input key to a pseudo-random position in the array, and the array element at that position is then updated or queried. A graph data stream reflects binary relations between vertices; each access concerns one directed edge, which is identified by two vertices rather than by a single one-dimensional input key, so the traditional counting type abstract is no longer applicable.
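As an illustration only, the one-dimensional scheme described above can be sketched in Python as follows. The names `position` and `OneDimCounter`, the array size, and the use of `hashlib.blake2b` as the hash function are assumptions made for this sketch; the application does not specify them.

```python
import hashlib


def position(key: str, m: int) -> int:
    """Hash a key to an array position in [0, m) (hypothetical helper)."""
    digest = hashlib.blake2b(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % m


class OneDimCounter:
    """One-dimensional counting digest: one array indexed by one hashed key."""

    def __init__(self, m: int = 64) -> None:
        self.m = m
        self.cells = [0] * m

    def update(self, key: str, value: int) -> None:
        # Add the value to the array element selected by the hashed key.
        self.cells[position(key, self.m)] += value

    def query(self, key: str) -> int:
        # Read back the array element selected by the hashed key.
        return self.cells[position(key, self.m)]


digest = OneDimCounter()
digest.update("host-a", 3)
digest.update("host-a", 2)
print(digest.query("host-a"))
```

The interface accepts a single key, which is why a directed edge identified by a pair of vertices does not fit this structure directly.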
Disclosure of Invention
In view of the foregoing, it is desirable to provide a graph data stream-based binary counter type summarization method, apparatus, computer device and storage medium capable of achieving graph data stream access in binary relation.
A method for binary-counting summarization based on graph data streams, the method comprising:
constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
inserting a weighted directed edge into the two-dimensional array abstract structure, and respectively calculating a source vertex and a destination vertex of the weighted directed edge by using two hash functions to obtain a row index and a column index of a bucket of the weighted directed edge;
finding the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
calculating, according to the hash functions, the row index and the column index of the bucket corresponding to a directed edge from the source vertex and the destination vertex of that directed edge in the updated two-dimensional array abstract structure; calculating weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge;
and updating the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
In one embodiment, the updating the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain the updated two-dimensional array abstract structure includes:
updating the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain the updated counter of the bucket;
and constructing an updated two-dimensional array abstract structure according to the updated counter of the bucket.
In one embodiment, the algebraic sum rule is:
$I[l](H_r^l(s), H_c^l(t)) \leftarrow I[l](H_r^l(s), H_c^l(t)) + c$
wherein $I[l]$ represents the $l$-th two-dimensional array, $(H_r^l(s), H_c^l(t))$ represents the index of the bucket, namely the bucket in row $H_r^l(s)$ and column $H_c^l(t)$, $s$ represents the source vertex, $t$ represents the destination vertex, and $c$ represents the weight.
In one embodiment, the weighted algebraic sum rule is:
$I[l](H_r^l(s), H_c^l(t)) \leftarrow I[l](H_r^l(s), H_c^l(t)) + \mathrm{sign}(s, t) \times c$
wherein $\mathrm{sign}(s, t) = H_r(s) \times H_c(t)$, $\mathrm{sign}(s, t) \in \{+1, -1\}$, $\mathrm{sign}(s, t)$ represents a random value calculated from the directed edge, and $H_r(\cdot)$, $H_c(\cdot)$ represent 2 hash functions.
In one embodiment, the calculating of the weight prediction value is performed according to the algebraic sum rule and the counter of the bucket updated by the weighted algebraic sum rule, so as to obtain the weight prediction value of the directed edge, and the method includes:
for the counters of the buckets updated by the algebraic sum rule, the minimum value over the buckets is taken as the weight predicted value of the directed edge:
$\min_{l \in [1, k]} x\big(I[l](H_r^l(u), H_c^l(v))\big)$
where $x(\cdot)$ represents the aggregate value of the bucket's counter, $u$ represents the source vertex of the directed edge, $v$ represents the destination vertex of the directed edge, $l \in [1, k]$, and $k$ represents the number of the two-dimensional arrays.
In one embodiment, for the counters of the buckets updated by the weighted algebraic sum rule, the median over the buckets is taken as the weight predicted value of the directed edge:
$\mathrm{median}_{l \in [1, k]} \; \mathrm{sign}(u, v) \times x\big(I[l](H_r^l(u), H_c^l(v))\big)$
A binary-counting summarization apparatus based on graph data streams, the apparatus comprising:
the two-dimensional array abstract structure constructing module is used for constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
the weighted directed edge inserting module is used for inserting weighted directed edges into the two-dimensional array abstract structure, and calculating the source vertex and the target vertex of each weighted directed edge by using two hash functions to obtain the row index and the column index of the bucket of each weighted directed edge;
the two-dimensional array abstract structure updating module is used for finding the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
the binary counting type abstract constructing module is used for calculating, according to the hash functions, the row index and the column index of the bucket corresponding to a directed edge from the source vertex and the destination vertex of that directed edge in the updated two-dimensional array abstract structure; calculating weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge; and updating the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
inserting a weighted directed edge into the two-dimensional array abstract structure, and respectively calculating a source vertex and a target vertex of the weighted directed edge by using two hash functions to obtain a row index and a column index of a bucket of the weighted directed edge;
finding the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
calculating, according to the hash functions, the row index and the column index of the bucket corresponding to a directed edge from the source vertex and the destination vertex of that directed edge in the updated two-dimensional array abstract structure; calculating weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge;
and updating the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
inserting a weighted directed edge into the two-dimensional array abstract structure, and respectively calculating a source vertex and a target vertex of the weighted directed edge by using two hash functions to obtain a row index and a column index of a bucket of the weighted directed edge;
finding the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
calculating, according to the hash functions, the row index and the column index of the bucket corresponding to a directed edge from the source vertex and the destination vertex of that directed edge in the updated two-dimensional array abstract structure; calculating weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge;
and updating the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
According to the binary counting type summarization method, device, computer equipment and storage medium based on the graph data stream, a two-dimensional array abstract structure is first constructed; each position in an array is called a bucket, each bucket maintains a counter for aggregating the count values of weighted directed edges, and each bucket can be accessed in constant time through its row and column indexes. A weighted directed edge is inserted into the two-dimensional array abstract structure, and two hash functions are applied to the source vertex and the destination vertex of the weighted directed edge, respectively, to obtain the row index and the column index of its bucket. The corresponding bucket is found in the two-dimensional array abstract structure according to these indexes, and its counter is updated by using the algebraic sum rule and the weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure. For a directed edge to be queried, the row index and the column index of the corresponding bucket are calculated from its source vertex and destination vertex according to the hash functions, and weight predicted values are calculated from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge. By constructing the two-dimensional array abstract structure and obtaining the binary counting abstract through the insertion and query operations on it, the invention flexibly supports storing and querying graph data streams, realizes online insertion and query of the weighted directed edges of a graph data stream, and supports approximate calculation of edge weights with two classes of operators, the algebraic sum and the weighted algebraic sum. When the inserted directed edges have few hash collisions, the precision of the weighted algebraic sum operator is higher than that of the algebraic sum operator.
Drawings
FIG. 1 is a flow diagram illustrating a binary-counting summarization method based on graph data flow, according to one embodiment;
FIG. 2 is a diagram illustrating a two-dimensional array abstract structure in one embodiment;
FIG. 3 is a block diagram of an embodiment of a binary-counting summarization apparatus based on graph data flow;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
In one embodiment, as shown in fig. 1, there is provided a binary-counting summarization method based on graph data flow, comprising the following steps:
Step 102, constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each location in the two-dimensional array is referred to as a bucket, and each bucket maintains a counter.
The abstract structure is used for storing a graph data stream, wherein the graph data stream is a general data stream model and represents a data record sequence which arrives infinitely continuously; many data flows in the field of network communications can be classified as graph data flows, for example, a network flow in a computer network corresponds to one data transmission from a source host to a specific destination host, and a message in a social network corresponds to one information interaction between two online accounts.
A two-dimensional array abstract structure comprises $k$ two-dimensional arrays, and each two-dimensional array comprises $m_r \times m_c$ buckets ($m_r$ is the number of rows and $m_c$ is the number of columns of the two-dimensional array, both being parameters pre-configured by the system). Each position in an array is referred to as a "bucket"; each bucket maintains a counter for aggregating the count values of weighted directed edges, and each bucket can be accessed in constant time by its row and column indexes. The $l$-th two-dimensional array is denoted $I[l]$, and the bucket in the $p$-th row and $q$-th column of that array is denoted $I[l](p, q)$ ($p \in [1, m_r]$, $q \in [1, m_c]$, $p$ and $q$ are positive integers). The larger the number of rows or columns, the more memory a two-dimensional array requires; each bucket contains 1 counter. The two-dimensional array digest structure supports insert operations (inserting a weighted directed edge into the two-dimensional array digest structure) and query operations (querying the weight of a directed edge). Each two-dimensional array needs 2 hash functions to calculate the row and column indexes, so a two-dimensional array abstract structure needs 2k hash functions as a hash function family $\{H_r^l, H_c^l\}$ ($H_r^l$ represents the hash function that computes the row index, $H_c^l$ represents the hash function that computes the column index, $l \in [1, k]$), which is used for the insertion and query of directed edges.
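The structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the class name `TwoDimDigest`, the helper `seeded_hash`, the choice of `hashlib.blake2b` with a per-function salt as the hash family, and the 0-based indexes (the description uses 1-based indexes) are all assumptions of this sketch.

```python
import hashlib


def seeded_hash(vertex: str, seed: int, modulus: int) -> int:
    """Hypothetical member of the hash family: maps a vertex to [0, modulus)."""
    digest = hashlib.blake2b(
        vertex.encode("utf-8"), salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest[:8], "little") % modulus


class TwoDimDigest:
    """k two-dimensional arrays of m_r x m_c buckets, one counter per bucket."""

    def __init__(self, k: int, m_r: int, m_c: int) -> None:
        self.k, self.m_r, self.m_c = k, m_r, m_c
        # self.I[l][p][q] is the counter of the bucket in row p, column q
        # of the l-th array (0-based here).
        self.I = [[[0] * m_c for _ in range(m_r)] for _ in range(k)]

    def bucket_index(self, l: int, s: str, t: str) -> tuple[int, int]:
        # Array l uses its own pair of hash functions: seed 2*l for the
        # row hash and seed 2*l + 1 for the column hash, 2k functions in total.
        row = seeded_hash(s, 2 * l, self.m_r)
        col = seeded_hash(t, 2 * l + 1, self.m_c)
        return row, col


structure = TwoDimDigest(k=3, m_r=4, m_c=5)
print(structure.bucket_index(0, "s1", "t1"))
```

Memory grows with `k * m_r * m_c`, which matches the remark that more rows or columns require more memory.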
Step 104, inserting a weighted directed edge into the two-dimensional array abstract structure, and applying two hash functions to the source vertex and the destination vertex of the weighted directed edge, respectively, to obtain a row index and a column index of a bucket of the weighted directed edge.
A weighted directed edge <s,t,v> is inserted into the two-dimensional array abstract structure. A hash function maps the key of an element to the storage position of that element in a table; here, the hash function family is used to calculate the row and column indexes of the weighted directed edge in the two-dimensional array abstract structure. For the $k$ two-dimensional arrays, the bucket indexes of the weighted directed edge are $\{(H_r^l(s), H_c^l(t))\}_{l \in [1, k]}$. For the $l$-th array, the bucket index is $(H_r^l(s), H_c^l(t))$, which represents the bucket in row $H_r^l(s)$ and column $H_c^l(t)$.
Step 106, finding the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure.
Each position in an array is called a bucket and is identified by its row and column. The bucket of the weighted directed edge is located in the two-dimensional array abstract structure by its row index and column index, and the counter of that bucket is updated by using the algebraic sum rule and the weighted algebraic sum rule, giving the updated two-dimensional array abstract structure, namely the two-dimensional array abstract structure after the insert operation.
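The insert operation can be sketched in Python as follows, as a hedged illustration rather than the implementation of the application. The helpers `seeded_hash`, `sign`, `new_structure` and `insert` are hypothetical names, `hashlib.blake2b` stands in for the unspecified hash functions, indexes are 0-based, and the two sign hash functions are assumed to be shared by all arrays because the description writes them without an array superscript.

```python
import hashlib


def seeded_hash(vertex: str, seed: int, modulus: int) -> int:
    """Hypothetical seeded hash: maps a vertex to [0, modulus)."""
    digest = hashlib.blake2b(
        vertex.encode("utf-8"), salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest[:8], "little") % modulus


def sign(s: str, t: str) -> int:
    """sign(s, t) = H_r(s) * H_c(t), each factor being +1 or -1 (assumed seeds)."""
    h_r = 1 if seeded_hash(s, 1_000_001, 2) else -1
    h_c = 1 if seeded_hash(t, 1_000_002, 2) else -1
    return h_r * h_c


def new_structure(k: int, m_r: int, m_c: int) -> list:
    """k arrays of m_r x m_c counters, all initialised to zero."""
    return [[[0] * m_c for _ in range(m_r)] for _ in range(k)]


def insert(I: list, s: str, t: str, c: int, weighted: bool = False) -> None:
    """Add edge (s, t) with weight c to one bucket in every array."""
    # Algebraic sum rule adds c; weighted algebraic sum rule adds sign(s, t) * c.
    delta = sign(s, t) * c if weighted else c
    for l, array in enumerate(I):
        row = seeded_hash(s, 2 * l, len(array))
        col = seeded_hash(t, 2 * l + 1, len(array[0]))
        array[row][col] += delta


plain = new_structure(2, 4, 4)
insert(plain, "a", "b", 3)
insert(plain, "a", "b", 2)

signed = new_structure(2, 4, 4)
insert(signed, "a", "b", 3, weighted=True)
```

In this sketch the two rules are kept in separate structures so that each counter is updated by exactly one rule; whether the application shares or separates the counters is not stated in the text above.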
Step 108, calculating, according to the hash functions, the row index and the column index of the bucket corresponding to a directed edge from the source vertex and the destination vertex of that directed edge in the updated two-dimensional array abstract structure; calculating weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge; and updating the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
The query operation is executed after the insertion operation is completed. For any given directed edge <u, v>, the row and column indexes in the updated two-dimensional array abstract structure are calculated as $\{(H_r^l(u), H_c^l(v))\}_{l \in [1, k]}$, and the index of the $l$-th bucket is $(H_r^l(u), H_c^l(v))$. The weight predicted values are then calculated from the counters of the updated buckets according to the two insertion rules, respectively, giving the weight predicted value of the directed edge in the updated two-dimensional array abstract structure, so that the directed edges of the graph data stream can be conveniently queried.
In the binary counting type summarization method based on the graph data stream, a two-dimensional array abstract structure is first constructed; each position in an array is called a "bucket", each bucket maintains a counter for aggregating the count values of weighted directed edges, and each bucket can be accessed in constant time through its row and column indexes. A weighted directed edge is inserted into the two-dimensional array abstract structure, and two hash functions are applied to the source vertex and the destination vertex of the weighted directed edge, respectively, to obtain the row index and the column index of its bucket. The corresponding bucket is found in the two-dimensional array abstract structure according to these indexes, and its counter is updated by using the algebraic sum rule and the weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure. For a directed edge to be queried, the row index and the column index of the corresponding bucket are calculated from its source vertex and destination vertex according to the hash functions, and weight predicted values are calculated from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge. By constructing the two-dimensional array abstract structure and obtaining the binary counting abstract through the insertion and query operations on it, the invention flexibly supports storing and querying graph data streams, realizes online insertion and query of the weighted directed edges of a graph data stream, and supports approximate calculation of edge weights with two classes of operators, the algebraic sum and the weighted algebraic sum. When the inserted directed edges have few hash collisions, the precision of the weighted algebraic sum operator is higher than that of the algebraic sum operator.
In one embodiment, the updating the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain the updated two-dimensional array abstract structure includes:
updating the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain the updated counter of the bucket;
and constructing an updated two-dimensional array abstract structure according to the updated counter of the bucket.
As shown in FIG. 2, counters for buckets in the two-dimensional array digest structure are updated with algebraic sum operators and weighted algebraic sum operators.
In one embodiment, the algebraic sum rule is:
$I[l](H_r^l(s), H_c^l(t)) \leftarrow I[l](H_r^l(s), H_c^l(t)) + c$
wherein $I[l]$ represents the $l$-th two-dimensional array, $(H_r^l(s), H_c^l(t))$ represents the index of the bucket, namely the bucket in row $H_r^l(s)$ and column $H_c^l(t)$, $s$ represents the source vertex, $t$ represents the destination vertex, and $c$ represents the weight.
In one embodiment, the weighted algebraic sum rule is:
$I[l](H_r^l(s), H_c^l(t)) \leftarrow I[l](H_r^l(s), H_c^l(t)) + \mathrm{sign}(s, t) \times c$
wherein $\mathrm{sign}(s, t) = H_r(s) \times H_c(t)$, $\mathrm{sign}(s, t) \in \{+1, -1\}$, $\mathrm{sign}(s, t)$ represents a random value calculated from the directed edge, and $H_r(\cdot)$, $H_c(\cdot)$ represent 2 hash functions.
In one embodiment, the calculating of the weight prediction value is performed according to the algebraic sum rule and the counter of the bucket updated by the weighted algebraic sum rule, so as to obtain the weight prediction value of the directed edge, and the method includes:
for the counters of the buckets updated by the algebraic sum rule, the minimum value over the buckets is taken as the weight predicted value of the directed edge:
$\min_{l \in [1, k]} x\big(I[l](H_r^l(u), H_c^l(v))\big)$
where $x(\cdot)$ represents the aggregate value of the bucket's counter, $u$ represents the source vertex of the directed edge, $v$ represents the destination vertex of the directed edge, $l \in [1, k]$, and $k$ represents the number of the two-dimensional arrays.
Because several directed edges may be hashed into the same bucket, the counter of a bucket is never less than the weight of any edge inserted into it (assuming non-negative weights), so the minimum value over the buckets is taken as the weight predicted value of the directed edge.
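The minimum-based query can be sketched in Python as follows. This is an illustrative sketch under assumptions: `seeded_hash`, `new_structure`, `insert` and `query_min` are hypothetical names, `hashlib.blake2b` stands in for the unspecified hash functions, indexes are 0-based, and all weights are assumed to be non-negative.

```python
import hashlib


def seeded_hash(vertex: str, seed: int, modulus: int) -> int:
    """Hypothetical seeded hash: maps a vertex to [0, modulus)."""
    digest = hashlib.blake2b(
        vertex.encode("utf-8"), salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest[:8], "little") % modulus


def new_structure(k: int, m_r: int, m_c: int) -> list:
    """k arrays of m_r x m_c counters, all initialised to zero."""
    return [[[0] * m_c for _ in range(m_r)] for _ in range(k)]


def bucket(array: list, l: int, u: str, v: str) -> tuple[int, int]:
    """Row and column of edge (u, v) in the l-th array."""
    return seeded_hash(u, 2 * l, len(array)), seeded_hash(v, 2 * l + 1, len(array[0]))


def insert(I: list, s: str, t: str, c: int) -> None:
    """Algebraic sum rule: add c to the edge's bucket in every array."""
    for l, array in enumerate(I):
        row, col = bucket(array, l, s, t)
        array[row][col] += c


def query_min(I: list, u: str, v: str) -> int:
    """Weight predicted value: minimum of the edge's k bucket counters."""
    values = []
    for l, array in enumerate(I):
        row, col = bucket(array, l, u, v)
        values.append(array[row][col])
    return min(values)


edges = {("a", "b"): 5, ("a", "c"): 2, ("d", "b"): 7, ("e", "f"): 1}
I = new_structure(3, 8, 8)
for (s, t), c in edges.items():
    insert(I, s, t, c)

for (s, t), c in edges.items():
    # Collisions can only add to a counter, so the estimate never
    # falls below the true weight.
    print((s, t), c, query_min(I, s, t))
```

Taking the minimum selects the array in which the queried edge suffered the least collision, which is why the estimate is an upper bound that tightens as collisions become rarer.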
In one embodiment, for the counters of the buckets updated by the weighted algebraic sum rule, the median over the buckets is taken as the weight predicted value of the directed edge:
$\mathrm{median}_{l \in [1, k]} \; \mathrm{sign}(u, v) \times x\big(I[l](H_r^l(u), H_c^l(v))\big)$
When the counters of the buckets are updated by using the weighted algebraic sum rule, contributions with positive and negative signs partially cancel each other, so that the value of each bucket is close to the true value; the median is therefore selected as the weight predicted value of the directed edge.
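The median-based query can be sketched in Python as follows. This is only a sketch under assumptions: the helper names are hypothetical, `hashlib.blake2b` stands in for the unspecified hash functions, indexes are 0-based, and multiplying each counter by sign(u, v) before taking the median follows the usual signed-counter estimator and is an assumption of this sketch rather than something stated explicitly in the text.

```python
import hashlib
import statistics


def seeded_hash(vertex: str, seed: int, modulus: int) -> int:
    """Hypothetical seeded hash: maps a vertex to [0, modulus)."""
    digest = hashlib.blake2b(
        vertex.encode("utf-8"), salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest[:8], "little") % modulus


def sign(s: str, t: str) -> int:
    """sign(s, t) = H_r(s) * H_c(t), each factor being +1 or -1 (assumed seeds)."""
    h_r = 1 if seeded_hash(s, 1_000_001, 2) else -1
    h_c = 1 if seeded_hash(t, 1_000_002, 2) else -1
    return h_r * h_c


def new_structure(k: int, m_r: int, m_c: int) -> list:
    """k arrays of m_r x m_c counters, all initialised to zero."""
    return [[[0] * m_c for _ in range(m_r)] for _ in range(k)]


def bucket(array: list, l: int, u: str, v: str) -> tuple[int, int]:
    """Row and column of edge (u, v) in the l-th array."""
    return seeded_hash(u, 2 * l, len(array)), seeded_hash(v, 2 * l + 1, len(array[0]))


def insert_weighted(I: list, s: str, t: str, c: int) -> None:
    """Weighted algebraic sum rule: add sign(s, t) * c in every array."""
    for l, array in enumerate(I):
        row, col = bucket(array, l, s, t)
        array[row][col] += sign(s, t) * c


def query_median(I: list, u: str, v: str) -> float:
    """Weight predicted value: median of the sign-corrected bucket counters."""
    corrected = []
    for l, array in enumerate(I):
        row, col = bucket(array, l, u, v)
        corrected.append(sign(u, v) * array[row][col])
    return statistics.median(corrected)


I = new_structure(3, 8, 8)
insert_weighted(I, "a", "b", 7)
print(query_median(I, "a", "b"))  # 7: no other edge collides with (a, b)
```

Colliding edges contribute with random signs, so their errors tend to cancel rather than accumulate, and the median discards arrays in which an unusually heavy collision occurred.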
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided a binary-counting summarization apparatus based on graph data streams, including: a two-dimensional array abstract structure constructing module 302, a weighted directed edge inserting module 304, a two-dimensional array abstract structure updating module 306 and a binary counting type abstract constructing module 308, wherein:
a two-dimensional array abstract structure constructing module 302, configured to construct a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
a weighted directed edge insertion module 304, configured to insert a weighted directed edge into the two-dimensional array digest structure, and calculate a source vertex and a destination vertex of the weighted directed edge by using two hash functions, respectively, to obtain a row index and a column index of a bucket of the weighted directed edge;
the two-dimensional array abstract structure updating module 306 is configured to find the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and update the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
a binary counting type abstract constructing module 308, configured to calculate, according to the hash functions, the row index and the column index of the bucket corresponding to a directed edge from the source vertex and the destination vertex of that directed edge in the updated two-dimensional array abstract structure; calculate weight predicted values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight predicted value of the directed edge; and update the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
In one embodiment, the two-dimensional array abstract structure updating module 306 is further configured to update the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure, including:
updating the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain the updated counter of the bucket;
and constructing an updated two-dimensional array abstract structure according to the updated counter of the bucket.
In one embodiment, the algebraic sum rule is:
Figure BDA0003545803560000101
where I[l] denotes the l-th two-dimensional array, Figure BDA0003545803560000102 denotes the index of the bucket, namely the bucket in row Figure BDA0003545803560000103 and column Figure BDA0003545803560000104(t), s denotes the source vertex, t denotes the destination vertex, and c denotes the weight.
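The formula image for the algebraic sum rule is not reproduced in this text. A minimal sketch, assuming the rule adds the weight c to the counter of the bucket addressed by (s, t) in every two-dimensional array, could read as follows; the helper `_h`, the function name `insert_algebraic`, and the SHA-256 based hashes are hypothetical choices for illustration only.

```python
import hashlib


def _h(tag, l, vertex, modulus):
    """Hypothetical hash: SHA-256 salted with a tag and the array number l."""
    digest = hashlib.sha256(f"{tag}:{l}:{vertex}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def insert_algebraic(arrays, s, t, c):
    """Assumed algebraic sum rule: add weight c to the bucket of edge (s, t)."""
    for l, grid in enumerate(arrays):
        row = _h("row", l, s, len(grid))      # row index from the source vertex
        col = _h("col", l, t, len(grid[0]))   # column index from the destination vertex
        grid[row][col] += c
```

Under this assumption, inserting the same edge twice accumulates both weights in the same bucket of each array.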
In one embodiment, the weighted algebraic sum rule is:
Figure BDA0003545803560000105
Figure BDA0003545803560000106
where sign(s, t) = H_r(s) × H_c(t), sign(s, t) ∈ {+1, −1}, sign(s, t) denotes a random value computed from the directed edge, and H_r(·) and H_c(·) denote two hash functions.
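As a non-limiting illustration of the sign computation described above, the following sketch multiplies the weight by sign(s, t) before accumulating it. The exact update formula is in an image not reproduced here, so the accumulation step is an assumption; the names `_h`, `_pm1`, `edge_sign`, and `insert_weighted`, and the SHA-256 based hashes, are likewise hypothetical.

```python
import hashlib


def _h(tag, l, vertex, modulus):
    """Hypothetical index hash: SHA-256 salted with a tag and the array number."""
    digest = hashlib.sha256(f"{tag}:{l}:{vertex}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def _pm1(tag, vertex):
    """Hypothetical hash into {+1, -1}, standing in for H_r or H_c."""
    digest = hashlib.sha256(f"{tag}:{vertex}".encode()).digest()
    return 1 if digest[0] & 1 else -1


def edge_sign(s, t):
    """sign(s, t) = H_r(s) * H_c(t), which is always +1 or -1."""
    return _pm1("Hr", s) * _pm1("Hc", t)


def insert_weighted(arrays, s, t, c):
    """Assumed weighted algebraic sum rule: add sign(s, t) * c to the bucket."""
    sg = edge_sign(s, t)
    for l, grid in enumerate(arrays):
        row = _h("row", l, s, len(grid))
        col = _h("col", l, t, len(grid[0]))
        grid[row][col] += sg * c
```

Because the sign is random with respect to the edge, contributions from different edges that collide in one bucket tend to cancel rather than always add up.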
In one embodiment, the binary counting type abstract constructing module 308 is further configured to compute weight prediction values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight prediction value of the directed edge, including:
for the counters of the buckets updated by the algebraic sum rule, taking the minimum value over the buckets as the weight prediction value of the directed edge, namely
Figure BDA0003545803560000107
Where x represents the aggregate value of the bucket's counter, u represents the source vertex of the directed edge, v represents the destination vertex of the directed edge,
Figure BDA0003545803560000108
and k represents the number of the two-dimensional arrays.
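The minimum-based prediction can be illustrated with the following sketch, which assumes that the counters were updated by adding the edge weight (the update formula itself is in an image not reproduced here) and that one bucket is read from each of the k arrays. The names `_bucket`, `insert_algebraic`, and `estimate_min`, and the SHA-256 based hashes, are hypothetical.

```python
import hashlib


def _bucket(l, s, t, rows, cols):
    """Hypothetical (row, column) of edge (s, t) in array l."""
    def h(tag, vertex, modulus):
        digest = hashlib.sha256(f"{tag}:{l}:{vertex}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % modulus

    return h("row", s, rows), h("col", t, cols)


def insert_algebraic(arrays, s, t, c):
    """Assumed update: add weight c to the bucket of (s, t) in every array."""
    for l, grid in enumerate(arrays):
        r, q = _bucket(l, s, t, len(grid), len(grid[0]))
        grid[r][q] += c


def estimate_min(arrays, u, v):
    """Weight prediction for edge (u, v): minimum counter over the k arrays."""
    values = []
    for l, grid in enumerate(arrays):
        r, q = _bucket(l, u, v, len(grid), len(grid[0]))
        values.append(grid[r][q])
    return min(values)
```

With non-negative weights, colliding edges can only increase a counter, so under these assumptions the minimum never falls below the true weight of the queried edge.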
In one embodiment, for the counters of the buckets updated by the weighted algebraic sum rule, the median over the buckets is taken as the weight prediction value of the directed edge, namely
Figure BDA0003545803560000111
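The median-based prediction can be illustrated in the same hedged manner. The sketch assumes that each counter accumulated sign(s, t) × c and that the query multiplies each counter read by sign(u, v) before taking the median; this multiplication is an assumption borrowed from count-sketch style estimators, since the formula image is not reproduced here. All function names and the SHA-256 based hashes are hypothetical.

```python
import hashlib
import statistics


def _digest(text):
    return hashlib.sha256(text.encode()).digest()


def _bucket(l, s, t, rows, cols):
    """Hypothetical (row, column) of edge (s, t) in array l."""
    row = int.from_bytes(_digest(f"row:{l}:{s}")[:8], "big") % rows
    col = int.from_bytes(_digest(f"col:{l}:{t}")[:8], "big") % cols
    return row, col


def edge_sign(s, t):
    """sign(s, t) = H_r(s) * H_c(t), with both hashes mapping into {+1, -1}."""
    hr = 1 if _digest(f"Hr:{s}")[0] & 1 else -1
    hc = 1 if _digest(f"Hc:{t}")[0] & 1 else -1
    return hr * hc


def insert_weighted(arrays, s, t, c):
    """Assumed update: add sign(s, t) * c to the bucket of (s, t) in every array."""
    sg = edge_sign(s, t)
    for l, grid in enumerate(arrays):
        r, q = _bucket(l, s, t, len(grid), len(grid[0]))
        grid[r][q] += sg * c


def estimate_median(arrays, u, v):
    """Weight prediction for (u, v): median of the sign-corrected counters."""
    sg = edge_sign(u, v)
    values = []
    for l, grid in enumerate(arrays):
        r, q = _bucket(l, u, v, len(grid), len(grid[0]))
        values.append(sg * grid[r][q])
    return statistics.median(values)
```

Unlike the minimum, the median remains meaningful when weights can be negative, for example when an edge weight is later decreased.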
For specific limitations of the binary counting type summarization apparatus based on graph data stream, refer to the limitations of the binary counting type summarization method based on graph data stream above, which are not repeated here. The modules in the binary counting type summarization apparatus based on graph data stream can be implemented wholly or partially in software, in hardware, or in a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a binary-counting summarization method based on graph data streams. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.
In an embodiment, a computer storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A binary counting type summarization method based on graph data flow is characterized by comprising the following steps:
constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
inserting a weighted directed edge into a two-dimensional array abstract structure, and respectively calculating a source vertex and a destination vertex of the weighted directed edge by using two hash functions to obtain a row index and a column index of a bucket of the weighted directed edge;
finding the corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating the counter of the corresponding bucket by using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
applying the hash functions to the source vertex and the destination vertex of a directed edge of the updated two-dimensional array abstract structure to obtain the row index and the column index of the bucket corresponding to the directed edge; computing weight prediction values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight prediction value of the directed edge;
and updating the updated two-dimensional array abstract structure by using the weight predicted value of the directed edge to obtain a binary counting abstract.
2. The method of claim 1, wherein updating the counter of the corresponding bucket using an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array digest structure comprises:
updating the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain the updated counter of the bucket;
and constructing an updated two-dimensional array abstract structure according to the updated counter of the bucket.
3. The method of claim 2, wherein the algebraic sum rule is:
Figure FDA0003545803550000011
where I[l] denotes the l-th two-dimensional array, Figure FDA0003545803550000012 denotes the index of the bucket, namely the bucket in row Figure FDA0003545803550000013 and column Figure FDA0003545803550000014 Figure FDA0003545803550000015, s denotes the source vertex, t denotes the destination vertex, and c denotes the weight.
4. The method of claim 3, wherein the weighted algebraic sum rule is:
Figure FDA0003545803550000016
where sign(s, t) = H_r(s) × H_c(t), sign(s, t) ∈ {+1, −1}, sign(s, t) denotes a random value computed from the directed edge, and H_r(·) and H_c(·) denote two hash functions.
5. The method of claim 4, wherein computing weight prediction values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight prediction value of the directed edge comprises:
for the counters of the buckets updated by the algebraic sum rule, taking the minimum value over the buckets as the weight prediction value of the directed edge, namely
Figure FDA0003545803550000021
Where x represents the aggregate value of the bucket's counter, u represents the source vertex of the directed edge, v represents the destination vertex of the directed edge,
Figure FDA0003545803550000022
and k represents the number of the two-dimensional arrays.
6. The method of claim 5, wherein:
for the counters of the buckets updated by the weighted algebraic sum rule, the median over the buckets is taken as the weight prediction value of the directed edge, namely
Figure FDA0003545803550000023
7. A binary-counting summarization apparatus based on graph data stream, the apparatus comprising:
the two-dimensional array abstract structure constructing module is used for constructing a two-dimensional array abstract structure; the two-dimensional array abstract structure comprises a plurality of two-dimensional arrays; each position in the two-dimensional array is called a bucket, and each bucket maintains a counter;
the weighted directed edge inserting module is used for inserting weighted directed edges into the two-dimensional array abstract structure, and calculating a source vertex and a destination vertex of each weighted directed edge by using two hash functions to obtain a row index and a column index of a bucket of each weighted directed edge;
the two-dimensional array abstract structure updating module is used for finding a corresponding bucket in the two-dimensional array abstract structure according to the row index and the column index of the bucket of the weighted directed edge, and updating a counter of the corresponding bucket by utilizing an algebraic sum rule and a weighted algebraic sum rule to obtain an updated two-dimensional array abstract structure;
the binary counting type abstract constructing module is used for applying the hash functions to the source vertex and the destination vertex of a directed edge of the updated two-dimensional array abstract structure to obtain the row index and the column index of the bucket corresponding to the directed edge; computing weight prediction values from the counters of the buckets updated by the algebraic sum rule and by the weighted algebraic sum rule, respectively, to obtain the weight prediction value of the directed edge; and updating the updated two-dimensional array abstract structure by using the weight prediction value of the directed edge to obtain a binary counting abstract.
8. The apparatus of claim 7, wherein the two-dimensional array abstract structure updating module is further configured to update the counter of the corresponding bucket by using the algebraic sum rule and the weighted algebraic sum rule to obtain an updated counter of the bucket; and to construct an updated two-dimensional array abstract structure according to the updated counter of the bucket.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210248361.2A CN114722242A (en) 2022-03-14 2022-03-14 Binary counting type summarization method and device based on graph data stream and computer equipment

Publications (1)

Publication Number Publication Date
CN114722242A true CN114722242A (en) 2022-07-08

Family

ID=82238262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210248361.2A Pending CN114722242A (en) 2022-03-14 2022-03-14 Binary counting type summarization method and device based on graph data stream and computer equipment

Country Status (1)

Country Link
CN (1) CN114722242A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination