CN103593433A - Graph data processing method and system for massive time series data - Google Patents

Publication number
CN103593433A
Authority
CN
China
Prior art keywords
graph structure
data
vertex
application job
block
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310559846.4A
Other languages
Chinese (zh)
Other versions
CN103593433B (en)
Inventor
周薇
高赟
冉攀峰
韩冀中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Application filed by Institute of Information Engineering of CAS
Priority to CN201310559846.4A
Publication of CN103593433A
Application granted
Publication of CN103593433B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Abstract

The invention relates to a graph data processing method and system for massive time series data. The method comprises: preprocessing social network data and abstracting a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions among them, a representation that can effectively express social network relationships with a temporal order of interactions; dividing the graph structure into several graph structure blocks according to the celebrity effect and a preset Euclidean distance, and numbering the graph structure blocks and the vertices inside them; and loading the graph structure blocks into the corresponding positions of memory according to the memory organization mode, a storage scheme that makes full use of the distribution characteristics of graph data and can achieve efficient storage and query performance. To save computing time and memory space, the invention replaces the original programming model, which uses vertices as the computing unit, with a programming model that uses messages as the computing unit, which saves considerable computing time and storage space.

Description

A graph data processing method and system for massive time series data
Technical field
The present invention relates to the field of large-scale graph data processing, and in particular to a graph data processing method and system for massive time series data.
Background
With the rapid development of the Internet, social networking platforms have also grown and spread rapidly in recent years. Representing the relationships between people in a social network as graph data, and combining this with graph algorithms, makes it possible to mine information hidden in those relationships. As social networks have developed, graph data has therefore attracted renewed interest.
However, as social networks become more widespread, social network data is growing exponentially. A single-machine graph data processing program can therefore no longer analyze and process graph data at this scale.
Given the popularity of social networks and the scale of social network data, the approach currently adopted for processing such large-scale data is to process it in parallel on many machines. These parallel schemes basically follow the same idea. First, the huge graph data is cut into many parts according to some rule, and each part is stored on one machine. (The basis for dividing memory blocks is set using the denser regions as the benchmark. A memory block can therefore hold the data of a dense graph structure, but for a sparser graph structure, although the data can still be stored, a large amount of storage space is wasted.) When processing starts, each machine loads the graph data stored locally and performs its computation, then exchanges intermediate results through a message-passing mechanism. After many iterations, the final result is obtained.
Among existing graph data processing solutions, two are particularly representative. One is Pregel, proposed by Google in 2009, which uses the BSP (Bulk Synchronous Parallel) computing model. A graph application cannot obtain its final result in a single computation and requires many iterations, so a synchronization takes place between every two iterations: all tasks must finish computing before the next iteration can begin. Pregel's approach is simple and easy to understand, but the global synchronization between every two iterations is time-consuming, so processing efficiency is not high. The other typical representative is Microsoft's Trinity system, announced in 2012. Compared with Pregel, it has the following advantages: first, for the typical application of graph data processing, graph data is stored in distributed memory instead of in a file system, which speeds up loading of the graph data; second, for specific graph applications, the synchronization between every two iterations is replaced by asynchronous operation, which reduces the overall performance overhead of synchronization.
However, graph data processing still faces the following problems:
Besides the connection relationships between people, graph data also contains interaction relationships that are tied to time series. An interaction relationship differs from a connection relationship: a connection can be represented by a single edge in the graph structure, whereas interactions are directly tied to time, so between two vertices there are multiple interactions, each represented according to when it occurred. At present, however, large-scale graph data processing systems handle only the connection relationships between people and do not handle interactions that differ by time.
The distribution of a social network exhibits the celebrity effect: celebrities receive more attention than ordinary people. The celebrity group is therefore a dense part of the graph structure, and the people in this circle mostly know one another. In a graph structure that represents a social network, the dense part is thus relatively concentrated, and the remaining majority of ordinary people form the sparse part. For a graph structure in which dense and sparse parts coexist, which graph data storage and organization method achieves high efficiency? Existing graph data processing systems ignore the characteristics of the data and adopt a uniform storage policy; without customization, high performance cannot be reached. Efficient organization of graph data is therefore also a major problem for this field.
Graph data processing uses the graph vertex as the processing unit. When a batch of messages arrives, all messages are traversed for each vertex of the graph data, and the arrived messages can be deleted only after all vertices have been traversed, which amounts to a matching process. This matching process is time-consuming, since every message is traversed for every vertex, and it occupies a large amount of memory to hold these intermediate messages until all vertices have been traversed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a graph data processing method and system for massive time series data that addresses the following problems in existing graph data processing technology: temporal interaction relationships cannot be expressed, graph data storage does not take full account of the data distribution characteristics, and the vertex-based graph processing programming model wastes considerable computing time and storage space.
The technical solution of the present invention to the above technical problem is as follows: a graph data processing method for massive time series data, comprising the following steps:
Step 1: preprocess the social network data and abstract a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
Step 2: according to the celebrity effect, cut the graph structure into several graph structure blocks by a predetermined Euclidean distance, and number the graph structure blocks and the vertices inside them;
Step 3: distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding positions in memory according to the memory organization mode;
Step 4: the user writes an application job according to the message-based programming model and submits the application job to the application job processing unit;
Step 5: the application job processing unit obtains the data of the required graph structure blocks from memory and executes the application job in the message-based processing mode to obtain the result.
The beneficial effects of the present invention are as follows. The invention establishes multiple edges between two vertices; each edge represents one interaction between the two vertices and carries a timestamp giving the time the edge was created, that is, the time the interaction occurred. This representation can effectively express social network relationships with a temporal order of interactions. For the particular distribution of graph data (the celebrity effect), the invention designs a memory organization for graph data that makes full use of the distribution characteristics and can achieve efficient storage and query performance. To save computing time and memory space, the invention replaces the original programming model, in which the vertex is the computing unit, with a programming model in which the message is the computing unit, which saves considerable computing time and storage space.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, the specific steps of preprocessing the social network data in step 1 are:
Step 1.1: extract the individual persons from the social network data to form the vertices of the graph structure;
Step 1.2: extract the interactions between persons from the social network data, each interaction forming one edge of the graph structure;
Step 1.3: set on each edge of the graph structure a timestamp representing the time the edge was created.
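Steps 1.1 to 1.3 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the record layout `(src, dst, timestamp, kind)`, the `Interaction` class and the `build_graph` helper are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One timestamped edge: `src` interacted with `dst` at `timestamp`."""
    src: str
    dst: str
    timestamp: int  # assumed to be seconds since the epoch
    kind: str       # illustrative labels such as "comment" or "like"


def build_graph(records):
    """People become vertices; every interaction becomes its own edge."""
    vertices = set()
    edges = defaultdict(list)  # (src, dst) -> list of parallel edges
    for src, dst, timestamp, kind in records:
        vertices.update((src, dst))                    # step 1.1
        edge = Interaction(src, dst, timestamp, kind)  # steps 1.2 and 1.3
        edges[(src, dst)].append(edge)
    for parallel_edges in edges.values():
        parallel_edges.sort(key=lambda e: e.timestamp)  # keep time order
    return vertices, dict(edges)


records = [
    ("alice", "bob", 1700000300, "like"),
    ("alice", "bob", 1700000100, "comment"),
    ("bob", "carol", 1700000200, "repost"),
]
vertices, edges = build_graph(records)
```

Keeping a list of edges per vertex pair is what lets two people have many interactions, each with its own timestamp, rather than a single connection edge.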
Further, step 2, in which the graph structure is cut into several graph structure blocks by Euclidean distance according to the celebrity effect and the graph structure blocks and their internal vertices are numbered, is implemented as follows:
Step 2.1: count the number of edges connected to each vertex in the graph structure and compute the average density of the graph structure;
Step 2.2: set the Euclidean distance used for cutting the graph structure according to the average density, and cut the graph structure by that Euclidean distance to obtain several graph structure blocks;
Step 2.3: number the graph structure blocks obtained by cutting to get block numbers, and number each vertex in each graph structure block; the number of each vertex consists of the block number plus the vertex number.
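The counting in step 2.1 and the numbering in step 2.3 can be sketched as below. The text does not specify how the Euclidean distance is derived or applied, so the cut of step 2.2 is taken as given here. The helper names, and the choice of a `(block number, vertex number)` tuple as the combined vertex number, are assumptions for illustration only.

```python
from collections import Counter


def average_degree(edge_list):
    """Step 2.1: count the edges touching each vertex, then average."""
    degree = Counter()
    for src, dst in edge_list:
        degree[src] += 1
        degree[dst] += 1
    return degree, sum(degree.values()) / len(degree)


def number_vertices(blocks):
    """Step 2.3: vertex number = (block number, number inside the block)."""
    ids = {}
    for block_no, members in enumerate(blocks):
        for local_no, vertex in enumerate(sorted(members)):
            ids[vertex] = (block_no, local_no)
    return ids


# Parallel edges appear once per interaction, so each one is counted.
edge_list = [("a", "b"), ("a", "b"), ("a", "c"), ("c", "d")]
degree, mean = average_degree(edge_list)

# The result of the cut (step 2.2) is assumed rather than computed.
blocks = [{"a", "b"}, {"c", "d"}]
ids = number_vertices(blocks)
```

With a combined number of this shape, the block that owns a vertex can be read directly from the vertex number without a lookup table.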
Further, the specific steps in step 3 by which each node imports the graph structure blocks it receives into the corresponding positions in memory according to the memory organization mode are as follows:
Step 3.1: allocate a memory region and divide it into N fixed-size memory partitions for storing N graph structure blocks;
Step 3.2: in each memory partition, allocate a fixed-size memory block for each vertex to store the vertex data and relation data of that vertex;
Step 3.3: determine whether the vertex exhibits the celebrity effect. If it does, and the memory block cannot hold the vertex data and all the relation data of the vertex, store the vertex data and part of the relation data in the memory block, allocate one or more additional fixed-size memory blocks, store the remaining relation data in the additional memory blocks, and point to the additional memory blocks with a pointer; otherwise, store the vertex data and all the relation data of the vertex directly in the corresponding memory block;
Step 3.4: in the additional memory blocks, build an index with time as the primary key and the vertex in the graph structure as the secondary key.
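The overflow scheme of steps 3.2 and 3.3 can be modelled with a small linked structure. This is a sketch under stated assumptions: `BLOCK_CAPACITY`, `MemoryBlock` and `VertexSlot` are hypothetical names, the capacity is counted in relation records rather than bytes, and the index of step 3.4 is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_CAPACITY = 4  # relation records per block; an illustrative value


@dataclass
class MemoryBlock:
    relations: list = field(default_factory=list)  # (timestamp, neighbour)
    next_block: Optional["MemoryBlock"] = None     # pointer to an extra block


@dataclass
class VertexSlot:
    vertex_id: tuple
    value: float = 0.0
    head: MemoryBlock = field(default_factory=MemoryBlock)

    def store(self, relations):
        """Fill the last block in the chain, adding blocks when it is full."""
        block = self.head
        while block.next_block is not None:
            block = block.next_block
        for record in relations:
            if len(block.relations) == BLOCK_CAPACITY:
                block.next_block = MemoryBlock()
                block = block.next_block
            block.relations.append(record)

    def block_count(self):
        count, block = 0, self.head
        while block is not None:
            count += 1
            block = block.next_block
        return count


ordinary = VertexSlot((0, 1))
ordinary.store([(100, "a"), (200, "b")])

celebrity = VertexSlot((0, 0))
celebrity.store([(t, f"fan{t}") for t in range(10)])
```

An ordinary vertex stays within its single fixed-size block, while only the few high-degree vertices pay for extra blocks, which is the space saving the text attributes to the celebrity effect.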
Further, step 5, in which the application job processing unit obtains the required data from memory and executes the application job in the message-based processing mode to obtain the result, is implemented as follows:
Step 5.1: the application job execution unit executes the application job; the application job comprises several tasks, and each task manages one graph structure block;
Step 5.2: each task obtains the required data by vertex number from the graph structure block it manages, processes the data in the graph structure block according to the processing logic of the application job, generates a number of messages, and sends the messages to other tasks when processing completes;
Step 5.3: the task that receives the messages parses each arriving message in turn and extracts the destination vertex the message should reach;
Step 5.4: update the value of the destination vertex, and then delete the message;
Step 5.5: determine whether any unprocessed messages remain; if so, return to step 5.3, otherwise finish.
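The receiving side of this loop (steps 5.3 to 5.5) can be sketched as follows. The message shape `(destination vertex, payload)`, the `process_messages` helper and the `combine` callback are assumptions for the example; the text does not fix a message format or an update rule.

```python
from collections import deque


def process_messages(values, inbox, combine):
    """Consume each message exactly once and drop it immediately."""
    handled = 0
    while inbox:                           # step 5.5: any message left?
        target, payload = inbox.popleft()  # step 5.3; popleft also frees it
        values[target] = combine(values[target], payload)  # step 5.4
        handled += 1
    return handled


values = {(0, 0): 0, (0, 1): 0}
inbox = deque([((0, 0), 5), ((0, 1), 2), ((0, 0), 1)])
handled = process_messages(values, inbox, lambda old, new: old + new)
```

Because a message leaves the queue as soon as it is read, the task never needs to hold the whole batch until every vertex has been visited.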
The technical solution of the present invention to the above technical problem is as follows: a graph data processing system for massive time series data, comprising a storage layer, a compute layer and a user layer;
The storage layer is used to build a graph structure that can represent the interactions between persons and to store the graph structure in a customized way according to the celebrity effect;
The user layer is used to write application jobs based on message processing and submit them to the compute layer;
The compute layer is used to obtain the data of the required graph structure blocks from the storage layer and to execute the application job in the message-based processing mode to obtain the result.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, the storage layer comprises a data preprocessing unit, a graph structure cutting unit, a data import unit and a memory unit;
The data preprocessing unit is used to preprocess the social network data and abstract a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
The graph structure cutting unit is used to cut the graph structure into several graph structure blocks by a predetermined Euclidean distance according to the celebrity effect, and to number the graph structure blocks and the vertices inside them;
The data import unit is used to distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding positions in memory according to the memory organization mode;
The memory unit comprises the memory of several nodes, each used to store the imported graph structure block data.
Further, the user layer comprises an application job writing unit and an application job submission unit;
The application job writing unit is used to write application jobs based on message processing and to send them to the application job submission unit;
The application job submission unit is used to submit application jobs to the application job processing unit of the compute layer.
Further, the compute layer comprises an application job processing unit, which is used to execute application jobs in the message processing mode.
Further, the graph application job comprises several tasks, and each task is responsible for processing one graph structure block.
Brief description of the drawings
Fig. 1 is a flowchart of the graph data processing method for massive time series data of the present invention;
Fig. 2 is a flowchart of the specific implementation of step 1 of the present invention;
Fig. 3 is a flowchart of the specific implementation of step 2 of the present invention;
Fig. 4 is a flowchart of the specific implementation of step 3 of the present invention;
Fig. 5 is a flowchart of the specific implementation of step 5 of the present invention;
Fig. 6 is a structural block diagram of the graph data processing system for massive time series data of the present invention;
Fig. 7 is a schematic diagram of part of the graph structure in embodiment 1 of the present invention;
Fig. 8 is a schematic diagram of the storage structure of one memory partition in the memory of one node in embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of the prior-art processing procedure in which the vertex is the processing unit;
Fig. 10 is a schematic diagram of the processing procedure of the present invention in which the message is the processing unit.
In the drawings, the parts represented by each reference numeral are as follows:
1, storage layer; 2, user layer; 3, compute layer; 1-1, data preprocessing unit; 1-2, graph structure cutting unit; 1-3, data import unit; 1-4, memory unit; 2-1, application job writing unit; 2-2, application job submission unit; 3-1, application job processing unit; 101, memory block; 102, additional memory block; 201, vertex; 202, message.
Detailed description
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, a graph data processing method for massive time series data comprises the following steps:
Step 1: preprocess the social network data and abstract a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
Step 2: according to the celebrity effect, cut the graph structure into several graph structure blocks by a predetermined Euclidean distance, and number the graph structure blocks and the vertices inside them;
Step 3: distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding positions in memory according to the memory organization mode;
Step 4: the user writes an application job according to the message-based programming model and submits the application job to the application job processing unit;
Step 5: the application job processing unit obtains the data of the required graph structure blocks from memory and executes the application job in the message-based processing mode to obtain the result.
As shown in Fig. 2, the specific steps of preprocessing the social network data in step 1 are:
Step 1.1: extract the individual persons from the social network data to form the vertices of the graph structure;
Step 1.2: extract the interactions between persons from the social network data, each interaction forming one edge of the graph structure;
Step 1.3: set on each edge of the graph structure a timestamp representing the time the edge was created.
As shown in Fig. 3, step 2, in which the graph structure is cut into several graph structure blocks by Euclidean distance according to the celebrity effect and the graph structure blocks and their internal vertices are numbered, is implemented as follows:
Step 2.1: count the number of edges connected to each vertex in the graph structure and compute the average density of the graph structure;
Step 2.2: set the Euclidean distance used for cutting the graph structure according to the average density, and cut the graph structure by that Euclidean distance to obtain several graph structure blocks;
Step 2.3: number the graph structure blocks obtained by cutting to get block numbers, and number each vertex in each graph structure block; the number of each vertex consists of the block number plus the vertex number.
As shown in Fig. 4, the specific steps in step 3 by which each node imports the graph structure blocks it receives into the corresponding positions in memory according to the memory organization mode are as follows:
Step 3.1: allocate a memory region and divide it into N fixed-size memory partitions for storing N graph structure blocks;
Step 3.2: in each memory partition, allocate a fixed-size memory block for each vertex to store the vertex data and relation data of that vertex;
Step 3.3: determine whether the vertex exhibits the celebrity effect. If it does, and the memory block cannot hold the vertex data and all the relation data of the vertex, store the vertex data and part of the relation data in the memory block, allocate one or more additional fixed-size memory blocks, store the remaining relation data in the additional memory blocks, and point to the additional memory blocks with a pointer; otherwise, store the vertex data and all the relation data of the vertex directly in the corresponding memory block;
Step 3.4: in the additional memory blocks, build an index with time as the primary key and the vertex in the graph structure as the secondary key.
As shown in Fig. 5, step 5, in which the application job processing unit obtains the required data from memory and executes the application job in the message-based processing mode to obtain the result, is implemented as follows:
Step 5.1: the application job execution unit executes the application job; the application job comprises several tasks, and each task manages one graph structure block;
Step 5.2: each task obtains the required data by vertex number from the graph structure block it manages, processes the data in the graph structure block according to the processing logic of the application job, generates a number of messages, and sends the messages to other tasks when processing completes;
Step 5.3: the task that receives the messages parses each arriving message in turn and extracts the destination vertex the message should reach;
Step 5.4: update the value of the destination vertex, and then delete the message;
Step 5.5: determine whether any unprocessed messages remain; if so, return to step 5.3, otherwise finish.
As shown in Fig. 6, a graph data processing system for massive time series data comprises a storage layer 1, a user layer 2 and a compute layer 3;
The storage layer 1 is used to build a graph structure that can represent the interactions between persons and to store the graph structure in a customized way according to the celebrity effect;
The user layer 2 is used to write application jobs based on message processing and submit them to the compute layer 3;
The compute layer 3 is used to obtain the data of the required graph structure blocks from the storage layer 1 and to execute the application job in the message-based processing mode to obtain the result.
The storage layer 1 comprises a data preprocessing unit 1-1, a graph structure cutting unit 1-2, a data import unit 1-3 and a memory unit 1-4;
The data preprocessing unit 1-1 is used to preprocess the social network data and abstract a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
The graph structure cutting unit 1-2 is used to cut the graph structure into several graph structure blocks by a predetermined Euclidean distance according to the celebrity effect, and to number the graph structure blocks and the vertices inside them;
The data import unit 1-3 is used to distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding positions in memory according to the memory organization mode;
The memory unit 1-4 comprises the memory of several nodes, each used to store the imported graph structure block data.
The user layer 2 comprises an application job writing unit 2-1 and an application job submission unit 2-2;
The application job writing unit 2-1 is used to write application jobs based on message processing and to send them to the application job submission unit 2-2;
The application job submission unit 2-2 is used to submit application jobs to the application job processing unit 3-1 of the compute layer.
The compute layer 3 comprises an application job processing unit 3-1, which is used to execute application jobs in the message processing mode.
The graph application job comprises several tasks, and each task is responsible for processing one or more graph structure blocks with adjacent numbers.
Besides the connection relationships between people, graph data also contains interaction relationships between people that are tied to time series. An interaction relationship differs from a connection relationship: a connection can be represented by a single edge in the graph structure, whereas interactions are directly tied to time, so between two vertices there are multiple interactions, each represented according to when it occurred. At present, however, large-scale graph data processing systems handle only the connection relationships between people and do not handle interactions that differ by time.
The present invention designs a graph structure in which vertices represent persons and timestamped edges represent the interactions between persons; this representation establishes multiple edges between two vertices. Fig. 7 is a schematic diagram of part of a graph structure in embodiment 1 of the present invention that can represent the interactions between persons. The value on each edge is the time the edge was created, that is, the time the interaction occurred, so this representation can effectively express social network relationships with a temporal order of interactions. Besides representing the graph structure of the social network, this multi-version data can also represent the time-based interactions between the vertices of the graph structure, referred to as temporal relationships for short. Once abstracted in this way, the graph structure differs fundamentally from the original graph structure. In the original graph structure, every two vertices are connected by only one edge (for an undirected graph) or two edges (for a directed graph); in the improved graph structure, every two vertices can be connected by many edges, each representing a timestamped interaction between the two vertices, such as commenting on a photo, commenting on a status update, reposting, or a "like".
Existing data storage methods set the basis for dividing memory blocks using the denser regions as the benchmark. A memory block can therefore hold the data of a dense graph structure, but for a sparser graph structure, although the data can still be stored, a large amount of storage space is wasted. As for querying, the prior art stores all the data of a graph structure block (vertex data and relation data, where relation data comprises the direction of each edge representing an interaction and its timestamp) in the allocated memory block, so a query must traverse all the data in the memory partition, which lowers query efficiency.
In a social network, people differ in how active and how popular they are. A celebrity's microblog naturally attracts more attention than an ordinary user's, so the vertex representing a celebrity has more edges connected to it and, likewise, more interaction relationships. In a social network, therefore, the celebrity circle forms a dense graph structure, while ordinary users form a sparse one. Celebrities are nevertheless a very small group relative to ordinary users. The present invention designs a storage scheme for graph data aimed at this particular distribution (the celebrity effect); it takes full advantage of the distribution characteristics of the graph data and can achieve efficient storage and query performance.
Figure 8 is a schematic diagram of a graph structure block stored in one memory partition in the memory of one node in Embodiment 2 of the present invention. According to how the graph structure has been cut, the present invention allocates a region of memory, divides it into N memory partitions of fixed size, and stores each graph structure block in its corresponding memory partition. The specific storage method is as follows: a memory block 101 of fixed size is allocated in the memory partition for each vertex. The data of each vertex comprises vertex data and relation data, the relation data comprising edge data and timestamp data, and the data of each vertex is stored in one memory block. When a vertex has a high density (that is, it has a large amount of relation data), the vertex data and a small part of its relation data are stored in the allocated memory block 101; several additional memory blocks 102 are also allocated and referenced by pointers, and the remaining relation data is stored in the additional memory blocks 102. The present invention also builds, in the additional memory blocks 102, an index with time as the first key and the vertex as the second key, to facilitate data lookup.
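The storage layout described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `VertexBlock` and `MemoryPartition`, the two capacity constants, and the use of integer vertex IDs and timestamps are all assumptions made for illustration. Each vertex gets one fixed-capacity block (standing in for memory block 101); relations that do not fit spill into additional blocks (standing in for blocks 102), and only the spilled relations are indexed by time first and vertex second.

```python
from bisect import bisect_left, bisect_right, insort
from dataclasses import dataclass, field

PRIMARY_CAPACITY = 4    # assumed: relation records that fit in memory block 101
OVERFLOW_CAPACITY = 8   # assumed: relation records per additional block 102


@dataclass
class VertexBlock:
    vertex_data: str
    relations: list = field(default_factory=list)  # (timestamp, neighbor) kept in block 101
    overflow: list = field(default_factory=list)   # additional blocks 102, each a list


class MemoryPartition:
    """Holds one graph structure block: one fixed-size block per vertex."""

    def __init__(self):
        self.blocks = {}          # vertex id -> VertexBlock
        self.overflow_index = []  # sorted (timestamp, vertex id, neighbor) tuples

    def add_vertex(self, vertex_id, vertex_data):
        self.blocks[vertex_id] = VertexBlock(vertex_data)

    def add_relation(self, vertex_id, neighbor, timestamp):
        block = self.blocks[vertex_id]
        if len(block.relations) < PRIMARY_CAPACITY:
            block.relations.append((timestamp, neighbor))
            return
        # Block 101 is full: spill into the last additional block,
        # opening a new one when none exists or the last one is full.
        if not block.overflow or len(block.overflow[-1]) >= OVERFLOW_CAPACITY:
            block.overflow.append([])
        block.overflow[-1].append((timestamp, neighbor))
        # Index ordered by time first and vertex second.
        insort(self.overflow_index, (timestamp, vertex_id, neighbor))

    def overflow_between(self, t_start, t_end):
        """Return indexed overflow relations with t_start <= timestamp <= t_end."""
        lo = bisect_left(self.overflow_index, (t_start,))
        hi = bisect_right(self.overflow_index, (t_end, float("inf")))
        return self.overflow_index[lo:hi]
```

In this sketch an ordinary vertex never touches the overflow path, so sparse regions cost only one small block per vertex, while a time-range query over a celebrity's spilled relations becomes a binary search on the sorted index instead of a scan of the whole partition.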
Figure 9 is a schematic diagram of the prior-art processing procedure that takes the vertex as the processing unit. Traditional graph computation models all take the vertex as the processing unit. In each round of iteration, after the messages 202 sent by neighboring tasks have been received, the processing is as follows: for each vertex 201, all messages 202 are traversed to find the messages 202 that match this vertex 201, and those messages are extracted. Only after all vertices 201 have been traversed can the messages 202 be deleted. This consumes a large amount of computing and storage resources.
Figure 10 is a schematic diagram of the processing procedure of the present invention, which takes the message as the processing unit. In the present invention, the programming model that takes the message as the processing unit means that each time a message 202 is processed, the message 202 is parsed directly, the vertex 201 corresponding to the message 202 is located and the computation is performed, and the message 202 can be deleted at the same time. In this mode the amount of computation depends only on the number of messages 202, and there is no need to retain all of the messages at once, which saves considerable computing time and storage space.
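The difference between the two processing modes can be shown with a short Python sketch. It rests on assumptions that are not in the original text: a message is modeled as a `(target_vertex, delta)` pair, the update rule simply adds `delta` to the vertex value, and the function names are hypothetical. The first function mimics the vertex-as-unit loop of Figure 9, and the second mimics the message-as-unit loop of Figure 10.

```python
from collections import deque


def process_vertex_centric(vertex_values, messages):
    """Vertex as processing unit: every vertex scans every message."""
    comparisons = 0
    for vertex in vertex_values:
        for target, delta in messages:
            comparisons += 1
            if target == vertex:
                vertex_values[vertex] += delta
    # Messages can only be dropped once all vertices have been visited.
    messages.clear()
    return comparisons


def process_message_centric(vertex_values, messages):
    """Message as processing unit: parse, update the target, delete."""
    queue = deque(messages)
    handled = 0
    while queue:
        target, delta = queue.popleft()  # the message is removed as it is handled
        vertex_values[target] += delta
        handled += 1
    return handled
```

With V vertices and M messages, the first loop performs V × M comparisons and keeps all M messages alive until the end, whereas the second performs M updates and releases each message immediately. Both produce the same vertex values under the assumed update rule.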
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A graph data processing method for massive time series data, characterized by comprising the following steps:
Step 1: preprocess social network data and abstract a graph structure in which vertices represent persons and a number of timestamped edges represent the interaction relationships between persons;
Step 2: according to the celebrity effect, cut the graph structure into several graph structure blocks according to a predetermined Euclidean distance, and assign numbers to the graph structure blocks and to the vertices inside them;
Step 3: distribute the graph structure blocks to different nodes for processing, each node importing the graph structure blocks it receives into the corresponding locations in memory according to the memory organization scheme;
Step 4: the user writes an application job according to the message-based programming model and submits the application job to an application job processing unit;
Step 5: the application job processing unit obtains the data of the required graph structure blocks from memory and executes the application job according to the message-based processing mode to obtain the operation result.
2. The graph data processing method for massive time series data according to claim 1, characterized in that the specific steps of preprocessing the social network data in step 1 are:
Step 1.1: extract the individual persons from the social network data to form the vertices of the graph structure;
Step 1.2: extract the interactions between persons from the social network data, each interaction forming one edge of the graph structure;
Step 1.3: set, on each edge of the graph structure, a timestamp representing the time at which the edge was established.
3. The graph data processing method for massive time series data according to claim 1, characterized in that, in step 2, cutting the graph structure into several graph structure blocks according to the Euclidean distance in view of the celebrity effect, and numbering the graph structure blocks and the vertices inside them, is specifically implemented as follows:
Step 2.1: count the number of edges connected to each vertex in the graph structure, and calculate the mean density of the graph structure;
Step 2.2: set the Euclidean distance used for cutting the graph structure according to the mean density of the graph structure, and cut the graph structure according to that Euclidean distance to obtain several graph structure blocks;
Step 2.3: number the graph structure blocks obtained by cutting to obtain block numbers, and number each vertex within a graph structure block, the number of each vertex being the block number followed by the vertex number.
4. The graph data processing method for massive time series data according to claim 1, characterized in that the specific steps by which each node, in step 3, imports the graph structure blocks it receives into the corresponding locations in memory according to the memory organization scheme are as follows:
Step 3.1: allocate a memory region in memory and divide it into N memory partitions of fixed size for storing N graph structure blocks;
Step 3.2: in each memory partition, allocate a memory block of fixed size for each vertex, for storing the vertex data and relation data of that vertex;
Step 3.3: judge whether the vertex exhibits the celebrity effect; if it does and the memory block cannot hold the vertex data and all of the relation data of the vertex, store the vertex data and part of the relation data in the memory block, allocate one or more additional memory blocks of fixed size, store the remaining relation data in the additional memory blocks, and reference the additional memory blocks with pointers; otherwise, store the vertex data and all of the relation data of the vertex directly in the corresponding memory block;
Step 3.4: in the additional memory blocks, build an index with time as the primary key and the vertex in the graph structure as the secondary key.
5. The graph data processing method for massive time series data according to claim 1, characterized in that, in step 5, the application job processing unit obtaining the required data from memory and executing the application job according to the message-based processing mode to obtain the operation result is specifically implemented as follows:
Step 5.1: the application job execution unit executes the application job, the application job comprising several tasks, each task managing one graph structure block;
Step 5.2: each task obtains the required data from the graph structure block it manages by vertex number, processes the data in the graph structure block according to the processing logic of the application job, generates a number of messages, and sends the messages to other tasks after the processing is complete;
Step 5.3: the task receiving the messages parses each arriving message in turn and extracts the destination vertex that the message should reach;
Step 5.4: update the value of the destination vertex, and then delete the message;
Step 5.5: judge whether any unprocessed messages remain; if so, return to step 5.3; otherwise, end.
6. A graph data processing system for massive time series data, characterized by comprising a storage layer, a computation layer, and a user layer;
the storage layer is used to build a graph structure that can represent the interaction relationships between persons, and to implement personalized storage of the graph structure according to the celebrity effect;
the user layer is used to write an application job based on message processing and submit it to the computation layer;
the computation layer is used to obtain the data of the required graph structure blocks from the storage layer and to execute the application job according to the message-based processing mode to obtain the operation result.
7. The graph data processing system for massive time series data according to claim 6, characterized in that the storage layer comprises a data preprocessing unit, a graph structure cutting unit, a data import unit, and a memory unit;
the data preprocessing unit is used to preprocess social network data and abstract a graph structure in which vertices represent persons and a number of timestamped edges represent the interaction relationships between persons;
the graph structure cutting unit is used to cut the graph structure into several graph structure blocks according to a predetermined Euclidean distance in view of the celebrity effect, and to assign numbers to the graph structure blocks and to the vertices inside them;
the data import unit is used to distribute the graph structure blocks to different nodes for processing, each node importing the graph structure blocks it receives into the corresponding locations in memory according to the memory organization scheme;
the memory unit comprises the memory of several nodes, each used to store the imported graph structure block data.
8. The graph data processing system for massive time series data according to claim 6, characterized in that the user layer comprises an application job writing unit and an application job submission unit;
the application job writing unit is used to write an application job based on message processing and to send the application job to the application job submission unit;
the application job submission unit is used to submit the application job to the application job processing unit of the computation layer.
9. The graph data processing system for massive time series data according to claim 6, characterized in that the computation layer comprises an application job processing unit, which is used to execute the application job according to the message processing mode.
10. The graph data processing system for massive time series data according to claim 6, characterized in that the graph application job comprises several tasks, each task being responsible for processing one graph structure block.
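The preprocessing and numbering steps recited in claims 2 and 3 can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the input is taken to be a list of `(source, target, timestamp)` interaction records, the function names are hypothetical, and the cutting by Euclidean distance in step 2.2 is not reproduced because the text does not specify how the distance is computed. The block assignment passed to `number_vertices` is therefore supplied by hand.

```python
from collections import defaultdict


def build_graph(interactions):
    """Steps 1.1 to 1.3: persons become vertices, each interaction a timestamped edge."""
    vertices = set()
    edges = []
    for source, target, timestamp in interactions:
        vertices.update((source, target))
        edges.append((source, target, timestamp))
    return vertices, edges


def mean_density(vertices, edges):
    """Step 2.1: count the edges attached to each vertex and average the counts."""
    degree = defaultdict(int)
    for source, target, _ in edges:
        degree[source] += 1
        degree[target] += 1
    return sum(degree.values()) / len(vertices) if vertices else 0.0


def number_vertices(blocks):
    """Step 2.3: a vertex number is the block number followed by a per-block vertex number."""
    numbering = {}
    for block_no, members in enumerate(blocks):
        for vertex_no, person in enumerate(members):
            numbering[person] = (block_no, vertex_no)
    return numbering
```

Because the composite number carries the block number, a task that receives a message can tell from the destination number alone which graph structure block, and hence which task, owns the vertex.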
CN201310559846.4A 2013-11-12 2013-11-12 Graph data processing method and system for massive time series data Active CN103593433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310559846.4A CN103593433B (en) 2013-11-12 2013-11-12 Graph data processing method and system for massive time series data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310559846.4A CN103593433B (en) 2013-11-12 2013-11-12 Graph data processing method and system for massive time series data

Publications (2)

Publication Number Publication Date
CN103593433A true CN103593433A (en) 2014-02-19
CN103593433B CN103593433B (en) 2016-11-02

Family

ID=50083574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310559846.4A Active CN103593433B (en) 2013-11-12 2013-11-12 Graph data processing method and system for massive time series data

Country Status (1)

Country Link
CN (1) CN103593433B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110252073A1 (en) * 2010-04-06 2011-10-13 Justone Database, Inc. Apparatus, systems and methods for data storage and/or retrieval based on a database model-agnostic, schema-agnostic and workload-agnostic data storage and access models
CN103336808A (en) * 2013-06-25 2013-10-02 中国科学院信息工程研究所 System and method for real-time graph data processing based on the BSP model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JOHN BRESLIN et al.: "The Future of Social Networks on the Internet: The Need for Semantics", IEEE INTERNET COMPUTING *
YANG YUEHUA et al.: "Study on SNS graph generation and prediction", INTERNATIONAL CONFERENCE ON CONTROL AUTOMATION AND SYSTEMS 2010 (ICCAS) *
YU Ge et al.: "Large-scale graph data processing techniques in cloud computing environments", Chinese Journal of Computers *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970860B (en) * 2014-05-07 2017-05-24 华为技术有限公司 Method, device and system for processing data
CN103970860A (en) * 2014-05-07 2014-08-06 华为技术有限公司 Method, device and system for processing data
CN106325756A (en) * 2015-06-15 2017-01-11 阿里巴巴集团控股有限公司 Data storage and data computation methods and devices
CN109582808A (en) * 2018-11-22 2019-04-05 北京锐安科技有限公司 A kind of user information querying method, device, terminal device and storage medium
WO2021134318A1 (en) * 2019-12-30 2021-07-08 浙江邦盛科技有限公司 Rapid mass time-series data processing method based on aggregated edge and time-series aggregated edge
CN111177188A (en) * 2019-12-30 2020-05-19 浙江邦盛科技有限公司 Rapid massive time sequence data processing method based on aggregation edge and time sequence aggregation edge
CN111241410A (en) * 2020-01-22 2020-06-05 深圳司南数据服务有限公司 Industry news recommendation method and terminal
CN111241410B (en) * 2020-01-22 2023-08-22 深圳司南数据服务有限公司 Industry news recommendation method and terminal
CN111382319A (en) * 2020-03-18 2020-07-07 军事科学院系统工程研究院系统总体研究所 Map data representation and mapping method for knowledge graph
CN111382319B (en) * 2020-03-18 2021-04-09 军事科学院系统工程研究院系统总体研究所 Map data representation and mapping method for knowledge graph
CN113722576A (en) * 2021-05-07 2021-11-30 北京达佳互联信息技术有限公司 Network security information processing method, query method and related device
WO2023083237A1 (en) * 2021-11-11 2023-05-19 支付宝(杭州)信息技术有限公司 Graph data management
CN114254164A (en) * 2022-03-01 2022-03-29 全球能源互联网研究院有限公司 Graph data storage method and device

Also Published As

Publication number Publication date
CN103593433B (en) 2016-11-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant