CN103593433B - Graph data processing method and system for massive time-series data - Google Patents

Graph data processing method and system for massive time-series data

Info

Publication number
CN103593433B
Authority
CN
China
Prior art keywords
graph structure
data
vertex
block
application job
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310559846.4A
Other languages
Chinese (zh)
Other versions
CN103593433A (en)
Inventor
周薇
高赟
冉攀峰
韩冀中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201310559846.4A
Publication of CN103593433A
Application granted
Publication of CN103593433B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a graph data processing method and system for massive time-series data. Social network data is preprocessed and abstracted into a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons; this representation can effectively express social network relationships that carry temporal interaction information. Based on the celebrity effect, the graph structure is cut into several graph structure blocks according to a predetermined Euclidean distance, and the graph structure blocks and the vertices inside them are numbered. The graph structure blocks are imported into the corresponding memory locations according to a memory organization scheme; this in-memory storage scheme makes full use of the distribution characteristics of graph data and achieves efficient storage and query performance. To save computation time and memory space, the invention replaces the original programming model, which uses the vertex as the unit of computation, with a programming model that uses the message as the unit of computation. This substantially reduces computation time and also saves storage space.

Description

Graph data processing method and system for massive time-series data
Technical field
The present invention relates to the field of large-scale graph data processing, and in particular to a graph data processing method and system for massive time-series data.
Background art
In recent years, with the rapid development of the Internet, social networking platforms have also grown and spread rapidly. Representing the relationships between people in a social network as graph data, and combining this with graph algorithms, makes it possible to mine the information hidden in those relationships. As social networks have developed, graph data has therefore attracted renewed interest.
However, as social networks have become widespread, social network data has been growing exponentially. Single-machine graph processing programs can therefore no longer analyze and process graph data at this scale.
Given the popularity of social networks and the scale of social network data, the approach currently used for such large-scale data is to process it in parallel on multiple machines. These parallel schemes essentially follow the same idea. First, the huge graph is cut into many parts according to some rule, and each part is stored on one machine. (The basis for dividing memory blocks is set with reference to regions of higher density. As a result, a memory block can hold the data of a dense graph structure, but for a sparse graph structure, although the data can still be stored, a large amount of storage space is wasted.) When processing starts, each machine loads the graph data stored locally and computes on it, then exchanges intermediate results through a message-passing mechanism. After many iterations, the final result is obtained.
Among existing graph data processing solutions, two are particularly representative. One is Pregel, proposed by Google in 2009. Pregel uses BSP (Bulk Synchronous Parallel, a globally synchronous parallel computation model) to carry out computation. Because of the nature of graph applications, the final result cannot be obtained in a single pass; many iterations are needed. A synchronization step is therefore required between every two iterations: synchronization can only be performed after all tasks have finished computing. Pregel's approach is simple and easy to understand, but the global synchronization between iterations is time-consuming, so processing efficiency is not high. The other representative is Microsoft's Trinity system, announced in 2012, which has the following advantages over Pregel. First, for graph data processing as a typical application, graph data is moved from a file system into distributed memory, which speeds up the loading of graph data. Second, for specific graph applications, the synchronization between iterations is replaced with asynchronous processing, which reduces the overall performance overhead caused by synchronization.
However, graph data processing still has the following problems:
Besides the connection relationships between people, graph data also contains interaction relationships that are tied to a time sequence. An interaction relationship differs from a connection relationship. A connection can be represented by a single edge in the graph structure, whereas an interaction is directly tied to time: between two vertices there can be many interactions of different kinds at different times. At present, large-scale graph data processing systems handle only the connection relationships between people and do not handle interactions that differ over time.
The distribution of a social network exhibits a celebrity effect: celebrities receive more attention in a social network than ordinary people. The celebrity group is therefore the dense part of the graph structure, and the people in this circle mostly know one another. In a graph structure representing a social network, the dense part is relatively concentrated, and the remaining majority of ordinary people form the sparse part. For a graph structure in which dense and sparse regions coexist, which storage structure and organization method can achieve high efficiency? Existing graph data processing systems apply a uniform storage strategy without considering the characteristics of the data, and without customization high performance cannot be achieved. Efficient organization of graph data is therefore another major problem that this field needs to consider.
Graph data processing generally uses the vertex as the processing unit. When a batch of messages arrives, every vertex of the graph traverses all the messages, and the arrived messages can only be deleted after all vertices have been traversed; this amounts to a matching process. The matching process is not only time-consuming, since every vertex traverses every message, but it also occupies a large amount of memory to hold the intermediate messages until all vertices have been traversed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a graph data processing method and system for massive time-series data that addresses the following problems in existing graph data processing technology: temporal interaction relationships cannot be expressed, graph data storage does not fully consider the distribution characteristics of the data, and the vertex-based programming model wastes considerable computation time and storage space.
The technical solution of the present invention is a graph data processing method for massive time-series data, comprising the following steps:
Step 1: preprocess the social network data and abstract it into a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
Step 2: based on the celebrity effect, cut the graph structure into several graph structure blocks according to a predetermined Euclidean distance, and number the graph structure blocks and the vertices inside them;
Step 3: distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding memory locations according to the memory organization scheme;
Step 4: the user writes an application job according to the message-based programming model and submits the application job to the application job processing unit;
Step 5: the application job processing unit obtains the data of the required graph structure blocks from memory, executes the application job in the message-based processing mode, and obtains the result.
The beneficial effects of the present invention are as follows. The invention establishes multiple edges between two vertices; each edge represents one interaction between the two vertices and carries a timestamp indicating when the edge was created, that is, when the interaction occurred. This representation can effectively express social network relationships that carry temporal interaction information. For the particular distribution of graph data (the celebrity effect), the invention provides an in-memory storage scheme for graph data that makes full use of the distribution characteristics of the data and achieves efficient storage and query performance. To save computation time and memory space, the invention replaces the original programming model, which uses the vertex as the unit of computation, with a programming model that uses the message as the unit of computation, which substantially reduces computation time and also saves storage space.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, the preprocessing of social network data in step 1 comprises the following steps:
Step 1.1: extract the individual persons from the social network data to form the vertices of the graph structure;
Step 1.2: extract the interactions between persons from the social network data; each interaction forms one edge of the graph structure;
Step 1.3: set on each edge of the graph structure a timestamp indicating when that edge was created.
Further, in step 2, cutting the graph structure into several graph structure blocks according to the Euclidean distance based on the celebrity effect, and numbering the graph structure blocks and the vertices inside them, is implemented as follows:
Step 2.1: count the number of edges connected to each vertex in the graph structure, and compute the average density of the graph structure;
Step 2.2: set the Euclidean distance used to cut the graph structure according to the average density of the graph structure, and cut the graph structure according to that Euclidean distance to obtain several graph structure blocks;
Step 2.3: number the graph structure blocks after cutting to obtain block numbers, and number each vertex in each graph structure block; the number of each vertex is the block number followed by the vertex number.
Further, in step 3, each node imports the graph structure blocks it receives into the corresponding memory locations according to the memory organization scheme through the following steps:
Step 3.1: open up a memory region in memory and divide it into N fixed-size memory partitions for storing N graph structure blocks;
Step 3.2: in each memory partition, allocate one fixed-size memory block to each vertex for storing the vertex data and relation data of that vertex;
Step 3.3: determine whether the vertex exhibits the celebrity effect. If it does, and the memory block cannot hold the vertex data and all the relation data of the vertex, store the vertex data and part of the relation data in the memory block, open up one or more additional fixed-size memory blocks, store the remaining relation data in the additional memory blocks, and point to the additional memory blocks with a pointer. Otherwise, store the vertex data and all the relation data of the vertex directly in the corresponding memory block;
Step 3.4: in the additional memory blocks, build an index with time as the primary key and the vertices of the graph structure as the secondary key.
Further, in step 5, the application job processing unit obtains the required data from memory, executes the application job in the message-based processing mode, and obtains the result as follows:
Step 5.1: the application job processing unit executes the application job; the application job contains several tasks, and each task manages one graph structure block;
Step 5.2: each task obtains the required data from the graph structure block it manages by vertex number, processes the data in the graph structure block according to the processing logic of the application job, generates several messages when processing is complete, and sends the messages to other tasks;
Step 5.3: the task that receives messages parses each arriving message in turn and extracts the destination vertex of that message;
Step 5.4: update the value of the destination vertex, and then delete the message;
Step 5.5: determine whether any unprocessed messages remain; if so, return to step 5.3, otherwise finish.
A further technical solution of the present invention is a graph data processing system for massive time-series data, comprising a storage layer, a computation layer, and a user layer;
the storage layer is used to build a graph structure that can represent the interactions between persons, and to implement customized storage of the graph structure according to the celebrity effect;
the user layer is used to write application jobs based on message processing and submit them to the computation layer;
the computation layer is used to obtain the data of the required graph structure blocks from the storage layer, execute the application job in the message-based processing mode, and obtain the result.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, the storage layer comprises a data preprocessing unit, a graph structure cutting unit, a data import unit, and a memory unit;
the data preprocessing unit is used to preprocess social network data and abstract it into a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
the graph structure cutting unit is used to cut the graph structure into several graph structure blocks according to a predetermined Euclidean distance based on the celebrity effect, and to number the graph structure blocks and the vertices inside them;
the data import unit is used to distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding memory locations according to the memory organization scheme;
the memory unit comprises the memory of several nodes, each used to store the data of the imported graph structure blocks.
Further, the user layer comprises an application job writing unit and an application job submission unit;
the application job writing unit is used to write application jobs based on message processing and to send the application job to the application job submission unit;
the application job submission unit is used to submit the application job to the application job processing unit of the computation layer.
Further, the computation layer comprises an application job processing unit, which is used to execute the application job in the message processing mode.
Further, the application job contains several tasks, and each task is responsible for processing one graph structure block.
Brief description of the drawings
Fig. 1 is a flow chart of the graph data processing method for massive time-series data of the present invention;
Fig. 2 is a flow chart of the implementation of step 1 of the present invention;
Fig. 3 is a flow chart of the implementation of step 2 of the present invention;
Fig. 4 is a flow chart of the implementation of step 3 of the present invention;
Fig. 5 is a flow chart of the implementation of step 5 of the present invention;
Fig. 6 is a structural block diagram of the graph data processing system for massive time-series data of the present invention;
Fig. 7 is a schematic diagram of part of the graph structure in Embodiment 1 of the present invention;
Fig. 8 is a schematic diagram of the storage structure of one memory partition in the memory of one node in Embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of the prior-art processing procedure that uses the vertex as the processing unit;
Fig. 10 is a schematic diagram of the processing procedure of the present invention that uses the message as the processing unit.
In the drawings, the reference numerals denote the following parts:
1, storage layer; 2, user layer; 3, computation layer; 1-1, data preprocessing unit; 1-2, graph structure cutting unit; 1-3, data import unit; 1-4, memory unit; 2-1, application job writing unit; 2-2, application job submission unit; 3-1, application job processing unit; 101, memory block; 102, additional memory block; 201, vertex; 202, message.
Detailed description
The principles and features of the present invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, a graph data processing method for massive time-series data comprises the following steps:
Step 1: preprocess the social network data and abstract it into a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
Step 2: based on the celebrity effect, cut the graph structure into several graph structure blocks according to a predetermined Euclidean distance, and number the graph structure blocks and the vertices inside them;
Step 3: distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding memory locations according to the memory organization scheme;
Step 4: the user writes an application job according to the message-based programming model and submits the application job to the application job processing unit;
Step 5: the application job processing unit obtains the data of the required graph structure blocks from memory, executes the application job in the message-based processing mode, and obtains the result.
As shown in Fig. 2, the preprocessing of social network data in step 1 comprises the following steps:
Step 1.1: extract the individual persons from the social network data to form the vertices of the graph structure;
Step 1.2: extract the interactions between persons from the social network data; each interaction forms one edge of the graph structure;
Step 1.3: set on each edge of the graph structure a timestamp indicating when that edge was created.
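Steps 1.1 to 1.3 can be illustrated with a minimal Python sketch. The record format (`Interaction`), the function name `build_temporal_graph`, and the use of integer timestamps are assumptions made for illustration only; the patent does not prescribe an input format or a concrete data structure.

```python
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    """One raw interaction record (hypothetical input format)."""
    source: str      # person who acts
    target: str      # person acted upon
    kind: str        # e.g. "comment", "repost", "like"
    timestamp: int   # time the interaction occurred (assumed integer)


def build_temporal_graph(records):
    """Return (vertices, edges): one vertex per person, one timestamped edge per interaction."""
    vertices = set()
    edges = defaultdict(list)  # (source, target) -> list of (timestamp, kind)
    for r in records:
        vertices.add(r.source)  # step 1.1: persons become vertices
        vertices.add(r.target)
        # steps 1.2 and 1.3: every interaction is its own edge and carries its timestamp
        edges[(r.source, r.target)].append((r.timestamp, r.kind))
    return vertices, dict(edges)


records = [
    Interaction("alice", "bob", "comment", 100),
    Interaction("alice", "bob", "like", 250),
    Interaction("bob", "alice", "repost", 300),
]
vertices, edges = build_temporal_graph(records)
```

In this sketch the pair ("alice", "bob") ends up with two parallel edges, one per interaction, which is the property that distinguishes the temporal graph from a plain connection graph.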
As shown in Fig. 3, in step 2, cutting the graph structure into several graph structure blocks according to the Euclidean distance based on the celebrity effect, and numbering the graph structure blocks and the vertices inside them, is implemented as follows:
Step 2.1: count the number of edges connected to each vertex in the graph structure, and compute the average density of the graph structure;
Step 2.2: set the Euclidean distance used to cut the graph structure according to the average density of the graph structure, and cut the graph structure according to that Euclidean distance to obtain several graph structure blocks;
Step 2.3: number the graph structure blocks after cutting to obtain block numbers, and number each vertex in each graph structure block; the number of each vertex is the block number followed by the vertex number.
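The text does not say how the Euclidean distance is computed or applied, so the following Python sketch substitutes a simple stand-in that is purely an assumption: each vertex is treated as a one-dimensional point (its degree), the cutting distance is the average degree times a hypothetical `scale` factor, and a new block starts whenever a vertex lies farther than that distance from the first vertex of the current block. Only the degree counting (step 2.1) and the block-plus-vertex numbering (step 2.3) follow the text directly.

```python
from collections import Counter


def partition_by_degree(edge_list, scale=1.0):
    """Cut vertices into blocks and number them as (block_no, local_no).

    edge_list: iterable of (u, v, timestamp) tuples.
    scale: hypothetical tuning factor applied to the average degree.
    """
    # Step 2.1: count edges per vertex and compute the average density.
    degree = Counter()
    for u, v, _ts in edge_list:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return [], {}
    mean_degree = sum(degree.values()) / len(degree)

    # Step 2.2 (assumed rule): derive the cutting distance from the mean and
    # start a new block when a vertex's degree is farther than that distance
    # from the degree of the first vertex in the current block.
    cut_distance = scale * mean_degree
    ordered = sorted(degree, key=lambda x: (-degree[x], x))
    blocks = [[ordered[0]]]
    for vertex in ordered[1:]:
        anchor = blocks[-1][0]
        if abs(degree[anchor] - degree[vertex]) > cut_distance:
            blocks.append([vertex])
        else:
            blocks[-1].append(vertex)

    # Step 2.3: vertex number = block number plus number inside the block.
    numbering = {
        vertex: (block_no, local_no)
        for block_no, block in enumerate(blocks)
        for local_no, vertex in enumerate(block)
    }
    return blocks, numbering


edge_list = [("star", p, t) for t, p in enumerate("abcdef", start=1)] + [("a", "b", 7)]
blocks, numbering = partition_by_degree(edge_list)
# blocks == [["star"], ["a", "b", "c", "d", "e", "f"]]
```

With this toy input the high-degree vertex "star" is separated from the low-degree vertices, which mirrors the dense-versus-sparse split that the celebrity effect is meant to capture. A real cutting rule would likely also take graph locality into account.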
As shown in Fig. 4, in step 3, each node imports the graph structure blocks it receives into the corresponding memory locations according to the memory organization scheme through the following steps:
Step 3.1: open up a memory region in memory and divide it into N fixed-size memory partitions for storing N graph structure blocks;
Step 3.2: in each memory partition, allocate one fixed-size memory block to each vertex for storing the vertex data and relation data of that vertex;
Step 3.3: determine whether the vertex exhibits the celebrity effect. If it does, and the memory block cannot hold the vertex data and all the relation data of the vertex, store the vertex data and part of the relation data in the memory block, open up one or more additional fixed-size memory blocks, store the remaining relation data in the additional memory blocks, and point to the additional memory blocks with a pointer. Otherwise, store the vertex data and all the relation data of the vertex directly in the corresponding memory block;
Step 3.4: in the additional memory blocks, build an index with time as the primary key and the vertices of the graph structure as the secondary key.
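The per-vertex layout of steps 3.2 to 3.4 can be sketched as follows. This is a simplified model using Python lists rather than raw memory; the class name `VertexSlot`, the constant `BLOCK_CAPACITY`, and the shape of the index entries are assumptions for illustration, and "celebrity" is approximated here simply as "the primary block is full".

```python
import bisect

BLOCK_CAPACITY = 4  # hypothetical number of relation records per fixed-size block


class VertexSlot:
    """Fixed-size primary block for one vertex, plus optional additional blocks."""

    def __init__(self, vertex_id, vertex_data=None):
        self.vertex_id = vertex_id
        self.vertex_data = vertex_data
        self.primary = []   # at most BLOCK_CAPACITY relation records (step 3.2)
        self.overflow = []  # list of additional fixed-size blocks (step 3.3)
        self.index = []     # sorted (timestamp, neighbor, block_no, offset) entries (step 3.4)

    def add_relation(self, neighbor, timestamp):
        record = (neighbor, timestamp)
        if len(self.primary) < BLOCK_CAPACITY:
            self.primary.append(record)
            return
        # Primary block is full: the vertex is treated as a "celebrity".
        if not self.overflow or len(self.overflow[-1]) >= BLOCK_CAPACITY:
            self.overflow.append([])  # open one more fixed-size block
        block = self.overflow[-1]
        block.append(record)
        entry = (timestamp, neighbor, len(self.overflow) - 1, len(block) - 1)
        bisect.insort(self.index, entry)  # time is the first key, vertex the second


slot = VertexSlot("star")
for i in range(10):
    slot.add_relation(f"fan{i}", 1000 - i)
```

After the ten insertions, four records sit in the primary block and six are spread over two additional blocks, while the index lists the overflow records in timestamp order regardless of insertion order.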
As shown in Fig. 5, in step 5, the application job processing unit obtains the required data from memory, executes the application job in the message-based processing mode, and obtains the result as follows:
Step 5.1: the application job processing unit executes the application job; the application job contains several tasks, and each task manages one graph structure block;
Step 5.2: each task obtains the required data from the graph structure block it manages by vertex number, processes the data in the graph structure block according to the processing logic of the application job, generates several messages when processing is complete, and sends the messages to other tasks;
Step 5.3: the task that receives messages parses each arriving message in turn and extracts the destination vertex of that message;
Step 5.4: update the value of the destination vertex, and then delete the message;
Step 5.5: determine whether any unprocessed messages remain; if so, return to step 5.3, otherwise finish.
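The receive loop of steps 5.3 to 5.5 can be sketched in a few lines of Python. The function name `run_task`, the message shape `(destination_vertex, payload)`, and the user-supplied `update` callback are assumptions for illustration; message transport between tasks (step 5.2) is left out.

```python
from collections import deque


def run_task(vertex_values, inbox, update):
    """Drain the inbox of one task, applying each message as soon as it is parsed.

    vertex_values: dict mapping vertex number -> current value (the task's block).
    inbox: deque of (destination_vertex, payload) messages.
    update: function (old_value, payload) -> new_value supplied by the job.
    """
    processed = 0
    while inbox:  # step 5.5: are any messages left?
        # Step 5.3: parse the next message; popleft also removes it from the
        # inbox, so the message is not retained after it has been applied.
        destination, payload = inbox.popleft()
        # Step 5.4: update the destination vertex directly by its number.
        vertex_values[destination] = update(vertex_values[destination], payload)
        processed += 1
    return processed


values = {(0, 0): 0, (0, 1): 5}
inbox = deque([((0, 0), 3), ((0, 1), 1), ((0, 0), 4)])
count = run_task(values, inbox, lambda old, payload: old + payload)
# count == 3 and values == {(0, 0): 7, (0, 1): 6}
```

Because vertices are addressed by their (block number, vertex number) pair, each message needs one dictionary lookup rather than a scan over all vertices.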
As shown in Fig. 6, a graph data processing system for massive time-series data comprises a storage layer 1, a user layer 2, and a computation layer 3;
the storage layer 1 is used to build a graph structure that can represent the interactions between persons, and to implement customized storage of the graph structure according to the celebrity effect;
the user layer 2 is used to write application jobs based on message processing and submit them to the computation layer 3;
the computation layer 3 is used to obtain the data of the required graph structure blocks from the storage layer 1, execute the application job in the message-based processing mode, and obtain the result.
The storage layer 1 comprises a data preprocessing unit 1-1, a graph structure cutting unit 1-2, a data import unit 1-3, and a memory unit 1-4;
the data preprocessing unit 1-1 is used to preprocess social network data and abstract it into a graph structure in which vertices represent persons and multiple timestamped edges represent the interactions between persons;
the graph structure cutting unit 1-2 is used to cut the graph structure into several graph structure blocks according to a predetermined Euclidean distance based on the celebrity effect, and to number the graph structure blocks and the vertices inside them;
the data import unit 1-3 is used to distribute the graph structure blocks to different nodes for processing; each node imports the graph structure blocks it receives into the corresponding memory locations according to the memory organization scheme;
the memory unit 1-4 comprises the memory of several nodes, each used to store the data of the imported graph structure blocks.
The user layer 2 comprises an application job writing unit 2-1 and an application job submission unit 2-2;
the application job writing unit 2-1 is used to write application jobs based on message processing and to send the application job to the application job submission unit 2-2;
the application job submission unit 2-2 is used to submit the application job to the application job processing unit 3-1 of the computation layer.
The computation layer 3 comprises an application job processing unit 3-1, which is used to execute the application job in the message processing mode.
The application job contains several tasks, and each task is responsible for processing one or more graph structure blocks with adjacent numbers.
Besides the connection relationships between people, graph data also contains interaction relationships between people that are tied to a time sequence. An interaction relationship differs from a connection relationship. A connection can be represented by a single edge in the graph structure, whereas an interaction is directly tied to time: between two vertices there can be many interactions of different kinds at different times. At present, large-scale graph data processing systems handle only the connection relationships between people and do not handle interactions that differ over time.
The present invention provides a graph structure in which vertices represent persons and timestamped edges represent the interactions between persons. This representation establishes multiple edges between two vertices. Fig. 7 is a schematic diagram of part of a graph structure, in Embodiment 1 of the present invention, that can represent the interactions between persons. The value on each edge is the time at which the edge was created, that is, the time at which the interaction occurred. This representation can effectively express social network relationships that carry temporal interaction information. Such multi-version data can represent not only the graph structure of the social network but also the time-based interactions between vertices, which are referred to as temporal relationships. Once abstracted in this way, the graph structure differs fundamentally from the original graph structure. In the original graph structure, any two vertices are connected by only one edge (for an undirected graph) or two edges (for a directed graph). In the improved graph structure, any two vertices may be connected by multiple edges, each representing one timestamped interaction between them, such as commenting on a photo, commenting on a status update, reposting, or "liking".
Existing data storage method is to set to divide memory block on the basis of the region that consistency is bigger Foundation, this has resulted in memory block can store the data of the big graph structure of lower consistency, but for sparse Graph structure for, although also can store, but cause the waste of a large amount of memory space;On the other hand, For the inquiry of data, prior art is by all data (vertex data and the pass coefficient of this block graph structure According to, wherein relation data includes the direction representing the limit of interactive relationship and its timestamp) it is stored in getting Memory block in, during data query, all data in memory partitioning to be traveled through, make efficiency data query reduce.
In a social network, people differ in how active and how popular they are. A celebrity's microblog naturally attracts far more attention than that of an ordinary user, so the vertex representing a celebrity has more edges attached to it and, likewise, more interactions. In a social network, therefore, the celebrity circle forms a dense graph structure while ordinary users form a sparse one. Celebrities, however, are a very small group compared with ordinary users. The present invention designs an in-memory storage scheme for graph data around this particular distribution (the celebrity effect), exploiting the distribution characteristics of graph data to achieve efficient storage and good query performance.
Figure 8 is a schematic diagram of memory partitions storing graph structure blocks in the memory of one node in Embodiment 2 of the present invention. According to how the graph structure has been cut into blocks, the present invention allocates a region of memory, divides it into N fixed-size memory partitions, and stores each graph structure block in its corresponding memory partition. The specific storage method is as follows: within a memory partition, each vertex is allocated one fixed-size memory block 101. The data of each vertex comprises vertex data and relation data, and the relation data comprises edge data and timestamp data; the data of each vertex is stored in one memory block. When a vertex is dense (that is, it has a large amount of relation data), its vertex data and a small part of its relation data are stored in the allocated memory block, several additional memory blocks 102 are allocated and referenced by pointers, and the remaining relation data is stored in the additional memory blocks 102. The present invention also builds an index in the additional memory blocks 102, with time as the first key and vertex as the second key, to facilitate data lookup.
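The storage scheme described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Block`, `VertexStore`, `BLOCK_CAPACITY` and `has_relation` are hypothetical, a relation is reduced to a (timestamp, neighbour) pair, and the fixed block size is modelled as a record count rather than a byte size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BLOCK_CAPACITY = 4  # hypothetical: relation records that fit in one fixed-size block

Relation = Tuple[int, int]  # (timestamp, neighbour vertex id), an assumed record layout


@dataclass
class Block:
    relations: List[Relation] = field(default_factory=list)
    next: Optional["Block"] = None  # pointer to the next additional block, if any


class VertexStore:
    """One vertex: a primary block (101) plus a chain of additional blocks (102)."""

    def __init__(self, vertex_id: int, data: str) -> None:
        self.vertex_id = vertex_id
        self.data = data
        self.head = Block()  # primary fixed-size block
        self.tail = self.head
        # Index over the additional blocks only:
        # first key = timestamp, second key = neighbour vertex.
        self.index: Dict[int, Dict[int, Block]] = {}

    def add_relation(self, timestamp: int, neighbour: int) -> None:
        if len(self.tail.relations) == BLOCK_CAPACITY:
            # Current block is full: open an additional block and link it.
            self.tail.next = Block()
            self.tail = self.tail.next
        self.tail.relations.append((timestamp, neighbour))
        if self.tail is not self.head:
            self.index.setdefault(timestamp, {})[neighbour] = self.tail

    def block_count(self) -> int:
        count, block = 0, self.head
        while block is not None:
            count += 1
            block = block.next
        return count

    def has_relation(self, timestamp: int, neighbour: int) -> bool:
        # The small primary block is scanned; additional blocks go through the index.
        if (timestamp, neighbour) in self.head.relations:
            return True
        return neighbour in self.index.get(timestamp, {})
```

Under these assumptions a sparse vertex occupies only its primary block, while a dense vertex grows a chain of additional blocks whose contents are found through the two-level index instead of a full scan.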
Figure 9 is a schematic diagram of prior-art processing with the vertex as the processing unit. Traditional graph computation models all take the vertex as the processing unit. In each round of iteration, after the messages 202 sent by neighboring tasks have been received, processing proceeds vertex by vertex: for each vertex 201, all messages 202 are traversed to find those that match the vertex 201, and the matching messages 202 are extracted. The messages 202 can be deleted only after all vertices 201 have been traversed. This consumes a large amount of computing and storage resources.
Figure 10 is a schematic diagram of the processing of the present invention with the message as the processing unit. In this programming model, each message 202 is parsed as soon as it arrives, the vertex 201 corresponding to the message 202 is located and computed, and the message 202 can be deleted at the same time. In this mode the amount of computation depends only on the number of messages 202, and there is no need to retain all messages, which saves a great deal of computing time and storage space.
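The difference between the two processing modes can be sketched in Python as follows. This is an illustrative comparison only: the function names are hypothetical, a message is assumed to be a (destination vertex, value) pair, and the update rule (adding the value to the vertex) stands in for whatever logic an application job defines.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, float]  # (destination vertex id, value), an assumed message layout


def process_vertex_centric(values: Dict[int, float], messages: List[Message]) -> int:
    """Vertex as processing unit: every vertex scans the whole message list."""
    scans = 0
    for vertex in values:
        for dest, payload in messages:
            scans += 1
            if dest == vertex:
                values[vertex] += payload
    # Messages can only be discarded once all vertices have been traversed.
    messages.clear()
    return scans


def process_message_centric(values: Dict[int, float], messages: List[Message]) -> int:
    """Message as processing unit: parse each message, update its vertex, drop it."""
    handled = 0
    while messages:
        dest, payload = messages.pop()  # the message is removed as soon as it is taken
        values[dest] += payload
        handled += 1
    return handled
```

With V vertices and M messages, the first function inspects V × M (vertex, message) pairs, whereas the second touches each message once, so its cost depends only on M.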
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. A graph data processing method for massive time-series data, characterized by comprising the following steps:
Step 1: preprocess social network data and abstract a graph structure in which vertices represent persons and timestamped edges represent the interactions between persons;
Step 2: cut the graph structure into several graph structure blocks according to the celebrity effect and a predetermined Euclidean distance, and number the graph structure blocks and the vertices within them;
Step 3: distribute the graph structure blocks to different nodes for processing, each node importing the graph structure blocks it receives into the corresponding locations in memory according to the memory organization scheme;
Step 4: the user writes an application job according to the message-based programming model and submits the application job to the application job processing unit;
Step 5: the application job processing unit obtains the data of the required graph structure blocks from memory and executes the application job in the message-based processing mode to obtain the result.
2. The graph data processing method for massive time-series data, characterized in that the preprocessing of social network data in step 1 comprises the following steps:
Step 1.1: extract the individual persons from the social network data to form the vertices of the graph structure;
Step 1.2: extract the interactions between persons from the social network data, each interaction forming one edge of the graph structure;
Step 1.3: set on each edge of the graph structure a timestamp indicating when the edge was established.
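Steps 1.1 to 1.3 can be illustrated with a short Python sketch. The record format (source person, target person, timestamp) and the function name `build_graph` are assumptions made for illustration; the source does not specify how the social network data is laid out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Interaction = Tuple[str, str, int]  # (source person, target person, timestamp), assumed


def build_graph(
    records: Iterable[Interaction],
) -> Tuple[Set[str], Dict[str, List[Tuple[str, int]]]]:
    vertices: Set[str] = set()
    edges: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for source, target, timestamp in records:
        vertices.update((source, target))           # step 1.1: persons become vertices
        edges[source].append((target, timestamp))   # steps 1.2 and 1.3: timestamped edge
    return vertices, dict(edges)
```

Repeated interactions between the same two persons are kept as separate edges here, each with its own timestamp, which is one possible reading of "each interaction forming one edge".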
3. The graph data processing method for massive time-series data, characterized in that cutting the graph structure into several graph structure blocks according to the celebrity effect and the Euclidean distance in step 2, and numbering the graph structure blocks and the vertices within them, is implemented as follows:
Step 2.1: count the number of edges connected to each vertex in the graph structure and calculate the average density of the graph structure;
Step 2.2: set the Euclidean distance used for cutting the graph structure according to the average density of the graph structure, and cut the graph structure according to the Euclidean distance to obtain several graph structure blocks;
Step 2.3: number the graph structure blocks obtained by cutting to obtain block numbers, and number each vertex in a graph structure block, the number of each vertex being the block number plus the vertex number.
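Steps 2.1 and 2.3 can be sketched in Python as below. The cut itself (step 2.2) is deliberately left out, because the source does not say how the Euclidean distance is derived from the average density; the block membership is therefore taken as given input. The function names and the (block number, vertex number) tuple encoding are hypothetical.

```python
from typing import Dict, List, Tuple


def average_density(adjacency: Dict[str, List[str]]) -> Tuple[Dict[str, int], float]:
    """Step 2.1: count the edges attached to each vertex and average the counts."""
    degrees = {vertex: len(neighbours) for vertex, neighbours in adjacency.items()}
    return degrees, sum(degrees.values()) / len(degrees)


def number_vertices(blocks: List[List[str]]) -> Dict[str, Tuple[int, int]]:
    """Step 2.3: each vertex is identified by its block number plus a local number."""
    ids: Dict[str, Tuple[int, int]] = {}
    for block_no, members in enumerate(blocks):
        for local_no, vertex in enumerate(members):
            ids[vertex] = (block_no, local_no)
    return ids
```

A composite identifier of this kind lets a task locate a vertex by first selecting the block and then the position inside it.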
4. The graph data processing method for massive time-series data, characterized in that, in step 3, each node imports the graph structure blocks it receives into the corresponding locations in memory according to the memory organization scheme through the following steps:
Step 3.1: allocate a memory region in memory and divide it into N fixed-size memory partitions for storing N graph structure blocks;
Step 3.2: in each memory partition, allocate one fixed-size memory block to each vertex for storing the vertex data and relation data of that vertex;
Step 3.3: determine whether the vertex exhibits the celebrity effect; if it does, and the memory block cannot hold the vertex data and all the relation data of the vertex, store the vertex data and part of the relation data in the memory block, allocate one or more additional fixed-size memory blocks, store the remaining relation data in the additional memory blocks, and point to the additional memory blocks with pointers; otherwise, store the vertex data and all the relation data of the vertex directly in the corresponding memory block;
Step 3.4: in the additional memory blocks, build an index with time as the primary key and the vertex in the graph structure as the secondary key.
5. The graph data processing method for massive time-series data, characterized in that, in step 5, the application job processing unit obtains the required data from memory and executes the application job in the message-based processing mode to obtain the result as follows:
Step 5.1: the application job processing unit executes the application job, the application job comprising several tasks, each task managing one graph structure block;
Step 5.2: each task obtains the required data by vertex number from the graph structure block it manages, processes the data in the graph structure block according to the processing logic of the application job, generates a number of messages once processing is complete, and sends the messages to other tasks;
Step 5.3: a task that receives messages parses each arriving message in turn and extracts the destination vertex of the message;
Step 5.4: update the value of the destination vertex, and then delete the message;
Step 5.5: determine whether any unprocessed messages remain; if so, return to step 5.3; otherwise, end.
6. A graph data processing system for massive time-series data, characterized by comprising a storage layer, a computation layer and a client layer;
the storage layer is used to build a graph structure that can represent the interactions between persons, and to store the graph structure in a manner tailored to the celebrity effect;
the client layer is used to write an application job based on message processing and submit it to the computation layer;
the computation layer is used to obtain the data of the required graph structure blocks from the storage layer and to execute the application job in the message-based processing mode to obtain the result;
the storage layer comprises a data preprocessing unit, a graph structure cutting unit, a data import unit and a memory unit;
the data preprocessing unit is used to preprocess social network data and abstract a graph structure in which vertices represent persons and timestamped edges represent the interactions between persons;
the graph structure cutting unit is used to cut the graph structure into several graph structure blocks according to the celebrity effect and a predetermined Euclidean distance, and to number the graph structure blocks and the vertices within them;
the data import unit is used to distribute the graph structure blocks to different nodes for processing, each node importing the graph structure blocks it receives into the corresponding locations in memory according to the memory organization scheme;
the memory unit comprises the memory of several nodes, each used to store the data of the imported graph structure blocks.
7. The graph data processing system for massive time-series data, characterized in that the client layer comprises an application job writing unit and an application job submission unit;
the application job writing unit is used to write an application job based on message processing and send the application job to the application job submission unit;
the application job submission unit is used to submit the application job to the application job processing unit of the computation layer.
8. The graph data processing system for massive time-series data, characterized in that the computation layer comprises an application job processing unit, which is used to execute the application job in the message processing mode.
9. The graph data processing system for massive time-series data, characterized in that the application job comprises several tasks, each task being responsible for processing one graph structure block.
CN201310559846.4A 2013-11-12 2013-11-12 Graph data processing method and system for massive time-series data Active CN103593433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310559846.4A CN103593433B (en) 2013-11-12 2013-11-12 Graph data processing method and system for massive time-series data

Publications (2)

Publication Number Publication Date
CN103593433A CN103593433A (en) 2014-02-19
CN103593433B true CN103593433B (en) 2016-11-02

Family

ID=50083574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310559846.4A Active CN103593433B (en) 2013-11-12 2013-11-12 Graph data processing method and system for massive time-series data

Country Status (1)

Country Link
CN (1) CN103593433B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970860B (en) * 2014-05-07 2017-05-24 华为技术有限公司 Method, device and system for processing data
CN106325756B (en) * 2015-06-15 2020-04-24 阿里巴巴集团控股有限公司 Data storage method, data calculation method and equipment
CN109582808A (en) * 2018-11-22 2019-04-05 北京锐安科技有限公司 A kind of user information querying method, device, terminal device and storage medium
WO2021134318A1 (en) * 2019-12-30 2021-07-08 浙江邦盛科技有限公司 Rapid mass time-series data processing method based on aggregated edge and time-series aggregated edge
CN111177188A (en) * 2019-12-30 2020-05-19 浙江邦盛科技有限公司 Rapid massive time sequence data processing method based on aggregation edge and time sequence aggregation edge
CN111241410B (en) * 2020-01-22 2023-08-22 深圳司南数据服务有限公司 Industry news recommendation method and terminal
CN111382319B (en) * 2020-03-18 2021-04-09 军事科学院系统工程研究院系统总体研究所 Map data representation and mapping method for knowledge graph
CN113722576A (en) * 2021-05-07 2021-11-30 北京达佳互联信息技术有限公司 Network security information processing method, query method and related device
CN113779286B (en) * 2021-11-11 2022-02-08 支付宝(杭州)信息技术有限公司 Method and device for managing graph data
CN114254164B (en) * 2022-03-01 2022-06-28 全球能源互联网研究院有限公司 Graph data storage method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336808A (en) * 2013-06-25 2013-10-02 中国科学院信息工程研究所 System and method for real-time graph data processing based on BSP (Bulk Synchronous Parallel) model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2556426A4 (en) * 2010-04-06 2016-10-19 Justone Database Inc Data storage and/or retrieval based on a database model-agnostic, schema-agnostic and workload-agnostic data strorage and access models

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Study on SNS graph generation and prediction; Yang Yuehua et al.; International Conference on Control Automation and Systems 2010 (ICCAS); 2010-10-30; pp. 1188-1191 *
The Future of Social Networks on the Internet: The Need for Semantics; John Breslin et al.; IEEE Internet Computing; 2007-12-30; vol. 11, no. 6; pp. 86-90 *
云计算环境下的大规模图数据处理技术; Yu Ge et al.; 计算机学报 (Chinese Journal of Computers); 2011-06-10; vol. 34, no. 10; pp. 1753-1767 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant