CN106354729A - Graph data handling method, device and system


Info

Publication number
CN106354729A
Authority
CN
China
Prior art keywords
task
cis
read
analysis
data
Prior art date
Legal status
Granted
Application number
CN201510419390.0A
Other languages
Chinese (zh)
Other versions
CN106354729B (en)
Inventor
葛朋旭
Current Assignee
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510419390.0A
Publication of CN106354729A
Application granted
Publication of CN106354729B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/434 Query formulation using image data, e.g. images, photos, pictures taken by a user

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a graph data processing method, device and system. The method includes: writing each pending request into a graph update task queue or a graph analysis task queue according to the request type, the request types including graph update requests and graph analysis requests; determining the execution order of the tasks according to a first characteristic of each task in the graph update task queue and the graph analysis task queue; and running the tasks in that order. With this scheme, graph update requests and graph analysis requests are received separately, placed into the graph update task queue and the graph analysis task queue respectively, and managed together, with an execution order determined for every task. A single system therefore supports both graph updating and graph analysis, resolving the current situation in which graph updating and graph analysis are handled in separate application scenarios.

Description

Graph data processing method, device and system
Technical field
The application relates to the field of graph processing technology, and in particular to a graph data processing method, device and system.
Background
The industry currently offers many products and solutions for graph computation, but the vast majority stop at analyzing static graph data, or at updating and processing graph data alone; a complete solution for real-time graph updating combined with real-time analysis is lacking.
In the traditional database field, OLTP (on-line transaction processing) and OLAP (on-line analytical processing) are generally kept separate. Because data is produced relatively slowly and analysis consumes more resources, analysis results are often available only after a delay of more than a day. The relational data model itself can be designed according to relational database principles, so that data dependencies only need to be considered at the analysis stage.
In the graph data field, by contrast, the model is inherently one of dependencies between data: strong relationships exist in the data from the outset. A graph G = (V, E) contains two kinds of basic element, vertices and edges, and vertices are physically linked to one another by edges that represent relationships. Real-time updating and real-time analysis of graph data require that, in business scenarios, a data update immediately affects its dependencies, and that the resulting impact immediately triggers the corresponding analysis operations; this gives rise to the business demand for real-time graph updating and real-time analysis.
Accordingly, real-time graph updating requires that, when a relationship arises between related entities (vertices) in the graph, the relationship update can be completed promptly, and that the graph analysis task can then be completed quickly according to the business scenario.
At present, most graph database systems, graph storage systems and graph computing frameworks in the industry borrow the computing model of MapReduce or BSP and build a non-real-time graph analysis system on top of distributed file systems such as GFS or HDFS. The application scenarios they support are narrow: the data relied upon is the full daily data set, multiple jobs are started to analyze it in parallel, and analysis results often lag by more than an hour.
Other graph database systems mostly follow the design concepts of traditional relational databases and merely add support for graph features. Although they can support fast, OLTP-like graph updates, they offer essentially no corresponding OLAP capabilities, so it is difficult for them to provide a solution that merges the two kinds of application.
Shortcomings of the prior art:
Because relational databases have been developed for many years, they are deeply entrenched in many application scenarios. As a result, OLTP and OLAP scenarios are usually kept separate, and many technical frameworks cannot directly combine the two. With the rise of NoSQL (non-relational database) approaches in recent years, new graph databases have tried to break this pattern, but because the field is new, there is still no mature technical framework that fully supports both kinds of application scenario.
Summary of the invention
The embodiments of the present application propose a graph data processing method, device and system that support both graph updating and graph analysis in a single system, solving the technical problem that the two kinds of solution cannot be combined.
In one aspect, an embodiment of the present application provides a graph data processing method, comprising:
writing a pending request into a graph update task queue or a graph analysis task queue according to the type of the pending request, the types of pending request including graph update requests and graph analysis requests;
determining the execution order of the tasks according to a first characteristic of each task in the graph update task queue and the graph analysis task queue;
running the tasks in the execution order.
In another aspect, an embodiment of the present application provides a graph data processing device, comprising:
a graph update task queue, into which graph update tasks are written;
a graph analysis task queue, into which graph analysis tasks are written;
a scheduler, configured to determine the execution order of the tasks according to the first characteristic of each task in the graph update task queue and the graph analysis task queue, and to assign the task currently due to run to the corresponding computing resource for execution.
In yet another aspect, an embodiment of the present application provides a graph data processing system, comprising:
a service interface layer, including an update interface and an analysis interface, where the update interface receives data update tasks and writes them into the update task queue, and the analysis interface receives data analysis tasks and writes them into the analysis task queue;
a task scheduling layer, including a graph update task queue, a graph analysis task queue, a scheduler, a graph computing engine and a graph storage engine, wherein:
the graph update task queue receives graph update tasks;
the graph analysis task queue receives graph analysis tasks;
the scheduler determines the execution order of the tasks according to the first characteristic of each task in the graph update task queue and the graph analysis task queue, and assigns the task currently due to run to the corresponding computing resource for execution;
the graph computing engine performs the graph update operations and/or graph analysis operations of the tasks;
the graph storage engine stores the graph.
Beneficial effects:
The embodiments of the present application propose a graph data processing method, device and system that receive graph update requests and graph analysis requests separately, place them into the graph update task queue and the graph analysis task queue, manage every task and determine its execution order. A single system can therefore support both graph updating and graph analysis, which ends the current separation of graph update and graph analysis application scenarios; graph analysis no longer has to depend on the full daily data set with results delayed by more than an hour.
Brief description of the drawings
Specific embodiments of the application are described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of the graph data processing method in an embodiment of the present application;
Fig. 2 is a schematic flowchart of the graph data processing method in Embodiment 1;
Fig. 3 is a schematic flowchart of the graph data processing method in Embodiment 2;
Fig. 4 is a schematic diagram of the internal abstraction and implementation breakdown of the two interfaces in Embodiment 2;
Fig. 5 is a schematic flowchart of graph storage in Embodiment 3;
Fig. 6 is a schematic structural diagram of the graph data processing device in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of the graph data processing device of one example in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of the graph data processing device of another example in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of the graph data processing system in an embodiment of the present application;
Fig. 10 is a schematic structural diagram of the graph data processing system of one example in an embodiment of the present application.
Detailed description
To make the technical solutions and advantages of the application clearer, exemplary embodiments of the application are described in more detail below with reference to the accompanying drawings. The described embodiments are only some embodiments of the application, not an exhaustive list of all embodiments. Where they do not conflict, the embodiments in this description and the features in those embodiments may be combined with one another.
The inventor found that demand for storing and computing graph business data is currently very large. For example, transactions, purchases and transfers on an online shopping platform each exceed ten thousand per second, and the daily volume exceeds a hundred million records. Real-time writes and updates are very frequent, and once data is written it must be applied quickly to the graph data model. Graph-based business scenarios such as transaction risk identification and precise recommendation require that incremental graph data be analyzed completely and quickly and that results be output; updates and writes must reach the graph storage engine quickly, graph analysis should cover the latest data as far as possible, and the lag between the data snapshot an analysis relies on and the latest update should be tolerable at the level of seconds. Based on these practical requirements, the embodiments of the present application propose a graph data processing method, device and system, described below.
Graph update means that an external business application sends an instruction to update vertex attributes in the graph, add a new vertex, create a new directed edge from vertex a to vertex b, modify edge attributes, and so on.
Graph analysis means that, on a business analysis instruction, a specific subgraph or the whole graph is analyzed and computed; the analysis consists of read-only query operations such as graph traversal, statistics, and filtering by specific vertex or edge attributes.
Fig. 1 shows the graph data processing method in an embodiment of the present application, which comprises:
Step 101: according to the type of a pending request, write the pending request into the graph update task queue or the graph analysis task queue; the types of pending request include graph update requests and graph analysis requests;
Step 102: determine the execution order of the tasks according to the first characteristic of each task in the graph update task queue and the graph analysis task queue;
Step 103: run the tasks in the execution order.
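Steps 101 to 103 can be illustrated with a minimal sketch, assuming the timestamp is the only first characteristic and that a task's payload stands in for the real update or analysis instruction. The names Task, enqueue, execution_order and run_all are hypothetical and not taken from the application.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # "update" or "analysis" (hypothetical labels)
    timestamp: int   # arrival time, used here as the only first characteristic
    payload: str     # stands in for the real update or analysis instruction

def enqueue(request, update_queue, analysis_queue):
    """Step 101: write a pending request into the queue that matches its type."""
    if request.kind == "update":
        update_queue.append(request)
    elif request.kind == "analysis":
        analysis_queue.append(request)
    else:
        raise ValueError(f"unknown request type: {request.kind}")

def execution_order(update_queue, analysis_queue):
    """Step 102: merge both queues and order the tasks by timestamp, earliest first."""
    return sorted([*update_queue, *analysis_queue], key=lambda t: t.timestamp)

def run_all(tasks):
    """Step 103: run each task in order; this sketch only records its payload."""
    return [t.payload for t in tasks]

uq, aq = deque(), deque()
for r in [Task("analysis", 3, "pagerank"),
          Task("update", 1, "add edge a->b"),
          Task("update", 2, "set attr")]:
    enqueue(r, uq, aq)
print(run_all(execution_order(uq, aq)))  # ['add edge a->b', 'set attr', 'pagerank']
```

The two queues stay separate so that update and analysis traffic can be received independently; only the ordering step looks at both.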
Beneficial effects: the graph update and graph analysis processing method in the embodiments of the present application receives graph update requests and graph analysis requests separately, places them into the graph update task queue and the graph analysis task queue, manages every task and determines its execution order. A single system can therefore support both graph updating and graph analysis, ending the current separation of the two application scenarios; graph analysis no longer has to depend on the full daily data set with results delayed by more than an hour.
Further, to improve processing efficiency, the method can also be implemented as follows.
In an implementation, after the execution order of the tasks is determined, whether the first-ranked task is the task currently due to run is determined according to the state of a read-write lock;
the read-write lock is set to occupied when a task starts running, and set to free when the task finishes or times out.
Determining, according to the state of the read-write lock, whether the first-ranked task is currently due to run may include any one or a combination of the following:
when the read-write lock is free, the first-ranked task is determined to be the task currently due to run;
when the read-write lock is occupied and the currently running task is a pure read graph analysis task, the first-ranked task is determined to be the task currently due to run;
when the read-write lock is occupied and the currently running task is a graph analysis task that is not pure read, or a graph update task, the first-ranked task is suspended and the state of the read-write lock is checked again in the next cycle.
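The three rules above amount to a small decision function. This is a minimal sketch, assuming the scheduler knows the lock state and the kind of task currently holding the lock; the kind labels and the function name are hypothetical.

```python
from enum import Enum

class LockState(Enum):
    FREE = "free"
    OCCUPIED = "occupied"

# Hypothetical labels for the kind of task currently holding the lock.
PURE_READ_ANALYSIS = "pure_read_analysis"
OTHER_ANALYSIS = "analysis"
UPDATE = "update"

def may_run_first_task(lock_state, running_kind=None):
    """Return True if the first-ranked task may start now, False to suspend it.

    Rule 1: a free lock always lets the first-ranked task run.
    Rule 2: an occupied lock still lets it run if the running task is a pure
            read graph analysis task.
    Rule 3: otherwise (non-pure-read analysis or graph update running) the task
            is suspended and the lock is checked again in the next cycle.
    """
    if lock_state is LockState.FREE:
        return True
    return running_kind == PURE_READ_ANALYSIS
```

A scheduler loop would call this once per cycle for the first-ranked task and leave the task at the head of the order whenever it returns False.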
Beneficial effects:
In this implementation a read-write lock is added, and its state determines whether the first-ranked task is currently due to run. Even when the lock is occupied, if the currently running task is a pure read graph analysis task, the first-ranked task can still be started, so tasks run in parallel and processing efficiency improves.
In addition, in an implementation, after the execution order of the tasks is determined, it can be judged whether the first-ranked task is a graph analysis task whose time and/or resource consumption exceeds a set threshold. If so, the first-ranked task is split into multiple tasks that are run at intervals; once all of them have finished, their graph analysis results are merged to complete the first-ranked task.
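The split-and-merge approach can be sketched as follows. This assumes, purely for illustration, that the analysis is a sum of a vertex attribute and that its cost is estimated by the number of vertices touched; the function names and parameters are hypothetical.

```python
import time

def run_analysis(vertices, attr):
    """Run one (sub)task: sum a vertex attribute over the given vertices."""
    return sum(attr[v] for v in vertices)

def run_with_split(vertices, attr, cost_threshold, chunk_size, interval_s=0.0):
    """Split an analysis task whose estimated cost exceeds the threshold.

    The cost estimate is simply the number of vertices touched (an assumption).
    """
    if len(vertices) <= cost_threshold:
        return run_analysis(vertices, attr)
    partials = []
    for start in range(0, len(vertices), chunk_size):
        partials.append(run_analysis(vertices[start:start + chunk_size], attr))
        time.sleep(interval_s)  # gap between subtasks so queued update tasks can run
    return sum(partials)  # merge step: partial sums combine by addition

print(run_with_split(list(range(10)), {v: v for v in range(10)},
                     cost_threshold=4, chunk_size=3))  # 45
```

Merging by addition works here only because a sum is decomposable; other analyses (for example, top-k or traversal results) would need their own merge rule.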
It is also possible, after a task starts running, to monitor whether it has timed out; if so, the task is suspended and restarted in the next cycle.
Beneficial effects: splitting large graph analysis tasks and suspending tasks that exceed their timeout, as described above, prevents any single task from occupying too much time and/or too many resources, which makes task execution more reasonable. In particular, when a graph analysis task consumes a lot of time and/or resources, graph update tasks can still be executed efficiently.
Further, after a graph update task runs, the in-memory mapping object can be stored in both the cache and the disk;
when a graph analysis task runs, data is obtained from the cache;
if the data involved in the graph analysis task is not in the cache, it is obtained from the disk.
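The cache-first read path can be sketched with two dictionaries, one standing in for the cache and one for the disk. This is a minimal sketch; the class and method names are hypothetical, and a real engine would persist the disk copy and bound the cache size.

```python
class GraphStore:
    def __init__(self, disk):
        self.disk = disk    # dict standing in for on-disk storage (cold data)
        self.cache = {}     # recently updated objects (hot data)

    def apply_update(self, key, value):
        """After a graph update, write the object to both the cache and the disk."""
        self.cache[key] = value
        self.disk[key] = value

    def read_for_analysis(self, key):
        """Analysis reads prefer the cache and fall back to the disk on a miss."""
        if key in self.cache:
            return self.cache[key], "cache"
        return self.disk[key], "disk"

store = GraphStore({"v1": 1})
store.apply_update("v2", 2)
print(store.read_for_analysis("v2"), store.read_for_analysis("v1"))  # (2, 'cache') (1, 'disk')
```

The sketch does not copy disk reads into the cache, since the text only describes the fallback, not a refill policy.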
Because data on disk is cold data, reading from the cache is more efficient; moreover, the cached data is the most recently updated data and better reflects the latest state of the graph, so for some application scenarios using cached data greatly improves efficiency.
Further, graph storage may include:
determining whether the graph is a sparse graph or a dense graph according to its data characteristics;
determining whether the graph is vertex-based or edge-based according to its computation characteristics;
determining the partitioning algorithm of the graph according to its data characteristics and computation characteristics, and partitioning and storing the graph data accordingly.
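The three storage steps can be sketched as a classifier followed by a lookup. The density test (maximum degree compared with ten times the mean degree) and the mapping to algorithms are hypothetical choices made for this sketch; the algorithm names refer to the balanced edge cut and greedy vertex cut discussed in Embodiment 3.

```python
def classify_density(degrees, hub_ratio=10.0):
    """Hypothetical test: 'dense' if some vertex's degree far exceeds the mean."""
    mean = sum(degrees) / len(degrees)
    return "dense" if max(degrees) > hub_ratio * mean else "sparse"

def choose_partitioning(density, compute_basis):
    """Hypothetical mapping from graph characteristics to a partitioning algorithm.

    density: "sparse" or "dense"; compute_basis: "vertex" or "edge".
    """
    if density == "dense" or compute_basis == "edge":
        return "greedy_vertex_cut"   # split high-degree vertices across machines
    return "balanced_edge_cut"       # keep each vertex on a single machine

degrees = [1] * 20 + [200]           # one hub vertex among many low-degree ones
print(choose_partitioning(classify_density(degrees), "vertex"))  # greedy_vertex_cut
```

The rationale for this particular mapping is that a vertex cut spreads the edges of a few very high-degree vertices over several machines, whereas an edge cut suits graphs where every vertex has few neighbors.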
Because the partitioning algorithm is chosen according to the data characteristics and computation characteristics of the graph, the chosen algorithm is more appropriate, data storage is more reasonable, and the whole scheme is more efficient.
To facilitate implementation of the application, examples are given below.
Embodiment 1:
The graph data processing method in Embodiment 1, as shown in Fig. 2, comprises:
Step 201: monitor whether a graph update request or a graph analysis request has been received; if so, proceed to step 202; otherwise return to step 201;
Usually, monitoring for graph update requests or graph analysis requests begins at system start-up once initialization has completed; this step does not restrict when monitoring begins.
The method of this embodiment performs subsequent processing only on received graph update requests or graph analysis requests.
Step 202: according to the type of the pending request, write the pending request into the graph update task queue or the graph analysis task queue;
That is, graph update requests are written into the graph update task queue, and graph analysis requests are written into the graph analysis task queue.
Step 203: determine the execution order of the tasks according to the first characteristic of each task in the graph update task queue and the graph analysis task queue;
The first characteristic includes any one or a combination of the following: timestamp, timeliness, priority and data dependency. For example, the execution order can be determined solely from the timestamps of the tasks in the two queues, so that the task that entered the queues first is processed first; alternatively, the timeliness, priority and data dependencies of each task can be combined to determine the order. The specific first characteristic can be chosen according to actual needs.
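When several first characteristics are combined, one possible ordering is sketched below. The priority order of the criteria (priority, then deadline as a stand-in for timeliness, then timestamp) and the way dependencies are expressed are assumptions made for illustration; the application leaves the specific combination open.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    timestamp: int                                # arrival order
    priority: int = 0                             # larger runs earlier
    deadline: float = float("inf")                # timeliness: earlier deadline runs earlier
    depends_on: set = field(default_factory=set)  # names of tasks that must run first

def order_tasks(tasks):
    """Order tasks by (priority, deadline, timestamp) while respecting dependencies."""
    key = lambda t: (-t.priority, t.deadline, t.timestamp)
    pending = sorted(tasks, key=key)
    done, order = set(), []
    while pending:
        # Pick the best-ranked task whose dependencies have all been scheduled.
        ready = next((t for t in pending if t.depends_on <= done), None)
        if ready is None:
            raise ValueError("circular or unsatisfiable dependency")
        pending.remove(ready)
        done.add(ready.name)
        order.append(ready.name)
    return order

tasks = [Task("u1", 1),
         Task("a1", 2, priority=5, depends_on={"u1"}),
         Task("u2", 3, priority=1)]
print(order_tasks(tasks))  # ['u2', 'u1', 'a1']
```

Here the high-priority analysis task a1 still waits for update u1, which it depends on; with only timestamps set, the function reduces to first-in, first-out ordering.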
After the execution order is determined in this step, it can also be judged whether the first-ranked task is a graph analysis task whose time and/or resource consumption exceeds a set threshold. If so, the first-ranked task is split into multiple tasks that are run at intervals, and once they have all finished their graph analysis results are merged to complete the first-ranked task. This prevents a single task from occupying too much time and/or too many resources and makes task execution more reasonable; in particular, when a graph analysis task consumes a lot of time and/or resources, graph update tasks can still be executed efficiently.
Step 204: determine, according to the state of the read-write lock, whether the first-ranked task is the task currently due to run; if so, proceed to step 205; otherwise suspend the first-ranked task and return to step 204 in the next cycle;
The read-write lock is introduced because, in this application, graph update tasks and graph analysis tasks are processed in the same system; it ensures eventual transactional consistency between update tasks and analysis tasks. With the lock, tasks that do not interfere with each other, for example a pure read analysis task and an update task, can run in parallel, which improves efficiency. The read-write lock may also be omitted; in that case, to ensure eventual transactional consistency between update tasks and analysis tasks, tasks are not run in parallel, and the next task starts only after the current one has completed. The lock is set to occupied when a task starts running, and set to free when the task finishes or is suspended. This step can thus be understood as deciding, from the state of the read-write lock, whether the first-ranked task is allowed to run.
This step may specifically include any one or a combination of the following:
when the read-write lock is free, the first-ranked task is determined to be the task currently due to run;
when the read-write lock is occupied and the currently running task is a pure read graph analysis task, the first-ranked task is determined to be the task currently due to run;
when the read-write lock is occupied and the currently running task is a graph analysis task that is not pure read, or a graph update task, the first-ranked task is suspended and the state of the read-write lock is checked again in the next cycle.
Here, when the read-write lock is occupied and the currently running task is a pure read graph analysis task, the first-ranked task is still determined to be due to run, so other tasks can run in parallel with a pure read graph analysis task, improving processing efficiency.
In practice, this treatment may also be omitted: whenever the lock is occupied, the first-ranked task is suspended and the lock state is checked again in the next cycle. With this scheme, tasks are not run in parallel.
Step 205: according to the distributed partition information of the data involved in the task currently due to run, assign that task to the corresponding computing resource for execution.
In the industry, graph storage and computation come in single-node and distributed modes. In single-node mode, the whole graph is stored on one computer and graph computation is concentrated on a single compute node. In the distributed model, the graph is stored across multiple machines because it is too large to be stored physically on one machine, and graph computation is likewise executed in parallel on multiple distributed machines. This embodiment takes the distributed mode as an example, so the task currently due to run is assigned to the corresponding computing resource according to the distributed partition information of the data it involves. In practice a single-node graph may also be used, in which case the task can be run directly.
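Assigning a task by partition information can be sketched by grouping the vertices a task touches by the machine that stores them. The partition rule used here (CRC32 of the vertex id modulo the machine count) is a hypothetical stand-in; zlib.crc32 is used instead of Python's built-in hash because string hashing is randomized per process.

```python
import zlib

def partition_of(vertex_id, num_machines):
    """Hypothetical partition rule: stable hash of the vertex id modulo machine count."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_machines

def assign_task(task_vertices, num_machines):
    """Group the vertices a task touches by the machine that stores them.

    Each key is a machine (computing resource) and each value is the part of the
    task that should run there; with one machine this is the single-node case.
    """
    plan = {}
    for v in task_vertices:
        plan.setdefault(partition_of(v, num_machines), []).append(v)
    return plan

print(assign_task(["a", "b"], 1))  # {0: ['a', 'b']}
```

In the single-node case every vertex maps to machine 0, which corresponds to running the task directly.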
After a task starts running, it can be monitored for timeout; if it times out, it is suspended and restarted in the next cycle. This prevents a single task from occupying too much time and/or too many resources and makes task execution more reasonable; in particular, when a graph analysis task consumes a lot of time and/or resources, graph update tasks can still be executed efficiently.
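Timeout handling per cycle can be sketched with a round-robin loop in which each task is a generator and each step is one unit of work. The per-cycle step budget stands in for the timeout, and a suspended task is resumed where it stopped; restarting it from scratch would be the other reading of the text. All names are hypothetical.

```python
from collections import deque

def run_cycles(tasks, step_budget):
    """Round-robin scheduler: each task may run at most step_budget steps per cycle.

    tasks maps a name to a generator; each next() call is one unit of work.
    A task that exceeds its budget is suspended and resumed in the next cycle.
    Returns the names in completion order and the number of cycles used.
    """
    queue = deque(tasks.items())
    finished, cycles = [], 0
    while queue:
        cycles += 1
        for _ in range(len(queue)):
            name, gen = queue.popleft()
            for _ in range(step_budget):
                try:
                    next(gen)
                except StopIteration:
                    finished.append(name)
                    break
            else:
                queue.append((name, gen))  # timed out: suspend until the next cycle
    return finished, cycles

def work(n):
    """A task consisting of n units of work."""
    for _ in range(n):
        yield

print(run_cycles({"long_analysis": work(5), "update": work(1)}, step_budget=2))
# (['update', 'long_analysis'], 3)
```

The short update task completes in the first cycle even though the long analysis task was ranked ahead of it.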
Embodiment 2:
The graph data processing method in Embodiment 2, as shown in Fig. 3, comprises:
Step 301: monitor whether a graph update request or a graph analysis request has been received; if so, proceed to step 302; otherwise return to step 301;
Step 302: according to the type of the pending request, write the pending request into the graph update task queue or the graph analysis task queue;
Step 303: determine the execution order of the tasks according to the first characteristic of each task in the graph update task queue and the graph analysis task queue;
Step 304: determine, according to the state of the read-write lock, whether the first-ranked task is the task currently due to run; if so, proceed to step 305; otherwise suspend the first-ranked task and return to step 304 in the next cycle;
Step 305: judge whether the first-ranked task is a graph update task; if so, proceed to step 306; otherwise proceed to step 307;
Because this flow handles only graph update requests and graph analysis requests, a task judged in this step not to be a graph update task is a graph analysis task.
Step 306: run the graph update task, and store the in-memory mapping object in both the cache and the disk;
In implementation, when the in-memory mapping objects are stored in the cache, delta (incremental) update objects can be stored directly, or the in-memory mapping objects can be processed and divided into delta update objects and hotspot objects. Hotspot objects can be generated according to existing rules; for example, a delta update object that is operated on more than 100 times within one hour is regarded as a hotspot object. The application does not specifically limit how hotspot objects are generated. Once hotspot objects have been generated, graph analysis tasks can be performed on them.
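The example hotspot rule (more than 100 operations within one hour) can be sketched as a windowed count over an operation log. The log format, a list of (object_id, timestamp_in_seconds) pairs, is an assumption for this sketch.

```python
from collections import defaultdict

HOT_THRESHOLD = 100     # operations, per the example rule in the text
WINDOW_SECONDS = 3600   # one hour

def split_hot_objects(operations, now):
    """Split delta update objects into hotspot objects and ordinary delta objects.

    operations: iterable of (object_id, timestamp_seconds) pairs, one per operation.
    An object is hot if it was operated on more than HOT_THRESHOLD times in the
    WINDOW_SECONDS before `now`.
    """
    operations = list(operations)  # iterated twice below
    counts = defaultdict(int)
    for obj, ts in operations:
        if now - WINDOW_SECONDS <= ts <= now:
            counts[obj] += 1
    hot = {obj for obj, c in counts.items() if c > HOT_THRESHOLD}
    delta = {obj for obj, _ in operations} - hot
    return hot, delta
```

Objects in the hot set would then be the targets of the graph analysis tasks mentioned above; everything else is kept as an ordinary delta update object.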
Step 307: obtain data from the cache and run the graph analysis task;
Step 308: if the data involved in the graph analysis task is not in the cache, obtain it from the disk.
In a specific implementation, the graph computing engine of the graph database system contains two kinds of computing operation: update operations and analysis operations. Taking the asynchronous shared-memory model of BSP parallel tasks as a technical reference, and combining it with the characteristics of operations on graph-structured data, the common computing interfaces are abstracted as follows:
Graph update interface definition: updateresult updategraph(graphdata)
Graph analysis interface definition: statsresult statsgraph(statsparam)
The internal abstract implementation of these two interfaces is broken down as shown in Fig. 4:
A) The internal steps of updategraph are as follows:
A1) query the vertices that are ready to be updated: gatherreadyupdatevertex()
A2) update the vertex information of the graph: applyupdategraph()
A3) propagate the vertex update information to each adjacent vertex: scatterupdatevertexs()
A4) summarize the states of the successfully updated vertices into the buffer queue: summaryupdateresult(). This step is handled by an asynchronous message mechanism, so it does not add to the time taken by the preceding steps.
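The four updategraph steps can be sketched on an in-memory adjacency map. The snake_case method names are hypothetical Python renderings of the identifiers above, the update data is assumed to be a dict of vertex id to new value, and the asynchronous summary is approximated by a non-blocking put into a queue.Queue, so the caller never waits for a consumer.

```python
import queue

class Graph:
    def __init__(self):
        self.values = {}                    # vertex id -> attribute value
        self.adj = {}                       # vertex id -> set of neighbor ids
        self.inbox = {}                     # vertex id -> update notices from neighbors
        self.update_buffer = queue.Queue()  # buffer queue of successfully updated vertices

    def add_edge(self, a, b):
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def gather_ready_update_vertex(self, graph_data):
        """A1) gather: the vertices named in the update data."""
        return list(graph_data)

    def apply_update_graph(self, graph_data, vertices):
        """A2) apply: write the new vertex values."""
        for v in vertices:
            self.values[v] = graph_data[v]

    def scatter_update_vertexs(self, vertices):
        """A3) scatter: notify each neighbor that this vertex changed."""
        for v in vertices:
            for n in self.adj.get(v, ()):
                self.inbox.setdefault(n, []).append(v)

    def summary_update_result(self, vertices):
        """A4) summary: enqueue results without waiting for any consumer."""
        for v in vertices:
            self.update_buffer.put_nowait((v, self.values[v]))

    def update_graph(self, graph_data):
        vertices = self.gather_ready_update_vertex(graph_data)
        self.apply_update_graph(graph_data, vertices)
        self.scatter_update_vertexs(vertices)
        self.summary_update_result(vertices)
        return {"updated": len(vertices)}   # stands in for updateresult
```

Keeping the summary as a queue write means the update path finishes as soon as the scatter step does, while analysis tasks can drain the buffer later.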
B) The internal steps of statsgraph are as follows:
B1) collect the source vertex information ready for analysis: gatherreadystatssourcevertex()
This step can collect the source vertex information ready for analysis from the queue of successful updates in the cache;
B2) execute the analysis task: applystatsgraph()
B3) combine the results of the analysis statistics tasks: summarystatssourcevertexs()
The graph computing framework in this embodiment is thus divided into several stages:
gather, apply, scatter, [summary]
This embodiment adds the summary step. For update tasks, it collects the results of the update operations and returns them into the update queue context; for analysis tasks, it aggregates the results of the analysis tasks. The summarized information is written, as standard-format data, into the context of the computing framework for use by the framework internally.
A characteristic of this embodiment: the summary operation summaryupdateresult of the graph update interface updategraph takes the vertex data updated in real time by the current task and, according to rules, automatically writes it into the updated-subgraph buffer queue once the task completes. The corresponding analysis task can then obtain data from this buffer queue automatically; this is done in gatherreadystatssourcevertex of the statsgraph interface.
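The analysis side of this hand-off can be sketched as a function that drains the update buffer queue (B1), runs a read-only per-vertex analysis (B2) and combines the results (B3). The buffer entries are assumed to be (vertex, value) pairs, and the analysis itself (summing neighbor values) is an arbitrary example; the function name is hypothetical.

```python
import queue

def stats_graph(update_buffer, adj, values):
    """Sketch of statsgraph: B1 gather from the update buffer, B2 apply, B3 summary."""
    # B1) gather: drain source vertices that the update path already buffered
    sources = []
    while True:
        try:
            vertex, _ = update_buffer.get_nowait()
        except queue.Empty:
            break
        if vertex not in sources:
            sources.append(vertex)
    # B2) apply: a read-only analysis per source vertex (here, neighbor value sum)
    per_vertex = {v: sum(values.get(n, 0) for n in adj.get(v, ())) for v in sources}
    # B3) summary: combine the per-vertex results into one statsresult-like dict
    return {"sources": sources, "per_vertex": per_vertex,
            "total": sum(per_vertex.values())}

buf = queue.Queue()
for item in [("a", 1), ("b", 2), ("a", 3)]:
    buf.put(item)
print(stats_graph(buf, {"a": {"b", "c"}, "b": {"a"}}, {"a": 3, "b": 2, "c": 5})["total"])  # 10
```

Because B1 reads only vertices that were just updated, the analysis covers the newest data without scanning the whole graph.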
Embodiment 3:
Graph storage in the present application uses a distributed graph storage engine. As the underlying engine that supports efficient graph updating and graph analysis, it is responsible for solving two major classes of problem:
First: storing dense graphs and sparse graphs effectively in a distributed way; the core of distributed graph storage is graph partitioning.
Real-world graph structures fall broadly into two classes. In the first, the vertices of the graph have few adjacent edges, i.e. a sparse graph; in the second, a small number of vertices have a very large number of adjacent edges, i.e. a locally dense graph (called a dense graph in this application).
Second: providing a concise, unified access API (application programming interface) for the upper-layer graph computing engine to call.
For the first class of problem, graph partitioning design has roughly 3 classes of reference algorithm:
A1) Balanced edge cut: hash each vertex id and, according to the number of machines, assign each vertex uniquely to one machine; edges are then stored redundantly on different machines. To keep graph computation efficient, this algorithm must store adjacent vertex and edge information redundantly on different machines, so any update to an edge or vertex involves a relatively large amount of network interaction.
A2) Balanced vertex cut: hash each edge id and assign each edge uniquely to one machine; the vertices connected by edges are replicated on different machines. Because each edge is stored exactly once, only vertex updates involve heavier network interaction.
A3) Greedy vertex cut: building on algorithm a2, for the two vertices v(a) and v(b) connected by any edge e, take into account the machines on which those vertices are already stored. Let M(a) be the set of machines holding vertex a and M(b) the set holding vertex b; the placement of e is decided after evaluating the following cases:
If M(a) and M(b) intersect, assign e to a machine in the intersection.
If M(a) and M(b) do not intersect but both are non-empty, so the union is not empty, assign e to the machine in M(a) ∪ M(b) that holds the fewest edges.
If M(a) is non-empty but M(b) is empty, assign e to a machine in M(a), and vice versa.
If both M(a) and M(b) are empty, assign e to the least loaded machine.
For algorithm a3 in design compare pursue side close on storage, the algorithm due to figure more loads, Storage part branch relative consumption partial properties for figure;But the subsequent calculations part for figure can significantly carry Rise corresponding performance.
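The three partitioning strategies can be sketched in a few lines. This is an illustration under assumptions, not the storage engine's implementation. CRC32 is used as a stable hash. Where a rule names a candidate set but not a specific machine, the sketch picks the least-loaded machine in that set. Edges are given as (source, target) pairs. All function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def stable_hash(key: str) -> int:
    # crc32 is deterministic across runs, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8"))


def edge_cut(vertices: List[str], machines: int) -> Dict[str, int]:
    """A1: each vertex lives on exactly one machine; edges are then replicated."""
    return {v: stable_hash(v) % machines for v in vertices}


def vertex_cut(edges: List[Edge], machines: int) -> Dict[Edge, int]:
    """A2: each edge lives on exactly one machine; its endpoints are replicated."""
    return {e: stable_hash(f"{e[0]}->{e[1]}") % machines for e in edges}


def greedy_vertex_cut(edges: List[Edge], machines: int) -> Dict[Edge, int]:
    """A3: place each edge using M(a), M(b), the machines already hosting its endpoints."""
    placed: Dict[str, Set[int]] = defaultdict(set)  # vertex -> M(vertex)
    load = [0] * machines                            # edges assigned per machine
    result: Dict[Edge, int] = {}

    def least_loaded(candidates) -> int:
        return min(candidates, key=lambda m: (load[m], m))

    for a, b in edges:
        ma, mb = placed[a], placed[b]
        if ma & mb:                      # rule 1: the sets intersect
            target = least_loaded(ma & mb)
        elif ma and mb:                  # rule 2: disjoint, both non-empty
            target = least_loaded(ma | mb)
        elif ma or mb:                   # rule 3: only one endpoint placed
            target = least_loaded(ma or mb)
        else:                            # rule 4: neither endpoint placed
            target = least_loaded(range(machines))
        result[(a, b)] = target
        load[target] += 1
        ma.add(target)                   # endpoints now also live on target
        mb.add(target)
    return result
```

For a triangle a-b-c plus a separate edge d-e on two machines, the greedy rules keep the whole triangle on machine 0. The unrelated edge goes to the idle machine 1. This co-location is what A3 trades extra placement work for.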
For graph storage, this embodiment optimizes by choosing among the algorithms summarized above according to the data characteristics and computation characteristics of the graph. As shown in Fig. 5, this comprises the following steps:
Step 501: determine from the data characteristics of the graph whether it is a sparse graph or a dense graph;
Step 502: determine from the computation characteristics of the graph whether computation on it is vertex-centric or edge-centric;
Step 503: determine the partitioning algorithm of the graph from its data characteristics and computation characteristics, and partition and store the graph data accordingly.
The specific algorithm may be an edge cut, a vertex cut, an optimized vertex cut, and so on.
Because the partitioning algorithm is chosen according to the data and computation characteristics of the graph, the partitioning fits the graph better, data storage is more reasonable, and the overall scheme is more efficient.
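Steps 501 to 503 do not state a concrete density test or a mapping from characteristics to algorithms. The sketch below therefore fills both in with labeled assumptions. A graph counts as "dense" when its highest-degree vertex reaches `hub_ratio` times the average degree. Sparse graphs get an edge cut (A1). Dense graphs get a greedy vertex cut (A3) for edge-centric computation and a plain vertex cut (A2) otherwise. The threshold and the mapping are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def degrees(edges: List[Edge]) -> Dict[str, int]:
    deg: Dict[str, int] = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def is_dense(edges: List[Edge], hub_ratio: float = 5.0) -> bool:
    # Step 501 (assumed rule): "dense" if the top-degree vertex has at least
    # hub_ratio times the average degree, i.e. a few hubs own many edges.
    deg = degrees(edges)
    if not deg:
        return False
    avg = sum(deg.values()) / len(deg)
    return max(deg.values()) >= hub_ratio * avg


def choose_partitioner(edges: List[Edge], edge_centric: bool, hub_ratio: float = 5.0) -> str:
    # Step 502 is supplied by the caller (edge_centric) rather than inferred.
    # Step 503 (assumed mapping): sparse -> A1, dense -> A3 or A2.
    if not is_dense(edges, hub_ratio):
        return "edge_cut"
    return "greedy_vertex_cut" if edge_centric else "vertex_cut"
```

A star graph with one hub and twenty leaves is classified as dense. A simple chain is classified as sparse and gets the edge cut.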
For the second class of problem, the graph storage API is encapsulated uniformly according to the business scenarios of graph update and graph analysis. The corresponding API is as follows:
Create vertex: vertex createvertex(key)
Create edge: edge createedge(key, sourcevertex, targetvertex)
Update vertex: updatevertex(vertex, property)
Update edge: updateedge(vertex, property)
Find vertex: findvertex(key)
Find the edges of a vertex: findedgesofvertex(key)
Find edges with a specified label: findedgesbylabel(label)
Batch-create vertices: bulkcreatevertexs(list(key))
Batch-create edges: bulkcreateedges(key, list<sourcevertex>, list<targetvertex>)
Find adjacent vertices: findadjacentvertxs(vertex)
Find adjacent edges: findadjacentedges(edge)
Delete vertex: boolean dropvertex(vertex)
Delete edge: boolean dropedge(edge)
This provides a concise, unified access API that the upper-layer graph computing engine can call conveniently.
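To make the shape of this API concrete, here is a single-process, in-memory sketch of a subset of it. It is hypothetical and not the distributed engine. Method names are snake_case renderings of the listed operations. Edge labels are assumed to be stored as an edge property named `"label"`. `update_edge` is keyed by edge key here, and dropping a vertex also drops its incident edges. All of these are choices of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Vertex:
    key: str
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    key: str
    source: str
    target: str
    props: Dict[str, object] = field(default_factory=dict)


class GraphStore:
    """Hypothetical in-memory stand-in for the unified graph storage API."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.edges: Dict[str, Edge] = {}

    def create_vertex(self, key: str) -> Vertex:
        return self.vertices.setdefault(key, Vertex(key))

    def bulk_create_vertices(self, keys: List[str]) -> List[Vertex]:
        return [self.create_vertex(k) for k in keys]

    def create_edge(self, key: str, source: str, target: str) -> Edge:
        self.create_vertex(source)  # make sure both endpoints exist
        self.create_vertex(target)
        edge = Edge(key, source, target)
        self.edges[key] = edge
        return edge

    def update_vertex(self, key: str, props: Dict[str, object]) -> None:
        self.vertices[key].props.update(props)

    def update_edge(self, key: str, props: Dict[str, object]) -> None:
        self.edges[key].props.update(props)

    def find_vertex(self, key: str) -> Optional[Vertex]:
        return self.vertices.get(key)

    def find_edges_of_vertex(self, key: str) -> List[Edge]:
        return [e for e in self.edges.values() if key in (e.source, e.target)]

    def find_edges_by_label(self, label: str) -> List[Edge]:
        # Assumption: the label is kept in the edge property "label".
        return [e for e in self.edges.values() if e.props.get("label") == label]

    def find_adjacent_vertices(self, key: str) -> List[str]:
        edges = self.find_edges_of_vertex(key)
        return sorted({e.target if e.source == key else e.source for e in edges})

    def drop_edge(self, key: str) -> bool:
        return self.edges.pop(key, None) is not None

    def drop_vertex(self, key: str) -> bool:
        if key not in self.vertices:
            return False
        for e in self.find_edges_of_vertex(key):  # remove incident edges too
            del self.edges[e.key]
        del self.vertices[key]
        return True
```

Usage: after `create_edge("e1", "a", "b")` and `create_edge("e2", "b", "c")`, `find_adjacent_vertices("b")` returns `["a", "c"]`.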
Based on the same inventive concept, an embodiment of this application also provides a graph data processing device. Because these devices solve the problem on a principle similar to that of the graph data processing method, their implementation can refer to the implementation of the method, and repeated details are not described again.
As shown in Fig. 6, the device may include:
Graph update task queue 601, into which graph update tasks are written;
Graph analysis task queue 602, into which graph analysis tasks are written;
Scheduler 603, which determines the run order of the tasks according to a first characteristic of each task in the graph update task queue and the graph analysis task queue, and assigns the task currently to be run to the corresponding computing resource.
In a specific implementation, graph update task queue 601 can be responsible for enforcing rules such as the timeliness and idempotency of update tasks; graph analysis task queue 602 can maintain the priority, timeliness, and retry-on-failure properties of analysis tasks.
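A minimal sketch of how scheduler 603 might order the two queues by a "first characteristic". The claims list timestamp, timeliness, priority and data dependency as candidates. The concrete policy here is an assumption: higher priority first, then earlier deadline, then earlier arrival. A task whose dependencies have not completed is held back. `Task` and `order_tasks` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(order=True)
class Task:
    sort_key: tuple = field(init=False, repr=False)
    task_id: str = field(compare=False)
    kind: str = field(compare=False)         # "update" or "analysis"
    priority: int = field(compare=False)     # higher runs first
    deadline: float = field(compare=False)   # timeliness: earlier first
    timestamp: float = field(compare=False)  # arrival time: FIFO tie-break
    depends_on: Set[str] = field(default_factory=set, compare=False)

    def __post_init__(self) -> None:
        # Only sort_key takes part in comparisons (all other fields compare=False).
        self.sort_key = (-self.priority, self.deadline, self.timestamp)


def order_tasks(update_q: List[Task], analysis_q: List[Task], done: Set[str]) -> List[Task]:
    """Merge both queues and return the runnable tasks in run order."""
    heap = [t for t in update_q + analysis_q if t.depends_on <= done]
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Because both queues feed one heap, an urgent analysis task can overtake a routine update. Keeping the queues separate still allows each to enforce its own rules, such as idempotency or retries.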
Further, when applied in a distributed system, the device also includes, as shown in Fig. 7, partition identifier 701, which provides scheduler 603 with the distributed partition information of the data involved in the task currently to be run. A non-distributed system does not need the partition identifier.
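One way a partition identifier could support the scheduler: map each data key a task touches to a partition, then prefer the worker that owns most of those partitions. The hash scheme, the partition-to-worker table `owner`, and the locality rule are all assumptions of this sketch. The source only says partition information is passed to scheduler 603.

```python
import zlib
from typing import Dict, Iterable, Set


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic hash partitioning (assumed scheme).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def identify_partitions(data_keys: Iterable[str], num_partitions: int) -> Set[int]:
    # What a partition identifier could hand to the scheduler: the set of
    # partitions touched by the task's data.
    return {partition_of(k, num_partitions) for k in data_keys}


def pick_worker(partitions: Set[int], owner: Dict[int, str]) -> str:
    # Prefer the worker owning the most of those partitions (data locality);
    # ties are broken alphabetically so the choice is deterministic.
    counts: Dict[str, int] = {}
    for p in partitions:
        counts[owner[p]] = counts.get(owner[p], 0) + 1
    return max(sorted(counts), key=lambda w: counts[w])
```

For example, a task touching partitions {1, 2, 3} goes to "w1" under `owner = {0: "w0", 1: "w1", 2: "w1", 3: "w0"}`, because w1 owns two of the three.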
Further, as shown in Fig. 8, the device may also include read-write lock module 801, which stores the state of the read-write lock. The lock state is set to occupied while a task is running and set to free when the task finishes or times out.
After determining the run order of the tasks, scheduler 603 determines, according to the state of the read-write lock, whether the first-ranked task is the task currently to be run.
Further, scheduler 603 determining, according to the state of the read-write lock, whether the first-ranked task is the task currently to be run includes any one or a combination of the following:
When the lock state is free, the first-ranked task is determined to be the task currently to be run;
When the lock state is occupied and the currently running task is a pure read-only graph analysis task, the first-ranked task is determined to be the task currently to be run;
When the lock state is occupied and the currently running task is not a pure read-only graph analysis task, or is a graph update task, the first-ranked task is suspended until the next cycle, when the lock state is checked again.
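The three lock rules above reduce to a small decision function. The sketch assumes a single "current" running task and a boolean `read_only` flag that marks pure read-only analysis tasks. `LockState`, `RunningTask`, `can_run_first_task` and `ReadWriteLockModule` are hypothetical names. Tracking several concurrent readers is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LockState(Enum):
    FREE = "free"
    OCCUPIED = "occupied"


@dataclass
class RunningTask:
    kind: str        # "update" or "analysis"
    read_only: bool  # True for a pure read-only graph analysis task


def can_run_first_task(state: LockState, current: Optional[RunningTask]) -> bool:
    """Decide whether the first-ranked task may start now."""
    if state is LockState.FREE:
        return True
    if current is not None and current.kind == "analysis" and current.read_only:
        return True  # readers may overlap a pure read-only analysis
    return False     # suspend; the caller re-checks the lock next cycle


class ReadWriteLockModule:
    """Stores the lock state: occupied while a task runs, free afterwards."""

    def __init__(self) -> None:
        self.state = LockState.FREE
        self.current: Optional[RunningTask] = None

    def on_start(self, task: RunningTask) -> None:
        self.state, self.current = LockState.OCCUPIED, task

    def on_finish_or_timeout(self) -> None:
        self.state, self.current = LockState.FREE, None
```

In effect, only a pure read-only analysis lets another task start while the lock is held. Anything that may write forces the next task to wait a cycle.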
Further, after determining the run order of the tasks, scheduler 603 can also judge whether the first-ranked task is a graph analysis task whose time and/or resource consumption exceeds a set threshold. If so, it splits the first-ranked task into multiple tasks and runs them at intervals; once all of them have finished, it merges their graph analysis results to complete the first-ranked task.
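A sketch of the split, run-at-intervals and merge step. It assumes the analysis can be applied to a chunk of vertices and that its partial results are additive counts, so merging means summing per key. `estimated_cost`, `chunk_size` and `pause_s` are hypothetical parameters. Non-additive analyses would need a different merge.

```python
import time
from typing import Callable, Dict, Sequence


def run_split_analysis(
    vertices: Sequence[str],
    analyze_chunk: Callable[[Sequence[str]], Dict[str, int]],
    estimated_cost: float,
    threshold: float,
    chunk_size: int,
    pause_s: float = 0.0,
) -> Dict[str, int]:
    if estimated_cost <= threshold:
        return analyze_chunk(vertices)  # cheap enough: run as a single task
    merged: Dict[str, int] = {}
    for start in range(0, len(vertices), chunk_size):
        partial = analyze_chunk(vertices[start:start + chunk_size])
        for k, v in partial.items():  # merge by summing partial counts
            merged[k] = merged.get(k, 0) + v
        time.sleep(pause_s)  # run at intervals, leaving room for other tasks
    return merged
```

With an `analyze_chunk` that counts vertices and out-edges in its chunk, the split and unsplit runs return the same totals. That equality is the property the merge step relies on.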
Further, after running a task, scheduler 603 can monitor whether the task has timed out. If so, it suspends the task and restarts it in the next cycle.
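A sketch of timeout monitoring with restart on the next cycle. Each task is modeled (an assumption) as a factory that returns a generator. The generator yields between steps, which lets the monitor check elapsed time, and it returns its result at the end. A timed-out attempt is closed and the task is restarted from scratch in the next cycle. `clock` is injectable so the behaviour can be tested deterministically.

```python
import time
from collections import deque
from typing import Callable, Deque, Generator, List, Tuple

# Hypothetical task model: a factory returning a generator that yields between
# steps and returns its result when finished.
TaskFactory = Callable[[], Generator[None, None, str]]


def run_with_timeouts(
    tasks: List[Tuple[str, TaskFactory]],
    timeout_s: float,
    max_cycles: int,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    pending: Deque[Tuple[str, TaskFactory]] = deque(tasks)
    finished: List[Tuple[str, str]] = []
    for _ in range(max_cycles):
        next_cycle: Deque[Tuple[str, TaskFactory]] = deque()
        while pending:
            name, factory = pending.popleft()
            gen = factory()  # (re)start the task from the beginning
            started = clock()
            try:
                while True:
                    next(gen)
                    if clock() - started > timeout_s:
                        gen.close()  # timed out: suspend this attempt
                        next_cycle.append((name, factory))
                        break
            except StopIteration as stop:
                finished.append((name, stop.value))
        pending = next_cycle
        if not pending:
            break
    return finished, [name for name, _ in pending]
```

The function returns the finished results plus the names of tasks still timing out after `max_cycles`. Those remaining tasks are candidates for splitting or for raising their timeout.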
Read-write lock module 801 and partition identifier 701 can each be combined individually with the modules of Fig. 6.
An embodiment of this application also provides a graph data processing system, as shown in Fig. 9, comprising:
A service interface layer, including a graph update interface and a graph analysis interface. The graph update interface receives graph update tasks and writes them into the graph update task queue; the graph analysis interface receives graph analysis tasks and writes them into the graph analysis task queue;
A task scheduling layer, including the graph data processing device described above;
A graph computing engine, which performs the graph update operations and/or graph analysis operations of tasks;
A graph storage engine, which stores the graph.
In a specific implementation, the update interface in the service interface layer is a business-level interface that carries the corresponding business semantics. Its basic design rule is to translate the business's non-graph data model into the standard graph data model and to standardize all interface rules in terms of vertex, edge, relationship, and property.
The analysis interface in the service interface layer accepts business-driven analysis tasks, scheduled tasks, or analysis tasks that depend on updated data objects. Such interfaces generally accept two kinds of rules: the analysis source and the analysis rule indicators.
Further, the graph storage engine includes a buffer area and a disk.
After running a graph update task, the graph computing engine stores in-memory mapped objects in both the buffer area and the disk. When a task is a graph analysis task, it fetches data from the buffer area; if the data the graph analysis task needs is not in the buffer area, it fetches the data from the disk.
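A minimal sketch of the buffer-plus-disk read path. Two dicts stand in for the buffer area and the disk. Updates write through to both. Analysis reads try the buffer first and fall back to disk. The bounded buffer with FIFO eviction, and the `(object, source)` return value, are assumptions made so the fallback is observable. `TieredGraphStore` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


class TieredGraphStore:
    """Hypothetical buffer-plus-disk store; two dicts stand in for both tiers."""

    def __init__(self, buffer_capacity: int) -> None:
        if buffer_capacity < 1:
            raise ValueError("buffer_capacity must be at least 1")
        self.buffer: Dict[str, dict] = {}
        self.disk: Dict[str, dict] = {}
        self.capacity = buffer_capacity

    def write_after_update(self, key: str, obj: dict) -> None:
        # After a graph update task: persist to disk and also keep in the buffer.
        self.disk[key] = obj
        if key not in self.buffer and len(self.buffer) >= self.capacity:
            # Assumed eviction policy: drop the oldest buffered entry (FIFO).
            self.buffer.pop(next(iter(self.buffer)))
        self.buffer[key] = obj

    def read_for_analysis(self, key: str) -> Tuple[Optional[dict], str]:
        # Graph analysis tasks try the buffer first and fall back to disk.
        if key in self.buffer:
            return self.buffer[key], "buffer"
        if key in self.disk:
            return self.disk[key], "disk"
        return None, "missing"
```

With a one-entry buffer, writing "a" then "b" evicts "a" from the buffer. A later analysis read of "a" is still served from disk.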
Further, when the graph storage engine stores in-memory mapped objects in the buffer area, it processes them and divides them into delta update objects and focus objects.
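The source does not define the two object classes. The sketch below interprets "delta update objects" as buffered objects changed since the last sync to disk, and "focus objects" as frequently read (hot) objects. Both readings are assumptions. An object may fall into both classes. `CachedObject`, `last_synced_version` and `hot_threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CachedObject:
    key: str
    version: int       # bumped on every update
    access_count: int  # reads since the object entered the buffer


def classify(
    objects: List[CachedObject], last_synced_version: int, hot_threshold: int
) -> Tuple[List[str], List[str]]:
    # Delta update objects: changed after the last synced version.
    delta = [o.key for o in objects if o.version > last_synced_version]
    # Focus objects (assumed meaning): read at least hot_threshold times.
    focus = [o.key for o in objects if o.access_count >= hot_threshold]
    return delta, focus
```

Separating the classes lets an engine flush deltas and pin hot objects independently.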
Further, the graph storage engine includes:
A graph data feature analyzer, which determines from the data characteristics of the graph whether it is a sparse graph or a dense graph;
A graph computation feature analyzer, which determines from the computation characteristics of the graph whether computation is vertex-centric or edge-centric;
A graph storage partition manager, which determines the partitioning algorithm of the graph from its data characteristics and computation characteristics, and partitions and stores the graph data accordingly.
Further, as shown in Fig. 10, the system may also include a monitoring core, which collects the resource load of the graph computing engine and the graph storage engine in real time, converts the monitoring information in real time into measurable graph computation scheduling evaluation factors, and provides them to scheduler 603 of the task scheduling layer;
Scheduler 603 also evaluates and schedules the distribution of tasks according to the graph computation scheduling evaluation factors.
In a specific implementation, the graph computation scheduling evaluation factors may include:
Number of newly added graph update tasks and analysis tasks (per minute)
Number of graph update tasks and graph analysis tasks currently running
Number of update tasks and analysis tasks queued in the task queues
Number of graph partitions and the physical partitioning
Number of vertices and edges in the whole graph
Current read-write lock status of the system
Number of newly added edges and vertices cached in the storage engine
Number of edges and vertices awaiting merge in the storage engine
Number of deleted and modified edges and vertices in the storage engine
Number of subgraph blocks awaiting partitioning in the storage engine
Memory and I/O overhead of the computing engine
Memory size, cache size, and disk file size of the storage engine
Using the graph computation scheduling evaluation factors, scheduler 603 can distribute computing tasks more effectively, maximizing parallel concurrency both within a single machine and across multi-machine graph services.
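The source lists the evaluation factors but not how they are combined. As a minimal sketch under assumptions, the scheduler could fold a few factors into one load score per node and send the next task to the lowest-scoring node. The chosen subset of factors, the weights and the linear formula are all hypothetical and would need tuning against real monitoring data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeMetrics:
    name: str
    running_tasks: int        # update + analysis tasks currently running
    queued_tasks: int         # tasks waiting in the queues
    memory_used_ratio: float  # computing engine memory use, 0.0 to 1.0
    io_used_ratio: float      # computing engine I/O use, 0.0 to 1.0
    pending_merge: int        # edges/vertices awaiting merge in storage


# Hypothetical weights; not taken from the source.
WEIGHTS: Dict[str, float] = {"running": 1.0, "queued": 0.5, "memory": 4.0, "io": 2.0, "merge": 0.001}


def load_score(m: NodeMetrics, w: Dict[str, float] = WEIGHTS) -> float:
    # Lower score means more spare capacity for the next task.
    return (
        w["running"] * m.running_tasks
        + w["queued"] * m.queued_tasks
        + w["memory"] * m.memory_used_ratio
        + w["io"] * m.io_used_ratio
        + w["merge"] * m.pending_merge
    )


def pick_node(nodes: List[NodeMetrics]) -> str:
    # Least-loaded node wins; the name breaks ties deterministically.
    return min(nodes, key=lambda m: (load_score(m), m.name)).name
```

A linear score is the simplest choice that still lets the factors be compared on one scale. A real deployment might normalize each factor first.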
In addition, the monitoring core can provide the graph computation scheduling evaluation factors to a graph monitoring display system to show how the system is running.
For convenience of description, the parts of the device above are described as separate modules or units divided by function. Of course, when implementing this application, the functions of the modules or units may be implemented in one or more pieces of software or hardware.
Those skilled in the art should understand that embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art, once aware of the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of this application.

Claims (21)

1. A graph data processing method, characterized by comprising:
writing a pending request into a graph update task queue or a graph analysis task queue according to the type of the pending request, the types of pending request including graph update requests and graph analysis requests;
determining the run order of the tasks according to a first characteristic of each task in the graph update task queue and the graph analysis task queue;
running the tasks in the run order.
2. The method of claim 1, characterized in that running the tasks in the run order comprises:
assigning the task currently to be run to the corresponding computing resource according to the distributed partition information of the data involved in the task currently to be run.
3. The method of claim 1, characterized in that the first characteristic includes any one or a combination of the following: timestamp, timeliness, priority, and data dependency.
4. The method of claim 1, characterized in that, after the run order of the tasks is determined, whether the first-ranked task is the task currently to be run is determined according to the state of a read-write lock;
the state of the read-write lock is set to occupied while a task is running and set to free when the task finishes or times out.
5. The method of claim 4, characterized in that determining, according to the state of the read-write lock, whether the first-ranked task is the task currently to be run includes any one or a combination of the following:
when the lock state is free, determining that the first-ranked task is the task currently to be run;
when the lock state is occupied and the currently running task is a pure read-only graph analysis task, determining that the first-ranked task is the task currently to be run;
when the lock state is occupied and the currently running task is not a pure read-only graph analysis task, or is a graph update task, suspending the first-ranked task until the next cycle and then checking the lock state again.
6. The method of claim 1, characterized in that, after the run order of the tasks is determined, it is judged whether the first-ranked task is a graph analysis task whose time and/or resource consumption exceeds a set threshold; if so, the first-ranked task is split into multiple tasks, the multiple tasks are run at intervals, and after the multiple tasks finish, the graph analysis results are merged to complete the first-ranked task.
7. The method of claim 1, characterized in that, after a task is run, whether the task has timed out is monitored; if so, the task is suspended and restarted in the next cycle.
8. The method of claim 1, characterized in that, after a graph update task is run, in-memory mapped objects are stored in both a buffer area and a disk;
when a graph analysis task is run, data is fetched from the buffer area;
if the data involved in the graph analysis task is not in the buffer area, the data is fetched from the disk.
9. The method of claim 1, characterized in that, when the in-memory mapped objects are stored in the buffer area, they are processed and divided into incremental update objects and focus objects.
10. The method of claim 1, characterized in that storing the graph comprises:
determining from the data characteristics of the graph whether the graph is a sparse graph or a dense graph;
determining from the computation characteristics of the graph whether computation on the graph is vertex-centric or edge-centric;
determining the partitioning algorithm of the graph from its data characteristics and computation characteristics, and partitioning and storing the data of the graph accordingly.
11. A graph data processing device, characterized by comprising:
a graph update task queue, into which graph update tasks are written;
a graph analysis task queue, into which graph analysis tasks are written;
a scheduler, configured to determine the run order of the tasks according to a first characteristic of each task in the graph update task queue and the graph analysis task queue, and to assign the task currently to be run to the corresponding computing resource.
12. The device of claim 11, characterized by comprising a partition identifier, configured to provide the scheduler with the distributed partition information of the data involved in the task currently to be run.
13. The device of claim 11, characterized by comprising a read-write lock module, configured to store the state of a read-write lock, the state of the read-write lock being set to occupied while a task is running and set to free when the task finishes or is suspended;
the scheduler, after determining the run order of the tasks, determines according to the state of the read-write lock whether the first-ranked task is the task currently to be run.
14. The device of claim 13, characterized in that the scheduler determining, according to the state of the read-write lock, whether the first-ranked task is the task currently to be run includes any one or a combination of the following:
when the lock state is free, determining that the first-ranked task is the task currently to be run;
when the lock state is occupied and the currently running task is a pure read-only graph analysis task, determining that the first-ranked task is the task currently to be run;
when the lock state is occupied and the currently running task is not a pure read-only graph analysis task, or is a graph update task, suspending the first-ranked task until the next cycle and then checking the lock state again.
15. The device of claim 11, characterized in that the scheduler, after determining the run order of the tasks, judges whether the first-ranked task is a graph analysis task whose time and/or resource consumption exceeds a set threshold; if so, it splits the first-ranked task into multiple tasks, runs the multiple tasks at intervals, and after the multiple tasks finish, merges the graph analysis results to complete the first-ranked task.
16. The device of claim 11, characterized in that the scheduler, after running a task, monitors whether the task has timed out; if so, it suspends the task and restarts it in the next cycle.
17. A graph data processing system, characterized by comprising:
a service interface layer, including an update interface and an analysis interface, the update interface being used to receive data update tasks and write them into an update task queue, and the analysis interface being used to receive data analysis tasks and write them into an analysis task queue;
a task scheduling layer, including the device of any one of claims 11 to 16;
a graph computing engine, configured to perform the graph update operations and/or graph analysis operations of tasks;
a graph storage engine, configured to store the graph.
18. The system of claim 17, characterized in that the graph storage engine includes a buffer area and a disk;
the graph computing engine, after running a graph update task, stores in-memory mapped objects in both the buffer area and the disk, and, when a task is a graph analysis task, fetches data from the buffer area; if the data involved in the graph analysis task is not in the buffer area, it fetches the data from the disk.
19. The system of claim 18, characterized in that, when the graph storage engine stores the in-memory mapped objects in the buffer area, it processes them and divides them into incremental update objects and focus objects.
20. The system of claim 17, characterized in that the graph storage engine includes:
a graph data feature analyzer, configured to determine from the data characteristics of the graph whether the graph is a sparse graph or a dense graph;
a graph computation feature analyzer, configured to determine from the computation characteristics of the graph whether computation on the graph is vertex-centric or edge-centric;
a graph storage partition manager, configured to determine the partitioning algorithm of the graph from its data characteristics and computation characteristics, and to partition and store the data of the graph accordingly.
21. The system of claim 17, characterized by further comprising a monitoring core, configured to collect the resource load of the graph computing engine and the graph storage engine in real time, convert the monitoring information in real time into measurable graph computation scheduling evaluation factors, and provide them to the scheduler of the task scheduling layer;
the scheduler further evaluates and schedules the distribution of tasks according to the graph computation scheduling evaluation factors.
CN201510419390.0A 2015-07-16 2015-07-16 Graph data processing method, device and system Active CN106354729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510419390.0A CN106354729B (en) 2015-07-16 2015-07-16 Graph data processing method, device and system

Publications (2)

Publication Number Publication Date
CN106354729A true CN106354729A (en) 2017-01-25
CN106354729B CN106354729B (en) 2020-01-07

Family

ID=57842658

Country Status (1)

Country Link
CN (1) CN106354729B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102906743A (en) * 2010-05-17 2013-01-30 慕尼黑技术大学 Hybrid OLTP and OLAP high performance database system
CN104504003A (en) * 2014-12-09 2015-04-08 北京航空航天大学 Graph data searching method and device
CN104615677A (en) * 2015-01-20 2015-05-13 同济大学 Graph data access method and system
CN104679764A (en) * 2013-11-28 2015-06-03 方正信息产业控股有限公司 Method and device for searching graph data

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11256749B2 (en) 2016-11-30 2022-02-22 Huawei Technologies Co., Ltd. Graph data processing method and apparatus, and system
CN108595251A (en) * 2018-05-10 2018-09-28 腾讯科技(深圳)有限公司 Dynamic Graph update method, device, storage engines interface and program medium
CN108595251B (en) * 2018-05-10 2022-11-22 腾讯科技(深圳)有限公司 Dynamic graph updating method, device, storage engine interface and program medium
CN108984281A (en) * 2018-05-30 2018-12-11 深圳市买买提信息科技有限公司 A kind of task processing method and server
CN109670089A (en) * 2018-12-29 2019-04-23 颖投信息科技(上海)有限公司 Knowledge mapping system and its figure server
WO2020135050A1 (en) * 2018-12-29 2020-07-02 颖投信息科技(上海)有限公司 Knowledge mapping system and map server thereof
JP2022518127A (en) * 2019-12-25 2022-03-14 上▲海▼商▲湯▼智能科技有限公司 Resource scheduling methods and equipment, electronic devices and recording media
CN111309750A (en) * 2020-03-31 2020-06-19 中国邮政储蓄银行股份有限公司 Data updating method and device for graph database
CN111291870A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and system for processing high-dimensional sparse features in deep learning of images
CN115470377A (en) * 2021-06-11 2022-12-13 清华大学 Streaming graph data processing method and system
CN113239243A (en) * 2021-07-08 2021-08-10 湖南星汉数智科技有限公司 Graph data analysis method and device based on multiple computing platforms and computer equipment
CN113672636A (en) * 2021-10-21 2021-11-19 支付宝(杭州)信息技术有限公司 Graph data writing method and device
CN113672636B (en) * 2021-10-21 2022-03-22 支付宝(杭州)信息技术有限公司 Graph data writing method and device
CN116821250A (en) * 2023-08-25 2023-09-29 支付宝(杭州)信息技术有限公司 Distributed graph data processing method and system
CN116821250B (en) * 2023-08-25 2023-12-08 支付宝(杭州)信息技术有限公司 Distributed graph data processing method and system

Also Published As

Publication number Publication date
CN106354729B (en) 2020-01-07


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.