CN107273207A - A kind of related data storage method based on hypergraph partitioning algorithm - Google Patents

Publication number
CN107273207A
Authority
CN
China
Prior art keywords
hypergraph
demand
data
node
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710388857.9A
Other languages
Chinese (zh)
Inventor
王宝亮
张光荣
常鹏
张荧允
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-05-25
Filing date
2017-05-25
Publication date
2017-10-20
Application filed by Tianjin University
Priority to CN201710388857.9A
Publication of CN107273207A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a related-data storage method based on a hypergraph partitioning algorithm, comprising: treating a task that needs to process data as a demand pattern, where the demand pattern requires multiple data items stored on data-center nodes, and predicting its demand rate once the demand pattern is determined; selecting, according to the demand rate, the metric that serves as the hypergraph partitioning criterion; building the hypergraph model; a coarsening phase; an initial partitioning phase; and an optimization phase, in which split nodes are randomly selected in the k subgraphs and restored by splitting step by step, constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.

Description

A related-data storage method based on a hypergraph partitioning algorithm
Technical field
The invention belongs to the field of big data processing technology and relates to a method for storing related data.
Background
With the rapid development and spread of the Internet, the global volume of data is growing explosively, and we have entered an era of information explosion. Faced with massive and complex data, processing volumes at the TB or even PB scale have become commonplace, and the concept of big data has emerged accordingly. Compared with traditional data, the characteristics of big data are summarized as the four Vs: large volume (Volume), high speed (Velocity), many types (Variety), and low value density (Value). Large volume can be eased to some extent by expanding storage, but the requirement for timely response, the diversity of the data, and its uncertainty cannot be handled by traditional data processing methods. To meet these difficulties and challenges, many large Internet companies have introduced various big data processing systems in recent years. As an emerging technology, big data processing still has many shortcomings, such as latency caused by accessing distributed data, and heavy network load caused by huge data throughput combined with insufficient network rates. Researchers in China and abroad have therefore been looking for better data storage methods to strengthen the overall capability of big data processing.
Seemingly massive and complex data nevertheless has a certain internal correlation: the data needed to process a specific task shares some features (such as frequency of use, size, and being used together with other data). If highly correlated data is stored on the computing node as far as possible, it can be used without waiting for or occupying network resources, which saves time and improves system efficiency.
A hypergraph is a generalization of the ordinary graph in discrete mathematics. Its mathematical definition is as follows: a hypergraph H has a vertex set V and a hyperedge set E, written H = (V, E). Each hyperedge e is a non-empty subset of V; the number of vertices that e contains is its degree, denoted |e| (greater than or equal to 2). Hypergraph partitioning divides the vertices of the hypergraph into k roughly equal parts while minimizing the number of hyperedges that connect vertices in more than one part.
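The definition H = (V, E) can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names and the toy hypergraph are hypothetical and do not come from the patent.

```python
from typing import Dict, FrozenSet, Set

def hyperedge_degrees(edges: Dict[str, FrozenSet[str]]) -> Dict[str, int]:
    """Return the degree |e| (number of vertices) of every hyperedge e."""
    return {name: len(members) for name, members in edges.items()}

def is_valid_hypergraph(vertices: Set[str], edges: Dict[str, FrozenSet[str]]) -> bool:
    """Check that every hyperedge is a non-empty subset of the vertex set V."""
    return all(len(members) > 0 and members <= vertices
               for members in edges.values())

# Hypothetical toy hypergraph: four vertices, two hyperedges.
V = {"a", "b", "c", "d"}
E = {"e1": frozenset({"a", "b", "c"}), "e2": frozenset({"c", "d"})}

degrees = hyperedge_degrees(E)   # e1 has degree 3, e2 has degree 2
valid = is_valid_hypergraph(V, E)
```

Unlike an ordinary graph edge, the hyperedge e1 connects three vertices at once, which is what lets a single hyperedge stand for "all data items used together by one task".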
Summary of the invention
The purpose of the invention is to propose a storage optimization method for related data based on hypergraph partitioning. For tasks of the same class with essentially the same data requirements, the method predicts the fixed demand of that class of tasks, records it as a demand pattern, and uses a hypergraph partitioning algorithm to move the data the pattern needs to less loaded nodes. The technical solution is as follows:
A related-data storage method based on a hypergraph partitioning algorithm, comprising the following steps:
(1) A task that needs to process data is called a demand pattern. The demand pattern requires multiple data items stored on data-center nodes. Once the demand pattern is determined, its demand rate is predicted. Let the demand rate at each node be $R_{py}$; the total demand rate of the demand pattern is $R_p = \sum_{y} R_{py}$.
(2) According to the demand rate, select the metric, i.e. the criterion for hypergraph partitioning. The first metric is the overhead $C_A$ needed to complete a demand; the second is the total relay traffic $C_L$ needed to complete the demand. The resulting metric is $C(D) = C_A + \alpha C_L$, where $\alpha$ is a parameter between 0 and 1 that balances the two metrics.
(3) Build the hypergraph model according to the criterion of step (2). All data items and data nodes form the vertex set V of the hypergraph, and the hyperedge set E contains the mapping relations between all demand patterns, data items, and nodes. Every hyperedge e ∈ E is assigned a weight, set according to the metric in (2). The hypergraph has two classes of vertices, storage nodes and data items, and two classes of hyperedges, demand-pattern hyperedges and data-node hyperedges.
(4) Hypergraph partitioning divides the vertices of the hypergraph into n output sets, with each vertex belonging to exactly one of the n sets. The cut weight of a partition is the sum of the cut weights of its hyperedges. A hyperedge whose vertices do not all belong to a single set is cut; if the vertices of hyperedge e fall into t sets, its cut weight is $(t-1)\,w_e$.
(5) Coarsening phase: remove the weights from the hyperedges and merge closely connected nodes to construct a smaller unweighted hypergraph, so that the reduction ratio between two adjacent levels of hypergraph reaches the configured reduction ratio. The reduction ratio is the percentage by which the number of nodes decreases between two adjacent levels.
(6) Initial partitioning phase: perform an initial partition of the smaller unweighted hypergraph obtained in step (5) to obtain the initial k subgraphs; the partitioning method is random partitioning.
(7) Optimization phase: randomly select split nodes in the k subgraphs obtained in step (6) and restore them by splitting step by step, constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
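The cut-weight rule of step (4) is simple to compute once every vertex has been assigned to a set. The sketch below assumes hyperedges are given as sets of vertex names and weights as a dictionary; the function name and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet

def cut_weight(edges: Dict[str, FrozenSet[str]],
               weights: Dict[str, float],
               part_of: Dict[str, int]) -> float:
    """Sum (t - 1) * w_e over all hyperedges, where t is the number of
    distinct sets that the vertices of hyperedge e fall into."""
    total = 0.0
    for name, members in edges.items():
        t = len({part_of[v] for v in members})  # sets touched by this hyperedge
        total += (t - 1) * weights[name]        # uncut hyperedges contribute 0
    return total

# Hypothetical example: e1 is cut across two sets, e2 stays inside one set.
edges = {"e1": frozenset({"a", "b", "c"}), "e2": frozenset({"c", "d"})}
weights = {"e1": 2.0, "e2": 1.0}
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}

h = cut_weight(edges, weights, part_of)  # (2 - 1) * 2.0 + (1 - 1) * 1.0
```

A hyperedge that lies entirely inside one set has t = 1 and adds nothing, so a partition that keeps heavily weighted hyperedges intact has a low cut weight.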
Brief description of the drawings
Fig. 1: Example of demand patterns
Fig. 2: Bipartite graph
Fig. 3: Hypergraph model
Fig. 4: Algorithm flow chart
Detailed description
The basic idea of this patent is, for a given demand pattern, to establish a binary relation between the demand pattern and the data storage nodes of the data center, based on the data that the pattern requires. From this binary relation and the proposed metrics, a function is built that maps each data item to the data node that stores it. The details are as follows.
I. Data items and nodes
X denotes the set of m data items stored on the data nodes, and each task needs to transfer d different data items from X. The pattern demand space is assumed to be given by an expression that is not reproduced in the source text. The demand patterns that occur in practice are a subset of that space, denoted P. Fig. 1 shows an example with five data items and three different demand patterns.
Y denotes the set of n storage nodes. Initially, each data item x ∈ X is assumed to be stored on exactly one node y ∈ Y. Let the rule that maps stored data to nodes be D: x → y. The ultimate aim of the invention is to provide a suitable storage scheme, that is, an efficient function D. In addition, $D_y$ denotes the set of data items stored on node y.
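The placement rule D: x → y and the per-node sets D_y map naturally onto a dictionary and its inverse. The following is a minimal sketch with hypothetical item and node names.

```python
from collections import defaultdict
from typing import Dict, Set

def items_by_node(placement: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert the placement D: x -> y to obtain D_y, the set of items on node y."""
    d_y: Dict[str, Set[str]] = defaultdict(set)
    for item, node in placement.items():
        d_y[node].add(item)
    return dict(d_y)

# Hypothetical placement: every data item is stored on exactly one node.
D = {"x1": "y1", "x2": "y1", "x3": "y2"}
D_y = items_by_node(D)
```

Because D is a function, each item appears in exactly one D_y, which matches the assumption that a data item is stored on a single node.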
II. Data placement
1. Demand rate
Data stored in a data center may be the output of one task and the input of another, or may only be processed locally. Without loss of generality, the data node at which a demand is first accessed is taken as the source location of the demand. In the model, a data center or node therefore plays two roles at once: the source location of a demand pattern and the final location where data is stored.
For each demand pattern completed at a demand node y ∈ Y, the workload or demand rate is predictable (prediction methods are mature and are not covered here) and is denoted $R_{py}$; data storage decisions are made according to the demand rate. We define the set of workloads or demand rates as $R = \{R_{py} \mid p \in P, y \in Y\}$. Fig. 2 shows a hypothetical bipartite model: data centers and demand patterns are the two vertex classes of the bipartite graph, and each edge connecting them is labelled with, and weighted by, the demand rate $R_{py}$. The total demand rate of each demand node y is computed as $R_y = \sum_{p \in P} R_{py}$, and the total demand rate of each demand pattern p as $R_p = \sum_{y \in Y} R_{py}$.
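The two totals can be read off the weighted bipartite graph by summing edge weights on each side. The sketch below assumes the totals are plain sums of R_py (the source text does not reproduce the formulas); the function name and the sample rates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

def total_rates(rates: Dict[Tuple[str, str], float]):
    """Given R_py keyed by (pattern, node), return (R_p, R_y):
    the per-pattern and per-node sums of the bipartite edge weights."""
    per_pattern: Dict[str, float] = defaultdict(float)
    per_node: Dict[str, float] = defaultdict(float)
    for (pattern, node), rate in rates.items():
        per_pattern[pattern] += rate  # assumption: R_p is the sum over nodes y
        per_node[node] += rate        # assumption: R_y is the sum over patterns p
    return dict(per_pattern), dict(per_node)

# Hypothetical predicted rates for two patterns and two nodes.
rates = {("p1", "y1"): 3.0, ("p1", "y2"): 1.0, ("p2", "y1"): 2.0}
R_p, R_y = total_rates(rates)
```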
2. Metrics
Data placement affects system performance in two respects: system efficiency and user-perceived latency. By observing the relationship between data placement and system performance, we derive two metrics.
1) Co-location of related data
System efficiency is evaluated by the system processing time needed to complete a given workload. In a distributed system, the average system time needed to complete a demand depends not only on the amount of information read but also on the total number of nodes involved, because each node adds its own processing overhead. Define $S_p$ as the data volume needed to complete a demand pattern p, and $S_{py}$ as the workload needed on node y ∈ Y to complete demand pattern p [formula missing in the source text]. $S_{py}$ is a variable determined by the placement mapping D: x → y, while $S_p$ is a constant. The system time needed for node y ∈ Y to complete a demand p, partly or fully, is defined as $S_{py} + \lambda \cdot 1(S_{py})$, where $S_{py}$ represents the regular time needed to process the demand and $\lambda \cdot 1(S_{py})$ represents the constant processing time needed for routine operations of the demand, such as establishing a TCP connection. Taking the demand rates of the different patterns into account, the total system time needed to complete all demands is [formula missing in the source text], which is equivalent to [formula missing in the source text]. Minimizing this expression improves server efficiency and reduces overhead, and co-locating strongly correlated data achieves this. In the extreme case where all data items needed by a demand pattern are stored on the same node, the minimum system time needed to complete the demand is $R_p(S_{py} + \lambda)$. For any given workload, [expression missing in the source text] is a constant, so the overhead needed to complete a demand is [formula missing in the source text].
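The effect of co-location on the per-node time S_py + λ·1(S_py) can be illustrated numerically. The sketch below assumes that S_py is the total size of the pattern's items stored on node y and that 1(S_py) is 1 when S_py > 0 and 0 otherwise; both are assumptions, since the source text does not reproduce the formulas. All names and numbers are hypothetical.

```python
from typing import Dict, Iterable

def node_time(s_py: float, lam: float) -> float:
    """S_py + lambda * 1(S_py): the fixed cost lambda is paid only if the
    node holds some of the data the pattern needs."""
    return s_py + lam * (1 if s_py > 0 else 0)

def pattern_time(item_sizes: Dict[str, float],
                 pattern_items: Iterable[str],
                 placement: Dict[str, str],
                 lam: float) -> float:
    """System time for one execution of a pattern, summed over the nodes used.
    Assumption: S_py is the total size of the pattern's items stored on y."""
    per_node: Dict[str, float] = {}
    for x in pattern_items:
        y = placement[x]
        per_node[y] = per_node.get(y, 0.0) + item_sizes[x]
    return sum(node_time(s, lam) for s in per_node.values())

sizes = {"x1": 2.0, "x2": 3.0, "x3": 5.0}
pattern = ["x1", "x2", "x3"]
lam = 0.5

# Items spread over two nodes versus all items on one node.
spread = pattern_time(sizes, pattern, {"x1": "y1", "x2": "y1", "x3": "y2"}, lam)
colocated = pattern_time(sizes, pattern, {"x1": "y1", "x2": "y1", "x3": "y1"}, lam)
```

The data volume read is the same in both cases; the co-located placement is cheaper only because the fixed per-node cost λ is paid once instead of twice.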
2) Local data serving
A difference between the location of the demand node and the location of the data node that stores the data needed by the demand also affects system performance, because it generates relay traffic. We therefore take the total relay traffic needed to complete the demands as the second metric. It is defined as [formula missing in the source text], where $1(x \in D_y)$ indicates whether data item x is stored on node y.
The ultimate aim of the invention is to provide a method for storing related data that increases system efficiency and reduces system overhead; specifically, it provides an optimized mapping function D: x → y from data to storage. Based on the two metrics above, the optimization criterion for the final function is set to $C(D) = C_A + \alpha C_L$, where $\alpha$ is the parameter that balances the two metrics.
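How the balance parameter α trades the two metrics against each other can be seen by evaluating C(D) for two candidate placements. In this sketch the values of C_A and C_L are taken as given inputs, because the source text does not reproduce their formulas; the function name and numbers are hypothetical.

```python
def objective(c_a: float, c_l: float, alpha: float) -> float:
    """Combined criterion C(D) = C_A + alpha * C_L, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return c_a + alpha * c_l

# Hypothetical metric values for two candidate placements.
placement_a = {"c_a": 10.0, "c_l": 4.0}   # more overhead, less relay traffic
placement_b = {"c_a": 8.0, "c_l": 9.0}    # less overhead, more relay traffic

cost_a = objective(placement_a["c_a"], placement_a["c_l"], alpha=0.5)
cost_b = objective(placement_b["c_a"], placement_b["c_l"], alpha=0.5)
best = "a" if cost_a <= cost_b else "b"
```

With α = 0 only the overhead C_A counts and placement b would win; raising α gives relay traffic more influence and shifts the preference toward placement a.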
III. Hypergraph partitioning
1. Building the hypergraph model
All data items and data nodes form the vertex set V of the hypergraph, V = {X, Y}. The hyperedge set E contains the mapping relations between all demand patterns, data items, and nodes: $E = \{\{e_p \mid p \in P\}, \{e_{xy} \mid x \in X, y \in Y\}\}$. Every hyperedge e ∈ E is assigned a weight. Based on the optimization criterion $C(D) = C_A + \alpha C_L$, the weights are set to [formula missing in the source text]. As shown in Fig. 3, the hypergraph has two classes of vertices, storage nodes and data items, and two classes of hyperedges, demand-pattern hyperedges and data-node hyperedges.
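The structure of this hypergraph can be sketched as follows. Two assumptions are made that the text does not state explicitly: a demand-pattern hyperedge e_p is taken to contain the data items the pattern needs, and a data-node hyperedge e_xy is taken to contain item x and node y, with one such hyperedge for every (x, y) pair. Weights are left out because their formula is missing from the source text. All names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple

def build_hypergraph(items: Iterable[str],
                     nodes: Iterable[str],
                     patterns: Dict[str, Set[str]]):
    """Return (V, E): vertices are data items plus storage nodes; hyperedges
    are one e_p per demand pattern and one e_xy per (item, node) pair."""
    items, nodes = list(items), list(nodes)
    vertices = set(items) | set(nodes)  # item and node names must not collide
    edges: Dict[Tuple[str, ...], FrozenSet[str]] = {}
    for p, needed in patterns.items():
        edges[("pattern", p)] = frozenset(needed)       # assumed content of e_p
    for x, y in product(items, nodes):
        edges[("item-node", x, y)] = frozenset({x, y})  # assumed content of e_xy
    return vertices, edges

# Hypothetical instance: three items, two nodes, two demand patterns.
V, E = build_hypergraph(
    items=["x1", "x2", "x3"],
    nodes=["y1", "y2"],
    patterns={"p1": {"x1", "x2"}, "p2": {"x2", "x3"}},
)
```

In this reading, cutting a pattern hyperedge means the pattern's data ends up on several nodes, and cutting an item-node hyperedge means the item is not stored on that node.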
2. Basis of the hypergraph partitioning
Theorem: For the output set I, model it as a hypergraph by the method described above. Partitioning the vertices of the hypergraph into n sets yields a data placement function D. Let H denote the cut weight of the partition; then H = C(D) − B, where B is a constant.
Proof: First, consider the cut weight of the demand-pattern hyperedges $e_p$, denoted $H_p$. By the definition of a hyperedge cut, [formula missing in the source text]. From [formula missing in the source text] we obtain [formula missing in the source text], which is a constant. Second, consider the cut weight of the data-node hyperedges, denoted $H_{xy}$. In the hypergraph model, every data item x is connected to all nodes. After partitioning, it can remain connected to only one node; otherwise some sets of the partition would be joined through x, given that each node is placed in a different set. Suppose data item x is finally connected to node $f_x$. The sum of the cut weights of the data-node hyperedges related to x is [formula missing in the source text]. Therefore, [formula missing in the source text].
By the theorem, minimizing the cut weight of the n-way hypergraph partition is equivalent to minimizing C(D).
3. Steps of the hypergraph partitioning
1) Coarsening phase: merge closely connected nodes to construct a smaller hypergraph, so that the reduction ratio between two adjacent levels of hypergraph reaches the configured reduction ratio. The reduction ratio is the percentage by which the number of nodes decreases between two adjacent levels;
2) Initial partitioning phase: perform an initial partition of the smaller unweighted hypergraph obtained in step 1 to obtain the initial k subgraphs; the partitioning method is random partitioning;
3) Optimization phase: randomly select split nodes in the k subgraphs obtained in step 2 and restore them by splitting step by step, constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
The algorithm flow chart is shown in Fig. 4.
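The three phases can be sketched as a small multilevel routine: coarsen by merging vertices that share a hyperedge, partition the coarsest hypergraph at random, then project the partition back level by level. This is a simplified illustration, not the patented procedure: the greedy pair matching, the stopping rule, and all names are assumptions, and the restore step here only projects labels back without any refinement, whereas production partitioners usually add a refinement pass at each level.

```python
import random
from typing import Dict, FrozenSet, List, Set

def coarsen(vertices: Set[str], edges: List[FrozenSet[str]], ratio: float):
    """Merge pairs of vertices that share a hyperedge until the vertex count
    has shrunk by `ratio` (0.3 means 30% fewer vertices) or no pair is left.
    Returns (coarse_vertices, coarse_edges, parent)."""
    target = int(len(vertices) * (1 - ratio))
    parent = {v: v for v in vertices}   # fine vertex -> coarse vertex
    merged: Set[str] = set()
    count = len(vertices)
    for members in edges:
        if count <= target:
            break
        free = sorted(v for v in members if v not in merged)
        if len(free) >= 2:              # merge the first two unmerged vertices
            a, b = free[0], free[1]
            parent[b] = a
            merged.update((a, b))
            count -= 1
    coarse_vertices = set(parent.values())
    coarse_edges = [frozenset(parent[v] for v in e) for e in edges]
    coarse_edges = [e for e in coarse_edges if len(e) >= 2]  # drop collapsed edges
    return coarse_vertices, coarse_edges, parent

def multilevel_partition(vertices: Set[str], edges: List[FrozenSet[str]],
                         k: int, ratio: float = 0.3, min_size: int = 4,
                         seed: int = 0) -> Dict[str, int]:
    rng = random.Random(seed)
    levels = []                         # parent maps, finest level first
    cur_v, cur_e = set(vertices), list(edges)
    while len(cur_v) > max(min_size, k):            # phase 1: coarsening
        nxt_v, nxt_e, parent = coarsen(cur_v, cur_e, ratio)
        if len(nxt_v) == len(cur_v):
            break                       # nothing left to merge
        levels.append(parent)
        cur_v, cur_e = nxt_v, nxt_e
    # Phase 2: random initial partition of the coarsest hypergraph.
    part = {v: rng.randrange(k) for v in sorted(cur_v)}
    # Phase 3: restore level by level; each fine vertex inherits the label
    # of the coarse vertex it was merged into.
    for parent in reversed(levels):
        part = {v: part[parent[v]] for v in parent}
    return part

# Hypothetical unweighted hypergraph with eight vertices.
V = set("abcdefgh")
E = [frozenset("abc"), frozenset("cd"), frozenset("efg"),
     frozenset("gh"), frozenset("de")]
part = multilevel_partition(V, E, k=2, seed=1)
```

Because merged vertices always receive the same label, closely connected vertices stay together in the final partition, which is the property the coarsening phase is meant to preserve. A random initial partition can leave a set empty on very small inputs; the sketch does not guard against that.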
In summary, the invention proposes an optimized storage method for related data based on hypergraph partitioning, which improves system efficiency and reduces user-perceived latency.

Claims (1)

1. A related-data storage method based on a hypergraph partitioning algorithm, comprising the following steps:
(1) A task that needs to process data is called a demand pattern. The demand pattern requires multiple data items stored on data-center nodes. Once the demand pattern is determined, its demand rate is predicted. Let the demand rate at each node be $R_{py}$; the total demand rate of the demand pattern is $R_p = \sum_{y} R_{py}$.
(2) According to the demand rate, select the metric, i.e. the criterion for hypergraph partitioning. The first metric is the overhead $C_A$ needed to complete a demand; the second is the total relay traffic $C_L$ needed to complete the demand. The resulting metric is $C(D) = C_A + \alpha C_L$, where $\alpha$ is a parameter between 0 and 1 that balances the two metrics.
(3) Build the hypergraph model according to the criterion of step (2). All data items and data nodes form the vertex set V of the hypergraph, and the hyperedge set E contains the mapping relations between all demand patterns, data items, and nodes. Every hyperedge e ∈ E is assigned a weight $w_e$, set according to the metric in (2). The hypergraph has two classes of vertices, storage nodes and data items, and two classes of hyperedges, demand-pattern hyperedges and data-node hyperedges.
(4) Hypergraph partitioning divides the vertices of the hypergraph into n output sets, with each vertex belonging to exactly one of the n sets. The cut weight of a partition is the sum of the cut weights of its hyperedges. A hyperedge whose vertices do not all belong to a single set is cut; if the vertices of hyperedge e fall into t sets, its cut weight is $(t-1)\,w_e$.
(5) Coarsening phase: remove the weights from the hyperedges and merge closely connected nodes to construct a smaller unweighted hypergraph, so that the reduction ratio between two adjacent levels of hypergraph reaches the configured reduction ratio. The reduction ratio is the percentage by which the number of nodes decreases between two adjacent levels.
(6) Initial partitioning phase: perform an initial partition of the smaller unweighted hypergraph obtained in step (5) to obtain the initial k subgraphs; the partitioning method is random partitioning.
(7) Optimization phase: randomly select split nodes in the k subgraphs obtained in step (6) and restore them by splitting step by step, constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
CN201710388857.9A 2017-05-25 2017-05-25 A kind of related data storage method based on hypergraph partitioning algorithm Pending CN107273207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710388857.9A CN107273207A (en) 2017-05-25 2017-05-25 A kind of related data storage method based on hypergraph partitioning algorithm

Publications (1)

Publication Number Publication Date
CN107273207A true CN107273207A (en) 2017-10-20

Family

ID=60065723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710388857.9A Pending CN107273207A (en) 2017-05-25 2017-05-25 A kind of related data storage method based on hypergraph partitioning algorithm

Country Status (1)

Country Link
CN (1) CN107273207A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100318963A1 (en) * 2009-06-15 2010-12-16 Microsoft Corporation Hypergraph Implementation
KR101417757B1 (en) * 2009-10-30 2014-07-14 에스케이플래닛 주식회사 Apparatus and method for learning and applying hypergraph language model, and apparatus and method for updating hypergraph language model
CN104809242A (en) * 2015-05-15 2015-07-29 成都睿峰科技有限公司 Distributed-structure-based big data clustering method and device
CN105279524A (en) * 2015-11-04 2016-01-27 盐城工学院 High-dimensional data clustering method based on unweighted hypergraph segmentation
CN105681052A (en) * 2016-01-11 2016-06-15 天津大学 Energy-saving method for data center distributed file storage

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510205A (en) * 2018-04-08 2018-09-07 大连理工大学 A kind of author's technical capability evaluation method based on hypergraph
CN108510205B (en) * 2018-04-08 2021-07-16 大连理工大学 Author skill evaluation method based on hypergraph

Similar Documents

Publication Publication Date Title
CN108924198B (en) Data scheduling method, device and system based on edge calculation
CN109033234B (en) Streaming graph calculation method and system based on state update propagation
CN104754053B (en) A kind of distributed software defines network and the wherein method of dynamic control controller
CN103179052A (en) Virtual resource allocation method and system based on proximity centrality
CN103188346A (en) Distributed decision making supporting massive high-concurrency access I/O (Input/output) server load balancing system
CN105824686A (en) Selecting method and selecting system of host machine of virtual machine
CN105843679A (en) Adaptive many-core resource scheduling method
CN107450855A (en) A kind of model for distributed storage variable data distribution method and system
CN112333260A (en) Cloud computing task scheduling method and cloud computing system
CN105302858A (en) Distributed database system node-spanning check optimization method and system
CN104536831B (en) A kind of multinuclear SoC software image methods based on multiple-objection optimization
CN116706917A (en) Intelligent park collaborative regulation and control method and system based on rapid alternating direction multiplier method
CN108595255A (en) Workflow task dispatching method based on shortest path first in geographically distributed cloud
CN105471893A (en) Distributed equivalent data stream connection method
CN111324429A (en) Micro-service combination scheduling method based on multi-generation ancestry reference distance
CN110689174A (en) Personnel route planning method and device based on public transport
CN114239960A (en) Distribution network project group progress management method and system based on dynamic resource optimization
CN101141315A (en) Network resource scheduling simulation system
CN112948123B (en) Spark-based grid hydrological model distributed computing method
CN108764510B (en) Urban rail transit parallel simulation task decomposition method facing large-scale road network
CN112162837B (en) Edge calculation scheduling method and system based on software definition
CN107273207A (en) A kind of related data storage method based on hypergraph partitioning algorithm
CN106681795A (en) Virtual network mapping method utilizing local topological attributes of nodes and available resource capacity values
CN116303219A (en) Grid file acquisition method and device and electronic equipment
CN103984737A (en) Optimization method for data layout of multi-data centres based on calculating relevancy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned (effective date of abandoning: 20210209)