CN107273207A - Related data storage method based on a hypergraph partitioning algorithm - Google Patents
- Publication number
- CN107273207A (application CN201710388857.9A)
- Authority
- CN
- China
- Prior art keywords
- hypergraph
- demand
- data
- node
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a related data storage method based on a hypergraph partitioning algorithm, comprising: for a task that needs to process data, defining a demand pattern (demand model) that requires multiple data items stored in data-center nodes and, once the demand pattern is determined, predicting its demand rate; selecting, according to the demand rate, the metric that serves as the criterion for hypergraph partitioning; building the hypergraph model; a coarsening phase; an initial partitioning phase; and an optimization phase, in which fission nodes are randomly selected in the k subgraphs and restored in turn, constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
Description
Technical field
The invention belongs to the technical field of big data processing and relates to a related data storage method.
Background
With the rapid development and spread of the Internet, the global volume of data is growing explosively, and we have entered an era of information explosion. Faced with massive, complex data, processing volumes at the TB or even PB level have become normal, and the concept of big data has emerged. Compared with traditional data, the characteristics of big data are commonly summarized as the 4 Vs: large data volume (Volume), high speed (Velocity), many types (Variety), and low value density (Value). Large volume can be alleviated to some extent by expanding storage, but timely response requirements, data diversity, and data uncertainty cannot be handled by traditional data processing methods. To meet these challenges, many large Internet companies have introduced various types of big data processing systems in recent years. As an emerging technology, big data processing still has many shortcomings, such as the delay caused by fetching distributed data and the heavy network load caused by huge data throughput combined with mismatched network rates. Researchers in China and abroad have therefore been looking for better data storage methods to strengthen the overall capability of big data processing.
Data that appear massive and complex nonetheless have internal correlations: the data required by a particular task share certain characteristics (such as frequency of use, size, and being used at the same time as other data). If highly correlated data are placed together on a compute node as far as possible, then using them does not require waiting on or occupying network resources, which saves time and improves the effectiveness of the system.
A hypergraph is a generalization of the ordinary graph in discrete mathematics. It is defined as follows: a hypergraph H has a node set V and a hyperedge set E, written H = (V, E). Each hyperedge e ∈ E is a non-empty subset of V; the number of nodes that e contains is its degree, written |e| (at least 2). Hypergraph partitioning divides the nodes of the hypergraph into k roughly equal parts while minimizing the number of hyperedges whose nodes fall in more than one part.
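The definition above can be illustrated with a minimal Python sketch. The toy hypergraph, the names `V`, `E`, `part_of`, and the helper `cut_hyperedges` are all hypothetical and chosen only for illustration; they do not come from the patent.

```python
# Hypothetical toy hypergraph H = (V, E); names and data are illustrative only.
V = {"a", "b", "c", "d", "e"}
E = {
    "e1": {"a", "b", "c"},  # degree |e1| = 3
    "e2": {"c", "d"},       # degree |e2| = 2
    "e3": {"d", "e"},       # degree |e3| = 2
}


def cut_hyperedges(E, part_of):
    """Return the names of hyperedges whose nodes lie in more than one part."""
    return sorted(
        name
        for name, nodes in E.items()
        if len({part_of[v] for v in nodes}) > 1
    )


# A 2-way partition, given as node -> part index.
part_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
cut = cut_hyperedges(E, part_of)
```

With this partition only `e2` has nodes in both parts, so a partitioner that minimizes cut hyperedges would prefer it over a split that separates `a`, `b`, and `c`.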
Summary of the invention
The purpose of the present invention is to propose a storage optimization method for related data based on hypergraph partitioning. For tasks of the same class whose data demands are essentially the same, the method predicts the fixed demand of that class of task, records it as a demand pattern, and uses a hypergraph partitioning algorithm to move the data required by the pattern to nodes with lower load. The technical solution is as follows:
A related data storage method based on a hypergraph partitioning algorithm, comprising the following steps:
(1) A task that needs to process data is referred to as a demand pattern (demand model). The demand pattern requires multiple data items stored in data-center nodes. After the demand pattern is determined, its demand rate is predicted. Let the demand rate of the pattern at each node be $R_{py}$; the total demand rate of the demand pattern is then $R_p = \sum_{y} R_{py}$.
(2) According to the demand rates, select the metrics that serve as the criterion for hypergraph partitioning. The first is the total overhead required to complete the demands, $C_A$; the second is the total relay traffic required to complete the demands, $C_L$. The resulting metric is $C(D) = C_A + \alpha C_L$, where $\alpha$ is a parameter between 0 and 1 that balances the two criteria.
(3) Build the hypergraph model according to the metric of step (2). All data items and data nodes form the vertex set V of the hypergraph. The hyperedge set E contains the mapping relations among all demand patterns, data items, and nodes. Every hyperedge e ∈ E is given a weight, assigned on the basis of the metric in (2). The hypergraph has two classes of vertices, storage nodes and data items, and two classes of hyperedges, demand-pattern hyperedges and data-node hyperedges.
(4) Hypergraph partitioning means dividing the vertices of the hypergraph into n output sets, with each vertex belonging to exactly one of the n sets. The cut weight of a partition is the sum of the cut weights of its hyperedges. A hyperedge is cut if its vertices do not all belong to one set; if the vertices of hyperedge e fall in t sets, its cut weight is $(t-1)\,w_e$.
(5) Coarsening phase: the hyperedge weights are reduced and closely connected nodes are merged to construct a smaller unweighted hypergraph, so that the reduction ratio between two adjacent levels of hypergraphs reaches the set reduction ratio. The reduction ratio is the percentage by which the number of nodes decreases between two adjacent levels.
(6) Initial partitioning phase: the smaller unweighted hypergraph obtained in step (5) is given an initial partition by random division, yielding the initial k subgraphs.
(7) Optimization phase: fission nodes are randomly selected in the k subgraphs obtained in step (6) and restored in turn (merged nodes are split back apart), constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
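The cut weight rule of step (4) can be sketched in a few lines of Python. This is only an illustration of the (t-1)·w_e rule stated in the text; the function name `partition_cut_weight` and the toy hyperedges, weights, and partition are hypothetical.

```python
def partition_cut_weight(hyperedges, weights, part_of):
    """Sum (t - 1) * w_e over all hyperedges, where t is the number of
    sets that contain at least one vertex of hyperedge e."""
    total = 0.0
    for name, vertices in hyperedges.items():
        t = len({part_of[v] for v in vertices})
        total += (t - 1) * weights[name]
    return total


# Hypothetical example: one hyperedge spread over three sets, one uncut.
hyperedges = {"e1": {"x1", "x2", "x3"}, "e2": {"x3", "y1"}}
weights = {"e1": 2.0, "e2": 0.5}
part_of = {"x1": 0, "x2": 1, "x3": 2, "y1": 2}
cut_weight = partition_cut_weight(hyperedges, weights, part_of)
```

Here `e1` spans three sets and contributes (3-1)·2.0, while `e2` stays inside one set and contributes nothing.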
Brief description of the drawings
Fig. 1 Example of demand patterns
Fig. 2 bipartite graphs
Fig. 3 hypergraph models
Fig. 4 algorithm flow charts
Detailed description of the embodiments
The basic idea of this patent is, for a determined demand pattern, to establish a binary relation between the demand pattern and the data storage nodes of the data center according to the data the pattern requires. Based on this binary relation and the proposed metrics, a function mapping each data item to the data node that stores it is constructed. The details are as follows.
One. Data items and nodes
X denotes the set of m data items stored on the data nodes, and each task needs to transfer d different data items from X. Assume a space of possible demand patterns over X; the demand patterns that occur in a practical application form a subset of this space, denoted $P$. As shown in Fig. 1, the example has five data items and three different demand patterns.
Formula.
Y represents including the set of n memory node.Initially, it is assumed that each data item x ∈ X are stored in exclusive node y ∈
In Y.If crawling storage after data arrives the rule of node for D:x→y.Final purpose of the present invention is just to provide a suitable storage
Scheme, can provide an efficient D function.In addition, we use DyExpression is stored in node y data acquisition system.
Two. Data placement
1. Demand rate
Data stored in a data center may be the output of one task that serves as the input of another, or may simply be processed locally. Without loss of generality, the data node at which a demand is first accessed is taken as the source location of that demand. In the model, a data center or node therefore plays two roles at once: the source node of a demand pattern and the final node where data are stored.
For each demand pattern completed at a demand node y ∈ Y, the workload or demand rate is predictable (forecasting methods are mature and are not covered here) and is denoted $R_{py}$; data storage decisions are made according to the demand rate. The set of workloads or demand rates is defined as $R = \{R_{py} \mid p \in P, y \in Y\}$. Fig. 2 shows a hypothetical bipartite model: data centers and demand patterns are the two vertex classes of the bipartite graph, and each edge connecting the two is labeled with, and weighted by, the demand rate $R_{py}$. The total demand rate of each demand node y is computed as $R_y = \sum_{p \in P} R_{py}$, and the total demand rate of each demand pattern p as $R_p = \sum_{y \in Y} R_{py}$.
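As a small illustration of the two totals, the sketch below aggregates per-pattern and per-node demand rates from a table of predicted values. It assumes the totals are plain sums of the pairwise rates; the dictionary `R`, its values, and the helper `total_rates` are hypothetical.

```python
from collections import defaultdict

# Hypothetical predicted demand rates, keyed by (pattern p, node y).
R = {("p1", "y1"): 3.0, ("p1", "y2"): 1.0, ("p2", "y2"): 2.0}


def total_rates(R):
    """Return (total rate per pattern, total rate per node), assuming each
    total is the plain sum of the pairwise rates."""
    per_pattern = defaultdict(float)
    per_node = defaultdict(float)
    for (p, y), rate in R.items():
        per_pattern[p] += rate
        per_node[y] += rate
    return dict(per_pattern), dict(per_node)


R_p, R_y = total_rates(R)
```

In bipartite-graph terms, `R_p` sums the edge weights incident to a pattern vertex and `R_y` sums those incident to a node vertex.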
2. Metrics
Data placement affects system performance in two respects: system effectiveness and user-experienced delay. By observing the relation between data placement and system performance, we summarize two metrics.
1) Co-location of related data
System effectiveness is evaluated by the system processing time needed to complete a given workload. In a distributed system, the average system time needed to complete a demand depends not only on the amount of information read but also on the total number of nodes involved, each of which adds its own processing overhead. Let $S_p$ denote the amount of data required to complete a demand pattern $p$, and $S_{py}$ the workload required at node $y \in Y$ to complete pattern $p$, so that $\sum_{y \in Y} S_{py} = S_p$. $S_{py}$ is a variable determined by the data placement mapping $D: x \to y$, whereas $S_p$ is a constant. The system time needed at node $y \in Y$ to complete a demand $p$, partly or fully, is defined as $S_{py} + \lambda \cdot 1(S_{py})$, where $S_{py}$ represents the regular time needed to process the demand and $\lambda \cdot 1(S_{py})$ represents the constant time needed for the routine operations of processing demand $p$, such as establishing a TCP connection; the indicator $1(S_{py})$ is 1 when $S_{py} > 0$ and 0 otherwise. Taking the demand rates of the different patterns into account, the total system time needed to complete all demands is $\sum_{p \in P} R_p \sum_{y \in Y} \left[ S_{py} + \lambda \cdot 1(S_{py}) \right]$, which is equivalent to $\sum_{p \in P} R_p S_p + \lambda \sum_{p \in P} R_p \sum_{y \in Y} 1(S_{py})$. Minimizing this expression improves the effectiveness of the servers and reduces overhead, and co-locating strongly correlated data achieves this. In the extreme case where all data items required by a demand pattern are stored on the same node, $S_{py} = S_p$ on that node and the minimum system time needed to complete the demand is $R_p (S_{py} + \lambda)$. For any given workload, $\sum_{p \in P} R_p S_p$ is a constant, so the total overhead $C_A$ needed to complete the demands is governed by the per-node term $\lambda \cdot 1(S_{py})$.
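The effect of co-location on the per-node time S_py + λ·1(S_py) can be checked numerically. The sketch below is a hedged illustration, not the patent's implementation: it assumes that S_py is the summed size of the items of pattern p stored on node y, that all item sizes are positive, and that the total is weighted by the pattern's total demand rate. The function `system_time` and all data are hypothetical.

```python
def system_time(patterns, sizes, D, R_p, lam):
    """Total time: for each pattern p, R_p times the sum over the nodes it
    touches of (S_py + lam). Nodes with S_py = 0 contribute nothing because
    the indicator 1(S_py) is 0 there. Assumes positive item sizes."""
    total = 0.0
    for p, items in patterns.items():
        S_py = {}
        for x in items:
            # Assumption: S_py is the summed size of p's items placed on node D[x].
            S_py[D[x]] = S_py.get(D[x], 0.0) + sizes[x]
        total += R_p[p] * sum(s + lam for s in S_py.values())
    return total


# Hypothetical data: one pattern needing two items.
patterns = {"p1": ["x1", "x2"]}
sizes = {"x1": 2.0, "x2": 3.0}
R_p = {"p1": 2.0}
lam = 0.5

spread = system_time(patterns, sizes, {"x1": "y1", "x2": "y2"}, R_p, lam)
colocated = system_time(patterns, sizes, {"x1": "y1", "x2": "y1"}, R_p, lam)
```

Placing both items on one node removes one fixed cost λ per request, which matches the extreme case described above where the time reduces to R_p(S_p + λ).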
2) Local data service
A difference between the demand node and the location of the data nodes that store the data the demand needs also affects system performance, because it produces relay traffic. The total relay traffic required to complete the demands is therefore taken as the second criterion, $C_L$. Its definition uses the indicator $1(x \in D_y)$, which states whether data item x is stored on node y.
The ultimate purpose of the invention is to provide a method of storing related data that increases system effectiveness and reduces system overhead, specifically by providing an optimized mapping function D: x → y from data to storage. Based on the two criteria above, the optimization criterion for the final function is set to $C(D) = C_A + \alpha C_L$, where $\alpha$ is a parameter that balances the two criteria.
Three. Hypergraph partitioning
1. Building the hypergraph model
All data items and data nodes form the vertex set of the hypergraph, $V = \{X, Y\}$. The hyperedge set E contains the mapping relations among all demand patterns, data items, and nodes: $E = \{\{e_p \mid p \in P\}, \{e_{xy} \mid x \in X, y \in Y\}\}$. Every hyperedge e ∈ E is given a weight $w_e$, which is set on the basis of the optimization criterion $C(D) = C_A + \alpha C_L$. As shown in Fig. 3, the hypergraph has two classes of vertices, storage nodes and data items, and two classes of hyperedges, demand-pattern hyperedges and data-node hyperedges.
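A minimal sketch of how such a model could be assembled is shown below. It rests on two assumptions that the text does not spell out: that a demand-pattern hyperedge joins the data items of that pattern, and that a data-node hyperedge joins one data item with one storage node. The weights are passed in by the caller rather than computed, and the function `build_hypergraph` and all data are hypothetical.

```python
def build_hypergraph(X, Y, patterns, w_pattern, w_xy):
    """Return (V, edges), where edges maps a hyperedge key to (vertex set, weight).

    Assumptions: the hyperedge of pattern p joins the data items of p, and
    the hyperedge of (x, y) joins data item x with storage node y."""
    V = set(X) | set(Y)
    edges = {}
    for p, items in patterns.items():
        edges[("pattern", p)] = (set(items), w_pattern[p])
    for x in X:
        for y in Y:
            edges[("node", x, y)] = ({x, y}, w_xy[(x, y)])
    return V, edges


# Hypothetical example with three data items, two nodes, one pattern.
X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]
patterns = {"p1": ["x1", "x2"]}
w_pattern = {"p1": 1.0}
w_xy = {(x, y): 0.1 for x in X for y in Y}

V, edges = build_hypergraph(X, Y, patterns, w_pattern, w_xy)
```

Keeping the two hyperedge classes under distinct keys makes it easy to report the cut weight of each class separately.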
2. Basis of the hypergraph partition
Theorem: For an output set I, construct a hypergraph from it by the method described above. Partitioning the hypergraph into n sets of vertices yields a data placement function D. Let H be the cut weight of the partition; then $H = C(D) - B$, where B is a constant.
Proof: First, consider the cut weight of a demand-pattern hyperedge $e_p$, denoted $H_p$. By the definition of hyperedge cut weight, $H_p = (t_p - 1)\,w_{e_p}$, where $t_p$ is the number of sets that contain vertices of $e_p$; from this and the definition of $C_A$, the difference between the two is a constant. Second, consider the cut weight of the data-node hyperedges, denoted $H_{xy}$. In the hypergraph model, every data item x is connected to all nodes. After partitioning it can be connected to only one node; otherwise some sets of the partition result would be joined through x, given that each node is placed in a different set. Suppose data item x ends up connected to node $f_x$. The data-node hyperedges related to x that are cut are then those joining x to every node other than $f_x$, and the sum of their weights is the cut weight contributed by x. Therefore, summing over both classes of hyperedges gives $H = C(D) - B$.
By the theorem, for a result of hypergraph partitioning, minimizing the cut weight of the n-way partition is equivalent to minimizing C(D).
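Reading a placement function off a partition can be sketched as follows. The sketch assumes, as in the proof, that every set of the partition contains exactly one storage node, so each data item is assigned to the node that shares its set. The function `placement_from_partition` and the example partition are hypothetical.

```python
def placement_from_partition(X, Y, part_of):
    """Map each data item to the storage node in the same set.

    Assumes each set holds exactly one storage node; raises ValueError
    if two storage nodes share a set."""
    node_of_part = {}
    for y in Y:
        if part_of[y] in node_of_part:
            raise ValueError("two storage nodes share a set")
        node_of_part[part_of[y]] = y
    return {x: node_of_part[part_of[x]] for x in X}


# Hypothetical 2-way partition over three data items and two nodes.
X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]
part_of = {"y1": 0, "y2": 1, "x1": 0, "x2": 0, "x3": 1}

D = placement_from_partition(X, Y, part_of)
# The set of data stored on each node y.
D_y = {y: sorted(x for x in X if D[x] == y) for y in Y}
```

The resulting `D` plays the role of the mapping D: x → y, and `D_y` lists the data stored on each node.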
3. Steps of hypergraph partitioning
1) Coarsening phase: closely connected nodes are merged to construct a smaller hypergraph, so that the reduction ratio between two adjacent levels of hypergraphs reaches the set reduction ratio. The reduction ratio is the percentage by which the number of nodes decreases between two adjacent levels.
2) Initial partitioning phase: the smaller unweighted hypergraph obtained in step 1 is given an initial partition by random division, yielding the initial k subgraphs.
3) Optimization phase: fission nodes are randomly selected in the k subgraphs obtained in step 2 and restored in turn (merged nodes are split back apart), constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
The algorithm flow chart is shown in Fig. 4.
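The three phases can be sketched as a small multilevel routine on an unweighted hypergraph. This is a simplified illustration under stated assumptions, not the patented procedure: coarsening merges vertices that share a hyperedge, the random initial partition is dealt round-robin after a shuffle, and a greedy move-based refinement with a loose balance bound stands in for the optimization performed at each restored level. All function names, parameters (`ratio`, `levels`, `seed`), and the example are hypothetical, and the constraint of one storage node per set is not enforced here.

```python
import random


def cut_size(edges, part):
    """Unweighted cut: a hyperedge spanning t parts contributes t - 1."""
    return sum(len({part[v] for v in e}) - 1 for e in edges)


def coarsen(vertices, edges, ratio, rng, k):
    """One level: merge vertices sharing a hyperedge until the vertex count
    has shrunk by about `ratio`, never going below k vertices."""
    target = max(k, round(len(vertices) * (1 - ratio)))
    parent = {v: v for v in vertices}
    count = len(vertices)
    order = list(edges)
    rng.shuffle(order)
    for e in order:
        if count <= target:
            break
        reps = sorted({parent[v] for v in e})
        if len(reps) >= 2:
            keep, gone = reps[0], reps[1]
            for v in vertices:
                if parent[v] == gone:
                    parent[v] = keep
            count -= 1
    coarse_edges = [frozenset(parent[v] for v in e) for e in edges]
    coarse_edges = [e for e in coarse_edges if len(e) > 1]
    return sorted(set(parent.values())), coarse_edges, parent


def refine(vertices, edges, part, k):
    """Greedy refinement: move a vertex when that lowers the cut, keeps its
    old part non-empty, and respects a loose size cap (an assumption)."""
    cap = -(-len(vertices) // k) + 1
    size = [0] * k
    for v in vertices:
        size[part[v]] += 1
    improved = True
    while improved:
        improved = False
        for v in vertices:
            home = part[v]
            if size[home] == 1:
                continue
            best, best_cut = home, cut_size(edges, part)
            for q in range(k):
                if q == home or size[q] >= cap:
                    continue
                part[v] = q
                c = cut_size(edges, part)
                if c < best_cut:
                    best, best_cut = q, c
            part[v] = best
            if best != home:
                size[home] -= 1
                size[best] += 1
                improved = True
    return part


def multilevel_partition(vertices, edges, k, ratio=0.5, levels=2, seed=0):
    rng = random.Random(seed)
    cur_v, cur_e = sorted(vertices), [frozenset(e) for e in edges]
    stack = []  # one (fine vertices, fine edges, parent map) entry per level
    for _ in range(levels):  # coarsening phase
        if len(cur_v) <= k:
            break
        nxt_v, nxt_e, parent = coarsen(cur_v, cur_e, ratio, rng, k)
        stack.append((cur_v, cur_e, parent))
        cur_v, cur_e = nxt_v, nxt_e
    shuffled = cur_v[:]  # initial phase: random division into k parts
    rng.shuffle(shuffled)
    part = {v: i % k for i, v in enumerate(shuffled)}
    part = refine(cur_v, cur_e, part, k)
    while stack:  # optimization phase: restore merged vertices level by level
        fine_v, fine_e, parent = stack.pop()
        part = {v: part[parent[v]] for v in fine_v}
        part = refine(fine_v, fine_e, part, k)
    return part


# Hypothetical example: two tight groups joined by one hyperedge.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"},
         {"d", "e", "f"}, {"d", "e"}, {"e", "f"}, {"c", "d"}]
part = multilevel_partition(vertices, edges, k=2)
```

The greedy refinement is only a local search, so the result depends on the seed and is not guaranteed to be the minimum cut; a production partitioner would use a stronger refinement heuristic and explicit balance constraints.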
In summary, the present invention proposes an optimized storage method for related data based on hypergraph partitioning, which improves the effectiveness of the system and reduces user-experienced delay.
Claims (1)
1. A related data storage method based on a hypergraph partitioning algorithm, comprising the following steps:
(1) A task that needs to process data is referred to as a demand pattern (demand model). The demand pattern requires multiple data items stored in data-center nodes. After the demand pattern is determined, its demand rate is predicted. Let the demand rate of the pattern at each node be $R_{py}$; the total demand rate of the demand pattern is then $R_p = \sum_{y} R_{py}$.
(2) According to the demand rates, select the metrics that serve as the criterion for hypergraph partitioning. The first is the total overhead required to complete the demands, $C_A$; the second is the total relay traffic required to complete the demands, $C_L$. The resulting metric is $C(D) = C_A + \alpha C_L$, where $\alpha$ is a parameter between 0 and 1 that balances the two criteria.
(3) Build the hypergraph model according to the metric of step (2). All data items and data nodes form the vertex set V of the hypergraph. The hyperedge set E contains the mapping relations among all demand patterns, data items, and nodes. Every hyperedge e ∈ E is given a weight $w_e$, assigned on the basis of the metric in (2). The hypergraph has two classes of vertices, storage nodes and data items, and two classes of hyperedges, demand-pattern hyperedges and data-node hyperedges.
(4) Hypergraph partitioning means dividing the vertices of the hypergraph into n output sets, with each vertex belonging to exactly one of the n sets. The cut weight of a partition is the sum of the cut weights of its hyperedges. A hyperedge is cut if its vertices do not all belong to one set; if the vertices of hyperedge e fall in t sets, its cut weight is $(t-1)\,w_e$.
(5) Coarsening phase: the hyperedge weights are reduced and closely connected nodes are merged to construct a smaller unweighted hypergraph, so that the reduction ratio between two adjacent levels of hypergraphs reaches the set reduction ratio. The reduction ratio is the percentage by which the number of nodes decreases between two adjacent levels.
(6) Initial partitioning phase: the smaller unweighted hypergraph obtained in step (5) is given an initial partition by random division, yielding the initial k subgraphs.
(7) Optimization phase: fission nodes are randomly selected in the k subgraphs obtained in step (6) and restored in turn (merged nodes are split back apart), constructing a series of hypergraphs until the scale of the original unweighted hypergraph is reached, which yields the k optimized subgraphs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710388857.9A CN107273207A (en) | 2017-05-25 | 2017-05-25 | Related data storage method based on a hypergraph partitioning algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107273207A true CN107273207A (en) | 2017-10-20 |
Family
ID=60065723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710388857.9A Pending CN107273207A (en) | 2017-05-25 | 2017-05-25 | Related data storage method based on a hypergraph partitioning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273207A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100318963A1 (en) * | 2009-06-15 | 2010-12-16 | Microsoft Corporation | Hypergraph Implementation |
KR101417757B1 (en) * | 2009-10-30 | 2014-07-14 | 에스케이플래닛 주식회사 | Apparatus and method for learning and applying hypergraph language model, and apparatus and method for updating hypergraph language model |
CN104809242A (en) * | 2015-05-15 | 2015-07-29 | 成都睿峰科技有限公司 | Distributed-structure-based big data clustering method and device |
CN105279524A (en) * | 2015-11-04 | 2016-01-27 | 盐城工学院 | High-dimensional data clustering method based on unweighted hypergraph segmentation |
CN105681052A (en) * | 2016-01-11 | 2016-06-15 | 天津大学 | Energy-saving method for data center distributed file storage |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510205A (en) * | 2018-04-08 | 2018-09-07 | 大连理工大学 | A kind of author's technical capability evaluation method based on hypergraph |
CN108510205B (en) * | 2018-04-08 | 2021-07-16 | 大连理工大学 | Author skill evaluation method based on hypergraph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned |
Effective date of abandoning: 20210209 |
|
AD01 | Patent right deemed abandoned |