CN108153585A - Method and apparatus for optimizing the operational efficiency of the MapReduce framework based on a locality expression function - Google Patents


Info

Publication number
CN108153585A
Authority
CN
China
Prior art keywords
locality
function
data
value
expression function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711249478.8A
Other languages
Chinese (zh)
Other versions
CN108153585B (en)
Inventor
Wang Xiaolin (汪小林)
Pan Cheng (潘成)
Chen Feng (陈峯)
Chen Yifeng (陈一峯)
Luo Yingwei (罗英伟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-12-01
Filing date
2017-12-01
Publication date
2018-06-12
Application filed by Peking University
Priority to CN201711249478.8A
Publication of CN108153585A
Application granted
Publication of CN108153585B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method and apparatus for optimizing the operational efficiency of the MapReduce framework based on a locality expression function. Before the Map phase, the method establishes a locality expression function, which expresses locality through the magnitude relationship (ordering) of real values. In the Map phase, data are loaded according to the computed values of the locality expression function, and different data are assigned to different compute nodes according to the numerical interval into which their values fall. In multi-round MapReduce jobs, the values of the locality expression function are carried along as intermediate results through the Map and Reduce phases and transmitted between compute nodes, so that they can be computed iteratively; once a preset convergence threshold is reached, the data are redistributed according to the numerical intervals of the locality expression function. The purpose of the invention is to optimize the operational efficiency of the MapReduce framework by means of a locality expression function.

Description

Method and apparatus for optimizing the operational efficiency of the MapReduce framework based on a locality expression function
Technical field
The present invention relates to a method and apparatus for optimizing the operational efficiency of the MapReduce framework based on a locality expression function, and belongs to the field of distributed parallel programming.
Technical background
The basic programming model of MapReduce extends traditional parallel programming by managing the grouping of keys. Spark extends MapReduce further through program control, with the goal of maximizing in-memory computation and reducing disk I/O. In Spark, however, communication accounts for about 80% of the cost, so localizing data to reduce the communication load has become a primary concern.
There are currently two approaches. Systems such as Hadoop and Spark allow users to override the code for data partitioning, distribution, and sorting. This broad use of back-door functions is unpopular, however, because it is inconsistent with the concise coding style of the MapReduce model. Some studies show that preprocessing a dataset with an offline graph-partitioning algorithm can improve locality. Unfortunately, iterative computation tasks often require different kinds of locality (for example, for web link analysis the relevant locality is the set of pages a page links to, whereas for web page statistics it is the similarity of page contents). It is therefore advantageous to let data processing personnel control locality rather than rely entirely on a fixed preprocessing method. Data processing personnel may also wish to control how much effort goes into finding an approximate indicator of locality, because finding the optimal solution is typically NP-hard.
Traditionally, keys play two roles: 1) key-value pairs with the same key are combined; 2) data are sorted by key. The former concerns semantic correctness, while the latter affects the communication load of iterative computation. These two roles are semantically entangled and easily confused. Many keys, such as those in PageRank, serve only to identify data items and express no locality information about the underlying web link graph, so sorting data by these keys leads to high communication costs. Using keys to express both the grouping and the locality of key-value pairs has a further problem: keys can vary in length (for example, in word frequency counting) and may even exceed the size of total memory, which causes performance problems in sorting and load balancing.
Summary of the invention
The purpose of the present invention is to optimize the operational efficiency of the MapReduce framework through a locality expression function, allowing data processing personnel to control data locality in iterative in-memory computation and to reduce the communication cost of the shuffle operation. This extension is fully consistent with the MapReduce style and can be implemented directly and quickly in the various distributed systems based on the MapReduce framework.
Following the style of the existing MapReduce paradigm, the present invention proposes a programming concept called Locality Keys (abbreviated Loceys) to clarify the role of keys in MapReduce. Unlike a key, a (one-dimensional) Locey is a floating-point or fixed-point number that is either computed by a Map operation before the actual computation starts or is simply part of the original data. The role of a Locey is to indicate the position of a data item relative to other data, i.e., its locality. In practice, programmers can settle on how to compute Loceys through repeated trial and error in experiments. Because Locey values always have a fixed size, it can be assumed that the Locey values of all items in a dataset fit naturally into machine memory, where they can be aggregated, sampled, and sorted.
The main steps of the method of the present invention for optimizing the operational efficiency of the MapReduce framework based on a locality expression function are shown in Fig. 1 and include:
1) Before the Map phase, the programmer, based on experience or by trial and error, designs a locality expression function Locey that takes the data to be processed as input and outputs a real number. The real values computed by this function express locality through their ordering: when Locey(x) and Locey(y) are close in value, data x and data y should preferably be placed on the same compute node for computation;
2) The Locey values computed before the Map phase can be used immediately to guide data loading, assigning different data to different compute nodes according to the numerical interval of their Locey values.
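Step 2) can be illustrated with a minimal sketch. The patent does not say how the Locey intervals are chosen, so the equal-count rule below (cut points taken from the sorted Locey values) and the helper names `interval_cuts` and `assign_by_locey` are illustrative assumptions, not the patented implementation.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def interval_cuts(values: Sequence[float], nparts: int) -> List[float]:
    """Choose nparts - 1 cut points so each Locey interval holds about the same number of items."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // nparts] for i in range(1, nparts)]


def assign_by_locey(records: Sequence[T], locey: Callable[[T], float], nparts: int) -> List[List[T]]:
    """Place each record on the compute node whose Locey interval contains its value."""
    values = [locey(r) for r in records]
    cuts = interval_cuts(values, nparts)
    nodes: List[List[T]] = [[] for _ in range(nparts)]
    for record, value in zip(records, values):
        # bisect_right sends a value equal to a cut point to the upper interval
        nodes[bisect_right(cuts, value)].append(record)
    return nodes
```

For example, `assign_by_locey([5, 1, 7, 3, 0, 6, 2, 4], float, 4)` yields `[[1, 0], [3, 2], [5, 4], [7, 6]]`: records whose Locey values are close end up on the same node. Equal-count intervals are only one choice; fixed-width intervals would work the same way but may balance load less evenly.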
Further, in multi-round MapReduce jobs, the Locey values can also be carried along as intermediate results through the Map and Reduce phases and transmitted between compute nodes, so that they are computed iteratively. After a preset convergence threshold is reached, the data are redistributed by Locey numerical interval according to the Locey value of each item.
The locality expression function in the present invention can be established by the programmer either from experience or by trial and error, as described below:
1) Based on the programmer's understanding of the data and on experience, a transformation function from the original data to a real value is designed. This function exposes the locality of the data distribution and guides data partitioning, so that performance bottlenecks such as communication volume and disk I/O in the Map and Reduce phases are relieved.
2) Where a locality expression function cannot be designed from experience, trial and error is used: before the specific task is executed, the data to be processed are sampled and analyzed and experiments are run, several heuristic functions are designed and their feasibility is verified, and the heuristic function that performs best is selected as the locality expression function.
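The trial-and-error route in 2) can be sketched as scoring each candidate function on a sample. The sketch assumes, as a hypothetical setup, that the sample is a graph whose edges mark pairs of items that must communicate, and that "performs best" means the fewest edges crossing Locey intervals; `cut_ratio` and `pick_heuristic` are illustrative names.

```python
from bisect import bisect_right
from typing import Callable, Dict, Sequence, Tuple

Edge = Tuple[int, int]


def cut_ratio(nodes: Sequence[int], edges: Sequence[Edge],
              locey: Callable[[int], float], nparts: int) -> float:
    """Fraction of sampled edges whose endpoints fall into different Locey intervals."""
    ordered = sorted(locey(v) for v in nodes)
    cuts = [ordered[(i * len(ordered)) // nparts] for i in range(1, nparts)]
    part = {v: bisect_right(cuts, locey(v)) for v in nodes}
    crossing = sum(1 for u, v in edges if part[u] != part[v])
    return crossing / len(edges)


def pick_heuristic(nodes: Sequence[int], edges: Sequence[Edge],
                   candidates: Dict[str, Callable[[int], float]], nparts: int) -> str:
    """Return the name of the candidate Locey function with the lowest cut ratio on the sample."""
    return min(candidates, key=lambda name: cut_ratio(nodes, edges, candidates[name], nparts))
```

On a sampled chain `0-1-...-7` split over two nodes, the candidate `lambda v: float(v)` cuts only one of seven edges, while `lambda v: float(v % 2)` cuts all of them, so the first would be kept.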
Before defining Locey, the present invention first abstracts the key-value processing scheme of the traditional MapReduce framework in terms of sets and functions. This helps in understanding how the Locey abstraction introduced below relates to existing MapReduce frameworks.
Let (k, v) ∈ Key × Val be a key-value pair, where "×" denotes the Cartesian product:
1) The key equality function Eq: Key × Key → Bool is bitwise equality, i.e., two keys are equal if and only if all their bits are equal; this operation is required by the Reduce phase;
2) The hash function Hash: Key → HashValue maps a key to a fixed-point number, and identical keys must produce identical hash values; HashValue denotes the set of possible hash values;
3) The key order function Leq: Key × Key → Bool is a less-than-or-equal relation defined by the programmer that decides whether one key is less than or equal to another, which guarantees that a total order can ultimately be formed in the Shuffle phase.
In practice, these functions can be regarded as overridable object-oriented methods. For example, to sort a collection of key-value pairs, all keys are first sorted according to Leq; the local values are then merged and sent to the target node, which performs another round of local sorting.
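The sort-merge-send sequence just described can be sketched with plain Python stand-ins for the three functions: `==` for Eq, the built-in `hash` for Hash, and string ordering for Leq. The helper names and the in-memory "nodes" are hypothetical; a real framework would move the groups over the network.

```python
from typing import List, Sequence, Tuple

Group = Tuple[str, List[int]]


def sort_and_merge(groups: Sequence[Group]) -> List[Group]:
    """Leq: sort by key; Eq: concatenate the value lists of equal keys."""
    merged: List[Group] = []
    for key, values in sorted(groups, key=lambda g: g[0]):
        if merged and merged[-1][0] == key:
            merged[-1][1].extend(values)
        else:
            merged.append((key, list(values)))
    return merged


def shuffle(node_pairs: Sequence[Sequence[Tuple[str, int]]], n_targets: int) -> List[List[Group]]:
    """Map side sorts and merges locally, Hash picks the target, each target sorts again."""
    inbox: List[List[Group]] = [[] for _ in range(n_targets)]
    for pairs in node_pairs:
        local = sort_and_merge([(k, [v]) for k, v in pairs])
        for key, values in local:
            # equal keys always hash to the same target node
            inbox[hash(key) % n_targets].append((key, values))
    return [sort_and_merge(box) for box in inbox]
```

Because the key drives grouping, routing, and ordering at once, any locality the keys lack is lost; this is the entanglement the Univalue and Locey abstraction below separates.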
The Locey function can be added to the traditional MapReduce framework in a natural way, defined in the same form as the Eq, Hash, and Leq functions. In a sense, Locey splits the key into two parts: the conventional key and the locality key. Here, however, another representation is used: the key is considered part of the data value. A value in the set Uni obtained by binding the key and value together is called a Univalue, where Uni is the value space formed by bundling each key-value pair into a new value. To group the data, a relation Sim: Uni × Uni → Bool must be provided that expresses whether two Univalues are similar; it is used to detect quickly whether two Univalues have equivalent keys (i.e., whether two Uni data items should be assigned to the same compute node).
In addition, a (one-dimensional) Locey function Locey: Uni → Real is required to extract information from a Univalue, where Real denotes the real values; for example, it might extract the IP address from a web page and convert it into a binary real number.
The equality function Eq on Univalues is defined as a combination of the functions mentioned above, and the ordering function on Univalues is defined by the order of Locey values:
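The IP-address example can be sketched as follows, assuming a hypothetical Univalue layout in which a crawled page is a dict with an `"ip"` field holding an IPv4 address. Reading the address as a 32-bit number places pages served from the same subnet next to each other in Locey order.

```python
import ipaddress
from typing import Dict


def locey_from_ip(page: Dict[str, str]) -> float:
    """Locey of a page: its server's IPv4 address read as a 32-bit number.

    The record layout (a dict with an "ip" field) is a hypothetical example.
    """
    return float(int(ipaddress.IPv4Address(page["ip"])))
```

Sorting pages with `sorted(pages, key=locey_from_ip)` then keeps `10.0.0.1` and `10.0.0.2` adjacent and places `10.0.1.0` after both.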
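$$\mathrm{eq}(u_1,u_2) \equiv \mathrm{sim}(u_1,u_2) \wedge \mathrm{hash}(u_1)=\mathrm{hash}(u_2) \wedge \mathrm{locey}(u_1)=\mathrm{locey}(u_2)$$
$$\mathrm{leq}(u_1,u_2) \equiv \mathrm{locey}(u_1) \le \mathrm{locey}(u_2)$$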
Here "≡" denotes definitional identity and "∧" denotes logical conjunction ("and").
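The two definitions translate directly into code. In this sketch the class `UnivalueOps` and the field name `hash_fn` are hypothetical; `eq` is the conjunction of the three tests and `leq` compares Locey values only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class UnivalueOps:
    """Bundle of the user-supplied functions over Univalues (illustrative only)."""
    sim: Callable[[Any, Any], bool]
    hash_fn: Callable[[Any], int]
    locey: Callable[[Any], float]

    def eq(self, u1: Any, u2: Any) -> bool:
        # eq(u1, u2) = sim(u1, u2) and hash(u1) == hash(u2) and locey(u1) == locey(u2)
        return (self.sim(u1, u2)
                and self.hash_fn(u1) == self.hash_fn(u2)
                and self.locey(u1) == self.locey(u2))

    def leq(self, u1: Any, u2: Any) -> bool:
        # leq(u1, u2) = locey(u1) <= locey(u2)
        return self.locey(u1) <= self.locey(u2)
```

With Univalues of the form `(key, value)` and integer keys, `UnivalueOps(sim=lambda a, b: a[0] == b[0], hash_fn=lambda u: hash(u[0]), locey=lambda u: float(u[0]))` reproduces traditional key handling: `(3, "x")` and `(3, "y")` compare equal, and `(3, "x")` sorts before `(4, "a")`.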
Traditional key-value processing then becomes a special case, in which the similarity Sim is defined as an equality test on the key attribute of the Univalue object, the hash function is applied only to the key attribute, and the Locey function allows the keys to be sorted. This special case is predefined as the default, so data processing personnel can simply keep their previous understanding of keys. With the new representation, switching from one key attribute to another can be achieved by method overloading without performing any actual data operation.
This new pattern also opens up broader ways of using Univalues. For example, if the values being processed are triangles, the similarity function between triangles can be their geometric similarity, the Locey function can be defined as the largest angle of each triangle, and merge-compression can be performed among similar triangles. In this case, similar triangles have no obvious representative attribute that stands for a triangle, but this line of thinking can help define similarity between, for example, web pages in practical applications.
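The triangle example can be sketched as follows. Here a triangle is assumed to be a tuple of three 2-D points, `sim` is taken to mean "same interior angles within a tolerance" (geometric similarity), and `locey` is the largest interior angle, computed with the law of cosines. The representation and the tolerance are illustrative assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def angles(t: Triangle) -> Tuple[float, float, float]:
    """Interior angles in radians via the law of cosines, sorted ascending."""
    a = math.dist(t[1], t[2])  # side opposite vertex 0
    b = math.dist(t[0], t[2])  # side opposite vertex 1
    c = math.dist(t[0], t[1])  # side opposite vertex 2

    def clamp(x: float) -> float:  # guard acos against rounding just outside [-1, 1]
        return max(-1.0, min(1.0, x))

    A = math.acos(clamp((b * b + c * c - a * a) / (2 * b * c)))
    B = math.acos(clamp((a * a + c * c - b * b) / (2 * a * c)))
    return tuple(sorted((A, B, math.pi - A - B)))


def locey(t: Triangle) -> float:
    """Locey of a triangle: its largest interior angle."""
    return angles(t)[-1]


def sim(t1: Triangle, t2: Triangle, tol: float = 1e-9) -> bool:
    """Geometric similarity: all corresponding angles agree within tol."""
    return all(abs(x - y) <= tol for x, y in zip(angles(t1), angles(t2)))
```

A right isosceles triangle and its scaled copy are similar and share the Locey value π/2, while an equilateral triangle gets π/3; sorting by this Locey brings similar triangles together before they are merged.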
Corresponding to the above method, the present invention also provides an apparatus for optimizing the operational efficiency of the MapReduce framework based on a locality expression function, comprising:
A locality expression function establishing module, configured to establish, before the Map phase, a locality expression function that takes the data to be processed as input and outputs a real number, the locality expression function expressing locality through the magnitude relationship of real values;
A data allocation module, configured to load data in the Map phase according to the computed values of the locality expression function, and to assign different data to different compute nodes for computation according to the numerical intervals of the locality expression function.
Further, the apparatus includes an iterative computation module, configured, in multi-round MapReduce jobs, to carry the values of the locality expression function as intermediate results through the Map and Reduce phases and transmit them between compute nodes, so that these values are computed iteratively; after a preset convergence threshold is reached, the data are redistributed by the numerical intervals of the locality expression function according to the value of each item's locality expression function.
Compared with the prior art, the present invention has the following advantages:
1) The locality of the data can be fully exploited, shortening the computation time of MapReduce jobs. Moreover, the way locality is mined is flexible: users can specify different locality expression functions according to their needs.
2) Big data processing personnel gain more control over the data processing procedure. In many cases, data processing personnel know from experience or convention how the data should be distributed to make computation more efficient, and the locality expression function offers a convenient way to act on that knowledge: it lets them redistribute data, intervene in the iterative process, and so on, ultimately improving processing efficiency.
Description of the drawings
Fig. 1 is a flow chart of the main steps of the method of the present invention.
Fig. 2 shows how the distribution of links between web pages changes over Locey iterations.
Fig. 3 shows the communication volume in the first 8 steps of a 100-step PageRank computation, as a ratio relative to the volume without data redistribution.
Specific embodiment
From the above introduction to Locey, the unified form of Locey functions is:
Locey: Uni → Real, where Uni is the set of Univalues obtained by binding key-value pairs and Real denotes the real numbers.
This function extracts locality information from the original data and maps it into the real domain; data are then partitioned according to the magnitude relationship between Locey values. In practical applications, however, the design of the Locey function and the computation of Locey values are very flexible. The following two embodiments show how the locality expression function is applied under the MapReduce framework and how it improves computational efficiency.
Embodiment 1: Using Locey to express locality in PageRank computation
This embodiment shows the effect of using Locey when iteratively computing PageRank in memory. The dataset used is "web-Google", a graph with about 880K nodes and 5.11M edges. The cluster has 16 or 32 servers, and the data are initially distributed uniformly. The communication cost corresponds to the number of edges between nodes on different servers. The goal of the optimization is to move connected nodes into the same partition on the same server. This locality is represented as a floating-point Locey value on each node. Partitioning and distribution follow the order of Locey values, and iterative computation is also needed to gather nodes that are adjacent but located in different partitions (see Fig. 2). Fig. 2 plots the links between web pages: the shading depth indicates the number of links, and both axes represent Locey values; whenever two web pages are linked, their Locey values form a two-dimensional coordinate that is marked as a black dot in the figure. Initially (niters=0, panel (a)), the web pages are randomly ordered. After one round of Locey computation and sorting by Locey (niters=1, panel (b)), links between nearby pages clearly outnumber links between distant pages. After 8 rounds of Locey computation and sorting by Locey (niters=8, panel (c)), links are concentrated mainly among nearby pages. Here, the rough Locey value is computed simply by averaging the Locey values of the adjacent nodes (including the node itself), iterating while tracking how the ratio of communication edges fluctuates over the first few PageRank steps.
The basic Locey computation treats all nodes identically. Through repeated trials, the programmer easily realizes that the system should try to gather adjacent heavy nodes, i.e., nodes with many incoming or outgoing edges (and hence high potential communication cost), so that lighter nodes can be dragged toward neighboring heavy centers. Ideally, there should be a way to make the Locey values of a highly connected node and its neighbors as close as possible. A basic approach is to compute the center of mass of the adjacent nodes, using the mass as the weight in the Locey iteration.
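The basic (unweighted) round and the communication measure can be sketched on a toy graph. The sketch assumes, following the text, that the neighborhood ignores link direction and that the average includes the node itself; the helper names and the even split of the Locey order into `nparts` partitions are illustrative assumptions.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def neighborhoods(n: int, edges: Sequence[Edge]) -> List[Set[int]]:
    """V_u: nodes that u links to plus nodes that link to u."""
    nbrs: List[Set[int]] = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def average_step(locey: List[float], nbrs: List[Set[int]]) -> List[float]:
    """One basic round: mean of the neighbours' Locey values and the node's own."""
    out = []
    for u, vs in enumerate(nbrs):
        group = vs | {u}
        out.append(sum(locey[v] for v in group) / len(group))
    return out


def cut_fraction(locey: List[float], edges: Sequence[Edge], nparts: int) -> float:
    """Share of edges crossing partitions when nodes are sorted by Locey and split evenly."""
    order = sorted(range(len(locey)), key=lambda u: locey[u])
    part = [0] * len(locey)
    for rank, u in enumerate(order):
        part[u] = rank * nparts // len(locey)
    return sum(part[u] != part[v] for u, v in edges) / len(edges)
```

On the path `0-1-2` with initial values `[0.0, 3.0, 6.0]`, one round gives `[1.5, 3.0, 4.5]`. `cut_fraction` is the quantity tracked in Fig. 2 and Fig. 3: the lower it is, the fewer edges (messages) cross servers.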
First, the following formulas are defined:
$$V_u = \{\, v : (u,v) \in E \ \lor\ (v,u) \in E \,\}$$
$$\mathrm{locey}_0(u) = \mathrm{RANDOM}$$
Here V denotes the vertex set of the web link graph, i.e., the set of web pages; E denotes the edge set of the web link graph, i.e., the set of link relationships; V_u denotes the neighborhood of node u, i.e., the set of all nodes that u points to and all nodes that point to u; u and v denote arbitrary vertices, i.e., arbitrary web pages in the link graph; (u, v) denotes that web page u has a link pointing to web page v, and (v, u) denotes that web page v has a link pointing to web page u; locey_0 denotes the Locey values of round 0, i.e., the initial Locey values; RANDOM denotes a random function that returns a new random number on every call; locey_k denotes the Locey value of each node after the k-th iteration. weight(v) is a user-defined node weight, such as the degree or the square of the degree, whose form is determined by experimental results.
Let weight be the weight of each node. Locey(v) is computed as the weighted average, using these weights, of the Locey values of all nodes in the neighborhood of node v. In a sense, the basic unweighted method uses weight^0, while the weighted method uses weight^1. Experiments show that the weighted average reduces communication more quickly. To place more weight on heavy nodes, the programmer can try higher powers weight^npow (npow = 2, 3, ...). Fig. 3 shows that higher-power weights further speed up the reduction of communication cost. Here npow is the power to which the weight function is raised in the iteration: npow is 0 in panel (a), 1 in panel (b), and 5 in panel (c). nparts denotes the number of partitions into which the data are divided, i.e., the number of compute nodes allocated. In each round, the Locey values are computed recursively from the previous round's results, and the resulting Locey value determines whether a node must be moved to another partition. After several early rounds, the Locey values gradually converge; the iterative Locey computation is then stopped, yielding a vertex distribution with good locality.
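The update rule for locey_k itself does not survive in this text. Reading the surrounding description (a weighted average over the neighborhood V_u that also includes the node itself, with weights raised to the power npow), one plausible reconstruction, offered as an interpretation rather than the patent's own formula, is:

$$\mathrm{locey}_{k+1}(u) = \frac{\sum_{v \in V_u \cup \{u\}} \mathrm{weight}(v)^{npow}\,\mathrm{locey}_k(v)}{\sum_{v \in V_u \cup \{u\}} \mathrm{weight}(v)^{npow}}$$

With npow = 0 every weight equals 1 and the rule reduces to the plain average used in the basic method.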
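A minimal sketch of the weighted iteration, assuming weight(v) is the node degree (one of the options the text mentions), the weighted-average update described above, and a stop criterion of "largest change in a round below `tol`". The function name, the default of 8 rounds, and the handling of isolated nodes are illustrative assumptions.

```python
import random
from typing import List, Sequence, Set, Tuple


def weighted_locey(n: int, edges: Sequence[Tuple[int, int]], npow: int = 1,
                   tol: float = 1e-6, max_iters: int = 8,
                   seed: int = 0) -> Tuple[List[float], int]:
    """Iterate degree-weighted Locey averaging; return (values, rounds run)."""
    nbrs: List[Set[int]] = [set() for _ in range(n)]
    for u, v in edges:  # V_u ignores link direction
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Assumed weight: node degree raised to npow (npow = 0 gives the plain average).
    weight = [float(len(nbrs[u])) ** npow for u in range(n)]
    rng = random.Random(seed)
    locey = [rng.random() for _ in range(n)]  # locey_0(u) = RANDOM
    rounds = 0
    while rounds < max_iters:
        new = []
        for u in range(n):
            group = nbrs[u] | {u}  # neighbourhood plus the node itself
            total = sum(weight[v] for v in group)
            if total == 0.0:  # isolated node with npow > 0: keep its value
                new.append(locey[u])
            else:
                new.append(sum(weight[v] * locey[v] for v in group) / total)
        delta = max(abs(a - b) for a, b in zip(new, locey))
        locey = new
        rounds += 1
        if delta < tol:  # convergence threshold reached
            break
    return locey, rounds
```

Raising `npow` lets high-degree nodes dominate each average, so their neighbours' Locey values are pulled toward them faster. Run to full convergence on a connected graph, plain averaging drives all values toward a single number, which is one reason to stop after a few early rounds as the text describes.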
Embodiment 2: Using Locey to express locality in sorting
Sorting and shuffling are often regarded as benchmarks of MapReduce framework performance. On Spark, for example, sorting 100 TB of data (far more than the memory of a real machine) generates 500 TB of disk I/O and 200 TB of data communication (including fault-tolerance mechanisms).
In such a scenario, we would like to use Locey to pre-sort the original data so that the data to be sorted later already have pronounced locality, thereby reducing disk I/O and data communication. How this can be done follows roughly from the definition of Locey. Each Locey value has a short fixed size (for example, a 4-byte floating-point number or integer). It can therefore be assumed that all Locey values fit in memory and can be sorted there. The following example illustrates one way to realize a Locey function.
Assume that each key is a string of 32 characters and each character is 8 bits long. The Locey function is then defined as:
$$\mathrm{Locey}(S) = \prod_{i} \mathrm{bit}_i[0], \qquad \text{where } S = \prod_{i} \mathrm{bit}_i[0,1,2,\ldots,7]$$
Here Locey(S) denotes the Locey value of string S, S is a string of length 32, bit_i denotes the array of bits of the i-th character of S, and ∏ denotes concatenation.
That is, the highest-order bit of each character is extracted and the bits are concatenated to obtain the LoceyValue. With this representation, a key originally 32 bytes long can be represented by a 4-byte integer, a compression ratio of 1/8, while the locality of the original key is also preserved.
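The bit-concatenation above can be sketched for byte-string keys, assuming bit_i[0] is the most significant bit of the i-th byte and the first character supplies the highest bit of the result. The function name `locey_msb` is hypothetical.

```python
def locey_msb(key: bytes) -> int:
    """Concatenate the most significant bit of each of the 32 bytes into a 32-bit integer."""
    if len(key) != 32:
        raise ValueError("expected a 32-byte key")
    value = 0
    for byte in key:
        # shift the bits collected so far and append this byte's top bit
        value = (value << 1) | (byte >> 7)
    return value
```

The result always fits in 4 bytes (`locey_msb(k).to_bytes(4, "big")`). Note that which bit of each character is kept matters in practice: for keys drawn only from 7-bit ASCII, the top bit of every byte is 0, so this particular choice only distinguishes keys that use the high bit.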
In addition, the global sort can begin partially in the Map phase, before the data blocks (of a fixed common size, e.g., 64 MB each) are dumped to disk. Consider 100 TB of data on 200 nodes, each with more than 20 GB of memory (about 4 TB in total). If each 100-byte Univalue contains a 4-byte LoceyValue, memory can hold exactly all the Locey values but cannot hold any larger traditional keys.
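The memory figure can be checked directly (decimal units): the number of Univalues, the space their Locey values occupy, and the aggregate memory of the cluster are

$$\frac{100\ \text{TB}}{100\ \text{B}} = 10^{12}\ \text{Univalues}, \qquad 10^{12} \times 4\ \text{B} = 4\ \text{TB}, \qquad 200 \times 20\ \text{GB} = 4\ \text{TB}$$

so the 4-byte Locey values alone already fill the available memory, and any longer key would not fit.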
Sorting is divided into two phases. In the first phase, the data blocks are grouped into 100 TB / 4 TB = 25 sets, such that each set fits entirely in aggregate memory; each set is sorted in parallel and distributed evenly across the cluster's disks. For any given LoceyValue, a block set contains some blocks whose Locey values are smaller and some whose Locey values are larger, while ensuring that only a few blocks have a range that includes the given Locey. The Locey values of all data from the first phase are then extracted, kept in memory, and sorted to determine the target Locey range of each block after the shuffle. In the second phase, each destination node sorts each of its target blocks. At most 16 blocks from a block set contain the Locey value at the maximum of a target block's range; such blocks are kept in memory until they have been sorted and fully written to external storage. The minimum space required on each node is therefore about 50 blocks (about 3 GB), and the algorithm can sort and store in parallel.
Note that the first phase can be piggybacked onto the Map operation before the data are stored. Therefore, if the data are sorted within each block set, the subsequent Reduce needs only one read and one write, for a total of 200 TB of disk I/O; if instead the data are completely unsorted, 400 TB of disk I/O is required.
Another advantage of using Locey values rather than keys is that Locey functions allow multidimensional Locey values to be sorted. When more than one attribute carries information about locality, multidimensional Locey values become convenient, with a Locey function assigned to each attribute. At run time, a Peano-Hilbert space-filling curve (a method commonly used in cosmological computations) can map the multidimensional distribution onto a linear domain while preserving locality. All of this is transparent to data processing personnel.
The above embodiments merely illustrate, rather than limit, the technical solution of the present invention. Those of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall be defined by the claims.
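A two-dimensional version of this mapping can be sketched with the widely published iterative Hilbert-curve index algorithm (the "xy2d" routine). The quantization of two real-valued attributes onto a 2^order × 2^order grid in `locey_2d` is a hypothetical helper, not part of the patent text.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the sub-curve has the canonical orientation
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def locey_2d(a: float, b: float, lo: float, hi: float, order: int = 16) -> int:
    """Hypothetical helper: quantize two attributes in [lo, hi] to a grid and return the Hilbert index."""
    n = 1 << order
    scale = (n - 1) / (hi - lo)
    return hilbert_index(n, int((a - lo) * scale), int((b - lo) * scale))
```

Consecutive Hilbert indices always correspond to neighboring grid cells, so sorting records by `locey_2d` keeps points that are close in both attributes close in the one-dimensional Locey order.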

Claims (10)

1. A method for optimizing the operational efficiency of the MapReduce framework based on a locality expression function, comprising the steps of:
1) before the Map phase, establishing a locality expression function that takes the data to be processed as input and outputs a real number, the locality expression function expressing locality through the magnitude relationship of real values;
2) in the Map phase, loading data according to the computed values of the locality expression function, and assigning different data to different compute nodes for computation according to the numerical intervals of the locality expression function.
2. The method according to claim 1, characterized in that, in multi-round MapReduce jobs, the values of the locality expression function are carried as intermediate results through the Map and Reduce phases and transmitted between compute nodes, so that the values of the locality expression function are computed iteratively; after a preset convergence threshold is reached, the data are redistributed by the numerical intervals of the locality expression function according to the value of each item's locality expression function.
3. The method according to claim 1, characterized in that, based on the programmer's understanding of and experience with the data, a transformation function from the original data to a real value is designed as the locality expression function, so as to expose the locality of the data distribution and guide data partitioning.
4. The method according to claim 1, characterized in that, where a locality expression function cannot be designed from experience, trial and error is used: before the specific task is executed, the data to be processed are sampled, analyzed, and experimented on, several heuristic functions are designed and their feasibility is verified, and the heuristic function that performs best is selected as the locality expression function.
5. The method according to claim 1, characterized in that the locality expression function is defined as:
Locey:Uni → Real,
where Uni is the set of Univalues obtained by binding key-value pairs and Real denotes the real numbers; the function extracts locality information from the original data and maps it into the real domain, and data are partitioned according to the magnitude relationship between the real values.
6. The method according to claim 1, characterized in that, for the PageRank computation, the calculation formulas of the locality expression function are:
$$V_u = \{\, v : (u,v) \in E \ \lor\ (v,u) \in E \,\},$$
$$\mathrm{locey}_0(u) = \mathrm{RANDOM},$$
where V denotes the vertex set of the web link graph, i.e., the set of web pages; E denotes the edge set of the web link graph, i.e., the set of link relationships; V_u denotes the neighborhood of node u, i.e., the set of all nodes that u points to and all nodes that point to u; u and v denote arbitrary vertices of the link graph, i.e., arbitrary web pages; (u, v) denotes that web page u has a link pointing to web page v, and (v, u) denotes that web page v has a link pointing to web page u; locey_0 denotes the values of the locality expression function in round 0, i.e., the initial values; RANDOM denotes a random function that returns a new random number on every call; locey_k denotes the value of the locality expression function of each node after the k-th iteration; weight(v) is a user-defined node weight.
7. The method according to claim 1, characterized in that, for the sorting process, assuming that each key is a string of 32 characters and each character is 8 bits long, the locality expression function is defined as:
$$\mathrm{Locey}(S) = \prod_{i} \mathrm{bit}_i[0], \qquad \text{where } S = \prod_{i} \mathrm{bit}_i[0,1,2,\ldots,7],$$
where Locey(S) denotes the Locey value of string S, S is a string of length 32, bit_i denotes the array of bits of the i-th character of S, and ∏ denotes concatenation.
8. The method according to claim 1 or 7, characterized in that, for the sorting process, more than one attribute is used to carry information about locality, and the multidimensional values of the locality expression function are then sorted.
9. A device for optimizing the operating efficiency of a MapReduce framework based on a locality expression function, characterized by comprising:
a locality expression function establishing module, configured to establish, before the Map stage, a locality expression function that takes the data to be processed as input and outputs a real number, the locality expression function expressing locality through the magnitude relationship (ordering) of its real values;
a data allocation module, configured to load data in the Map stage according to the computed values of the locality expression function, and to distribute different data to different compute nodes for processing according to the numerical interval of the locality expression function into which each data item falls.
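The data allocation module of claim 9 routes each record to a compute node according to the numerical interval its locality value falls into. The sketch below assumes the interval boundaries are chosen as quantiles, so that each node receives roughly the same number of records; the claims do not say how the intervals are chosen, and the names `interval_boundaries` and `assign_to_nodes` are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def interval_boundaries(values: Sequence[float], num_nodes: int) -> List[float]:
    """Quantile split points b; node i receives values in [b[i-1], b[i]),
    with the first and last intervals open-ended."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_nodes] for i in range(1, num_nodes)]


def assign_to_nodes(records: Sequence[T],
                    locey_fn: Callable[[T], float],
                    num_nodes: int) -> List[List[T]]:
    """Compute each record's locality value, then route it by numerical interval."""
    partitions: List[List[T]] = [[] for _ in range(num_nodes)]
    if not records:
        return partitions
    values = [locey_fn(r) for r in records]
    bounds = interval_boundaries(values, num_nodes)
    for record, value in zip(records, values):
        partitions[bisect_right(bounds, value)].append(record)
    return partitions


parts = assign_to_nodes(list(range(10)), lambda r: r / 10, num_nodes=2)
```

Records with equal locality values always land in the same interval, so heavy duplication can unbalance the nodes. Hadoop's `TotalOrderPartitioner`, which routes keys using split points typically produced by `InputSampler`, follows a similar range-partitioning idea.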
10. The device according to claim 9, characterized by further comprising an iterative calculation module configured to, in a multi-round MapReduce task, attach the values of the locality expression function to the intermediate results of the Map and Reduce stages and transmit them between compute nodes, so that the values of the locality expression function are computed iteratively; once the preset convergence threshold is reached, the data are redistributed according to the numerical intervals of the locality expression function, based on the value of each data item's locality expression function.
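Claim 10 describes multi-round MapReduce jobs in which the locality values travel with the intermediate data between Map and Reduce until a convergence threshold is met, after which the data are redistributed by interval. The single-process simulation below assumes an undirected neighbour graph, an unweighted neighbour mean as the per-round update, and a maximum-change test for convergence; the claims specify none of these, all function names are hypothetical, and the redistribution simply cuts the sorted order into equal-sized chunks.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Set, Tuple

Node = Hashable


def map_phase(adjacency: Dict[Node, Set[Node]],
              locey: Dict[Node, float]) -> Iterator[Tuple[Node, float]]:
    """Map: each node ships its current locality value to every neighbour."""
    for u, neighbours in adjacency.items():
        for v in neighbours:
            yield v, locey[u]


def reduce_phase(pairs: Iterator[Tuple[Node, float]],
                 previous: Dict[Node, float]) -> Dict[Node, float]:
    """Reduce: a node's new value is the mean of the values it received."""
    received: Dict[Node, List[float]] = defaultdict(list)
    for node, value in pairs:
        received[node].append(value)
    return {u: (sum(received[u]) / len(received[u]) if received[u] else previous[u])
            for u in previous}


def iterate_until_converged(adjacency: Dict[Node, Set[Node]],
                            threshold: float,
                            max_rounds: int = 1000,
                            seed: int = 0) -> Tuple[Dict[Node, float], int]:
    """Run Map/Reduce rounds until no value moves by `threshold` or more."""
    rng = random.Random(seed)
    locey = {u: rng.random() for u in adjacency}  # random initial values
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        new = reduce_phase(map_phase(adjacency, locey), locey)
        delta = max(abs(new[u] - locey[u]) for u in adjacency)
        locey = new
        if delta < threshold:  # convergence threshold reached
            break
    return locey, rounds


def redistribute(locey: Dict[Node, float], num_nodes: int) -> List[List[Node]]:
    """Sort by locality value and cut into num_nodes contiguous intervals."""
    order = sorted(locey, key=locey.__getitem__)
    size = -(-len(order) // num_nodes)  # ceiling division
    return [order[i * size:(i + 1) * size] for i in range(num_nodes)]


graph = {  # two disjoint triangles, links stored in both directions
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
values, used_rounds = iterate_until_converged(graph, threshold=1e-9)
nodes_by_partition = redistribute(values, num_nodes=2)
```

In a real Hadoop job the per-node values would travel inside the intermediate key/value pairs emitted by the mapper rather than in a shared dictionary.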
CN201711249478.8A 2017-12-01 2017-12-01 Method and device for optimizing operation efficiency of MapReduce framework based on locality expression function Active CN108153585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711249478.8A CN108153585B (en) 2017-12-01 2017-12-01 Method and device for optimizing operation efficiency of MapReduce framework based on locality expression function

Publications (2)

Publication Number Publication Date
CN108153585A true CN108153585A (en) 2018-06-12
CN108153585B CN108153585B (en) 2021-08-20

Family

ID=62465993

Country Status (1)

Country Link
CN (1) CN108153585B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737114A (en) * 2012-05-18 2012-10-17 Peking University MapReduce-based distance join query method for large graphs
US20130024412A1 (en) * 2011-06-28 2013-01-24 Salesforce.Com, Inc. Methods and systems for using map-reduce for large-scale analysis of graph-based data
CN106250240A (en) * 2016-08-02 2016-12-21 University of Science and Technology Beijing Task scheduling optimization method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
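Wang Xiaolin et al.: "LBS-p: An LBS Support Platform Supporting Online Map Services", Journal of Frontiers of Computer Science and Technology (计算机科学与探索) *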

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177103A (en) * 2019-12-26 2020-05-19 Beijing AsiaInfo Data Co., Ltd. Hadoop-based MapReduce framework data association method
CN111413974A (en) * 2020-03-30 2020-07-14 Tsinghua University Learning-based sampling motion planning method and system for autonomous vehicles
CN111413974B (en) * 2020-03-30 2021-03-30 Tsinghua University Learning-based sampling motion planning method and system for autonomous vehicles
CN111930731A (en) * 2020-07-28 2020-11-13 Suzhou Yige Network Technology Co., Ltd. Data dump method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103699606B (en) Large-scale graph partitioning method based on vertex cutting and community aggregation
CN108764273A (en) Data processing method and apparatus, terminal device, and storage medium
CN102737126B (en) Classification rule mining method under cloud computing environment
CN108153585A (en) Method and device for optimizing operation efficiency of MapReduce framework based on locality expression function
CN112598080A (en) Attention-based width map convolutional neural network model and training method thereof
CN107391549A (en) Artificial-intelligence-based news recall method, apparatus, device, and storage medium
CN107911300B (en) Multicast routing optimization method based on whale algorithm and application of multicast routing optimization method on Spark platform
CN104731925A (en) MapReduce-based FP-Growth load balance parallel computing method
CN114282678A (en) Method for training machine learning model and related equipment
CN107908796A (en) E-Government duplicate checking method, apparatus and computer-readable recording medium
CN112819157A (en) Neural network training method and device and intelligent driving control method and device
CN115965058A (en) Neural network training method, entity information classification method, device and storage medium
Ali et al. Improved differential evolution algorithm with decentralisation of population
CN104090995B (en) Automatic generation method for rebar element meshes in ABAQUS tire models
CN108389152A (en) Graph-structure-aware graph processing method and device
CN107291935A (en) The CPIR V arest neighbors privacy protection enquiring methods encoded based on Spark and Huffman
CN110502611A (en) Character string retrieving method and device
Govada et al. Distributed multi-class rule based classification using ripper
CN109977977A (en) Method for identifying potential users and corresponding apparatus
CN111984842B (en) Bank customer data processing method and device
CN107168795B (en) Codon deviation factor model method based on a CPU-GPU heterogeneous combined parallel computing framework
He et al. A nearly optimal parallel algorithm for constructing depth first spanning trees in planar graphs
CN102779025A (en) Parallel PLSA (Probabilistic Latent Semantic Analysis) method based on Hadoop
Plant et al. Data compression as a comprehensive framework for graph drawing and representation learning
CN105989284B (en) Method and device for recognizing homepage intrusion script features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant