CN103309942B - Scheduler and method for reducing redundancy overhead in asynchronous iterative processing - Google Patents

Scheduler and method for reducing redundancy overhead in asynchronous iterative processing

Info

Publication number
CN103309942B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310173239.4A
Other languages
Chinese (zh)
Other versions
CN103309942A (en)
Inventor
廖小飞
金海
张宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201310173239.4A
Publication of CN103309942A
Application granted
Publication of CN103309942B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method for reducing redundancy overhead in asynchronous iterative processing, comprising the following steps: establishing a hash table in which each entry corresponds to one data group and comprises three fields; receiving data D from a message receiver; calculating the weight Pri(D) of data D from its ITC value and IN value; determining whether the hash table contains a data group G(D) with the same key as data D and, if so, updating the weight and data list of G(D), otherwise creating in the hash table a data group G(D) with the same key as data D and initializing it; and determining whether the task executor is idle and, if so, selecting the data group with the largest weight from the hash table and sending the data of that group to the task executor for processing. The method addresses the problems of existing methods: heavy redundant computation and communication overhead, wasted computing resources, and slow convergence of the iterative computation.

Description

Scheduler and method for reducing redundancy overhead in asynchronous iterative processing
Technical field
The invention belongs to the field of big data processing and, more specifically, relates to a scheduler and a method for reducing redundancy overhead in asynchronous iterative processing.
Background
Asynchronous iterative processing is common in web applications, data mining, scientific computing and similar fields. Examples include the PageRank algorithm in web search engines, the Adsorption algorithm in link analysis and recommender systems, and the Jacobi method for solving systems of linear equations. It allows intermediate results produced in a previous iteration to be used immediately in the current iteration, which accelerates the convergence of the iterative computation; it also avoids load-balancing problems and makes better use of cloud computing infrastructure.
However, when many intermediate results are available, asynchronous iterative processing triggers a large amount of unnecessary computation and communication in subsequent iterations, for two reasons: (1) intermediate results are selected for processing blindly, without considering the effect of each datum on convergence speed or the overhead caused by processing it first; (2) each intermediate result triggers a cascade of computation and communication in subsequent iterations. For most iterative applications, asynchronous iterative processing therefore causes a great deal of redundant computation and communication, which slows convergence and wastes computing resources. In fact, this redundantly triggered computation can be avoided through scheduling. However, all current scheduling algorithms for asynchronous iterative processing, such as priority scheduling and round-robin scheduling, ignore the overhead of processing an intermediate result first when selecting which intermediate results to process. Moreover, these algorithms do not schedule in units of groups, so every intermediate result individually triggers a cascade of computation and communication in subsequent iterations. As a result, asynchronous iterative processing that uses these scheduling algorithms still suffers from a large amount of cascaded redundant computation and communication for many iterative applications, which wastes computing resources and slows the convergence of the iterative computation.
Summary of the invention
In view of the above defects of the prior art and the need to improve on it, the invention provides a method for reducing redundancy overhead in asynchronous iterative processing. Its purpose is to solve the problems of existing methods: heavy redundant computation and communication overhead, wasted computing resources, and slow convergence of the iterative computation.
To achieve the above object, according to one aspect of the invention, a method for reducing redundancy overhead in asynchronous iterative processing is provided. It is applied in a scheduler that is communicatively connected to a task executor and to a message receiver, and comprises the following steps:
(1) Establish a hash table in which each entry corresponds to one data group and comprises three fields: the first field stores the key of the data group, the second field stores the weight of the data group, and the third field stores the data list of the data group, where the data list holds the values of the data and the iteration level of each datum;
(2) Receive data D from the message receiver;
(3) Calculate the weight Pri(D) of data D from its ITC value and IN value, specifically through the following sub-steps:
(3-1) Calculate the ITC value ITC(D) and the IN value IN(D) of data D, where ITC(D) = ±D and IN(D) is information recorded in data D, namely the number of times the original raw data was processed on its way to becoming data D;
(3-2) From ITC(D) and IN(D), calculate the weight Pri(D) of data D using the equation Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T, where t_1 and t_2 are weighting values expressing the importance of ITC(D) and IN(D) respectively, each a decimal between 0 and 1, and T is a value that adjusts the range of IN(D), taken as an integer greater than 1;
(4) Determine whether the hash table contains a data group G(D) with the same key as data D; if so, update the weight and data list of G(D); otherwise create in the hash table a data group G(D) with the same key as data D and initialize it;
(5) Determine whether the task executor is idle; if so, go to step (6); otherwise return to step (2);
(6) Select the data group with the largest weight from the hash table and send the data of that group to the task executor for processing, then go to step (7);
(7) Determine whether the application running in the task executor has finished; if so, the process ends; otherwise go to step (8);
(8) Determine whether the hash table still contains unprocessed data groups; if so, return to step (5); otherwise return to step (2).
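As a minimal, non-limiting sketch, the weight calculation of sub-steps (3-1) and (3-2) might be written as follows. The `Datum` record, the default values of `t1`, `t2` and `T`, and the `sign` parameter (standing in for the application-supplied choice between +D and -D) are illustrative assumptions and are not specified by the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """Hypothetical record for one received datum D."""
    key: str        # key of the data object this datum updates
    value: float    # the value D carried by the message
    iteration: int  # IN(D): times the original data was processed to become D


def itc(d: Datum, sign: int = 1) -> float:
    """ITC(D) = +D or -D; the sign is chosen per application (assumed parameter)."""
    return sign * d.value


def pri(d: Datum, t1: float = 0.5, t2: float = 0.5, T: int = 10, sign: int = 1) -> float:
    """Pri(D) = t1 * ITC(D) + t2 * IN(D) / T, with 0 < t1, t2 < 1 and integer T > 1."""
    return t1 * itc(d, sign) + t2 * d.iteration / T


d = Datum(key="page-7", value=4.0, iteration=3)
print(pri(d))  # weight of one datum under the assumed defaults
```

Dividing IN(D) by T keeps the iteration term on roughly the same scale as ITC(D), so neither term dominates the weight merely because of its units.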
Preferably, step (4) comprises the following sub-steps:
(4-1) Obtain the key D_key of data D and apply a hash function to this key to obtain a unique group identifier K;
(4-2) Look up the hash table using the unique identifier K to determine whether a data group G(D) with key D_key exists; if so, go to step (4-3); otherwise go to step (4-4);
(4-3) Insert data D into the data group G(D) with key D_key, then go to step (4-5);
(4-4) Create a data group G(D) with key D_key and insert data D into G(D), then go to step (4-6);
(4-5) Update the weight of the data group with key D_key in the hash table using the formula Pri(G(D)) = Pri(G(D)) + Pri(D), then go to step (5);
(4-6) Set the weight of data group G(D) to Pri(D), then go to step (5).
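Sub-steps (4-1) to (4-6) can be sketched as below, purely by way of illustration. The `GroupEntry` class and the `add_datum` function are hypothetical names; a Python `dict` is assumed to play the role of the hash table, so its built-in hashing stands in for the hash function that yields the group identifier K.

```python
from dataclasses import dataclass, field


@dataclass
class GroupEntry:
    """One hash-table entry with the three fields of step (1)."""
    key: str                                    # field 1: key of the data group
    weight: float = 0.0                         # field 2: weight Pri(G(D))
    items: list = field(default_factory=list)   # field 3: (value, iteration level) pairs


def add_datum(table: dict, key: str, value: float, iteration: int, weight: float) -> GroupEntry:
    """Insert one datum and keep the group weight current (steps 4-1 to 4-6)."""
    group = table.get(key)             # (4-1)/(4-2): hash the key and look the group up
    if group is None:
        group = GroupEntry(key=key)    # (4-4): create G(D)
        table[key] = group
        group.weight = weight          # (4-6): initial weight is Pri(D)
    else:
        group.weight += weight         # (4-5): Pri(G(D)) = Pri(G(D)) + Pri(D)
    group.items.append((value, iteration))  # (4-3)/(4-4): insert D into the data list
    return group


table = {}
add_datum(table, "a", value=2.0, iteration=1, weight=1.05)
add_datum(table, "a", value=3.0, iteration=2, weight=1.6)
add_datum(table, "b", value=1.0, iteration=1, weight=0.55)
```

After these three calls, group "a" holds two data items and the sum of their weights, while group "b" holds one item with its own weight.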
According to another aspect of the invention, a scheduler is provided that is communicatively connected to a task executor and to a message receiver, the scheduler comprising:
First module, for establishing a hash table in which each entry corresponds to one data group and comprises three fields: the first field stores the key of the data group, the second field stores the weight of the data group, and the third field stores the data list of the data group, where the data list holds the values of the data and the iteration level of each datum;
Second module, for receiving data D from the message receiver;
Third module, for calculating the weight Pri(D) of data D from its ITC value and IN value, specifically comprising the following submodules:
First submodule, for calculating the ITC value ITC(D) and the IN value IN(D) of data D, where ITC(D) = ±D and IN(D) is information recorded in data D, namely the number of times the original raw data was processed on its way to becoming data D;
Second submodule, for calculating, from ITC(D) and IN(D), the weight Pri(D) of data D using the equation Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T, where t_1 and t_2 are weighting values expressing the importance of ITC(D) and IN(D) respectively, each a decimal between 0 and 1, and T is a value that adjusts the range of IN(D), taken as an integer greater than 1;
Fourth module, for determining whether the hash table contains a data group G(D) with the same key as data D and, if so, updating the weight and data list of G(D), otherwise creating in the hash table a data group G(D) with the same key as data D and initializing it;
Fifth module, for determining whether the task executor is idle; if so, control passes to the sixth module; otherwise it returns to the second module;
Sixth module, for selecting the data group with the largest weight from the hash table and sending the data of that group to the task executor for processing, after which control passes to the seventh module;
Seventh module, for determining whether the application running in the task executor has finished; if so, the process ends; otherwise control passes to the eighth module;
Eighth module, for determining whether the hash table still contains unprocessed data groups; if so, control returns to the fifth module; otherwise it returns to the second module.
Preferably, the fourth module comprises:
Third submodule, for obtaining the key D_key of data D and applying a hash function to this key to obtain a unique group identifier K;
Fourth submodule, for looking up the hash table using the unique identifier K to determine whether a data group G(D) with key D_key exists; if so, control passes to the fifth submodule; otherwise it passes to the sixth submodule;
Fifth submodule, for inserting data D into the data group G(D) with key D_key, after which control passes to the seventh submodule;
Sixth submodule, for creating a data group G(D) with key D_key and inserting data D into G(D), after which control passes to the eighth submodule;
Seventh submodule, for updating the weight of the data group with key D_key in the hash table using the formula Pri(G(D)) = Pri(G(D)) + Pri(D), after which control passes to the eighth submodule;
Eighth submodule, for setting the weight of data group G(D) to Pri(D), after which control passes to the seventh submodule.
In general, compared with the prior art, the above technical solution conceived by the invention achieves the following beneficial effects:
1. Little redundant computation and communication: step (1) eliminates the redundant computation and communication overhead caused by each datum individually triggering a cascade, and steps (3) to (4) together with step (6) eliminate the redundancy overhead caused by blindly processing data in arbitrary order. The method therefore effectively eliminates the large amount of redundant computation and communication overhead present in asynchronous iterative processing.
2. Fast convergence of the iterative computation: because steps (1), (3) to (4) and (6) are adopted, the computation and communication that contribute more to the convergence of the asynchronous iterative processing are handled first. The method therefore accelerates the convergence of asynchronous iterative processing and improves the resource utilization of cloud computing infrastructure.
Brief description of the drawings
Fig. 1 shows the application environment of the method of the invention for reducing redundancy overhead in asynchronous iterative processing.
Fig. 2 is a flowchart of the method of the invention for reducing redundancy overhead in asynchronous iterative processing.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
The general idea of the invention is to propose a scheduling algorithm that schedules in units of groups and that, when ordering intermediate results for processing, considers both the importance of the data to the convergence speed of the iterative computation and the overhead of processing the data first. The algorithm uses a collector to gather pending data into groups and computes a weight for each group; the weight of a group reflects both the importance of the collected data to convergence speed and the overhead that processing these data first would incur. The algorithm then selects the group with the largest weight and has its data processed first. This effectively eliminates the large amount of redundant computation and communication overhead in asynchronous iterative processing, accelerates its convergence, and improves the resource utilization of cloud computing infrastructure.
As shown in Fig. 1, the method of the invention for reducing redundancy overhead in asynchronous iterative processing is applied in a system that supports program execution (the runtime system for short). The system comprises a task manager and data processors. The task manager manages the initialization and termination of the data processors. A data processor receives messages and processes data, and comprises a message receiver, the scheduler of the invention, and a task executor. The message receiver receives messages, containing data to be processed, that are sent by the other data processors; the scheduler schedules the processing order of the data in the received messages; and the task executor runs the user program to process the pending data output by the scheduler.
As shown in Fig. 2, the method of the invention for reducing redundancy overhead in asynchronous iterative processing is applied in a scheduler that is communicatively connected to a task executor and to a message receiver, and comprises the following steps:
(1) Establish a hash table in which each entry corresponds to one data group and comprises three fields: the first field stores the key of the data group, the second field stores the weight of the data group, and the third field stores the data list of the data group, where the data list holds the values of the data and the iteration level (iteration number) of each datum, as shown in Table 1;
The advantage of this step is that the hash table allows data groups to be looked up, inserted and updated quickly.
Data group | Key of the data group | Weight of the data group | Data list
Table 1
(2) Receive data D from the message receiver;
(3) Calculate the weight Pri(D) of data D from its importance-to-convergence (ITC) value and its iteration-number (IN) value, specifically through the following sub-steps:
(3-1) Calculate the ITC value ITC(D) and the IN value IN(D) of data D. Specifically, ITC(D) = ±D, where the sign must be supplied in a configuration file by the application running in the task executor: if, for that application, a larger value of data D contributes more to convergence, the positive sign is used; otherwise the negative sign is used. IN(D) is information recorded in data D, namely the number of times the original raw data was processed on its way to becoming data D.
The advantage of this sub-step is that computing the ITC and IN values approximately effectively reduces the cost of computing the weight.
(3-2) From ITC(D) and IN(D), calculate the weight Pri(D) of data D using the equation Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T, where t_1 and t_2 are weighting values expressing the importance of ITC(D) and IN(D) respectively, each a decimal between 0 and 1, and T is a value that adjusts the range of IN(D), mainly so that its range is roughly the same as that of ITC(D); T is an integer greater than 1.
The advantage of this sub-step is that, because a weight is computed for each datum, the weight of a data group can be obtained incrementally by adding the weight of data D to the previous weight of the group, so the weight of the data group is available efficiently and in real time. The reason for computing Pri(D) in this way is explained below.
(4) Determine whether the hash table contains a data group G(D) with the same key as data D; if so, update the weight and data list of G(D); otherwise create in the hash table a data group G(D) with the same key as data D and initialize it. Specifically, this step comprises the following sub-steps:
(4-1) Obtain the key D_key of data D and apply a hash function to this key to obtain a unique group identifier K;
(4-2) Look up the hash table using the unique identifier K to determine whether a data group G(D) with key D_key exists; if so, go to step (4-3); otherwise go to step (4-4);
(4-3) Insert data D into the data group G(D) with key D_key, which completes the update of the data list of G(D) in the hash table, then go to step (4-5);
(4-4) Create a data group G(D) with key D_key and insert data D into G(D), then go to step (4-6);
(4-5) Update the weight of the data group with key D_key in the hash table using the formula Pri(G(D)) = Pri(G(D)) + Pri(D), then go to step (5);
The advantage of this sub-step is that the weight Pri(G(D)) of the data group is obtained incrementally by adding the weight Pri(D) of data D to its previous value, so the weight of the data group is available efficiently and in real time. The reason for computing Pri(G(D)) in this way is explained below.
(4-6) Set the weight of data group G(D) to Pri(D), then go to step (5);
(5) Determine whether the task executor is idle; if so, go to step (6); otherwise return to step (2);
(6) Select the data group with the largest weight from the hash table and send the data of that group to the task executor for processing, then go to step (7);
(7) Determine whether the application running in the task executor has finished; if so, the process ends; otherwise go to step (8);
(8) Determine whether the hash table still contains unprocessed data groups; if so, return to step (5); otherwise return to step (2).
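The control flow of steps (1) to (8) may be illustrated by the following single-threaded sketch. It is a simplified toy under stated assumptions, not the claimed scheduler: the function names, the use of a `deque` in place of the message receiver, the `idle_after` executor stub, and the default values of `t1`, `t2` and `T` are all hypothetical, and the loop stops when no messages and no groups remain, whereas a real receiver would block in step (2).

```python
from collections import deque


def run_scheduler(incoming, executor_idle, process, app_finished,
                  t1=0.5, t2=0.5, T=10, sign=1):
    """Toy walk through steps (1)-(8).

    incoming       deque of (key, value, iteration) tuples (stands in for the receiver)
    executor_idle  callable returning True when the task executor can take work
    process        callable(key, items) handing one data group to the task executor
    app_finished   callable returning True once the application has ended
    """
    table = {}                                               # step (1): key -> [weight, data list]
    while incoming or table:
        if incoming:
            key, value, iteration = incoming.popleft()       # step (2)
            weight = t1 * sign * value + t2 * iteration / T  # step (3)
            entry = table.setdefault(key, [0.0, []])         # step (4): find or create G(D)
            entry[0] += weight
            entry[1].append((value, iteration))
        while table and executor_idle():                     # step (5); step (8) loops back here
            best = max(table, key=lambda k: table[k][0])     # step (6): largest weight
            _, items = table.pop(best)
            process(best, items)
            if app_finished():                               # step (7)
                return


def idle_after(busy_checks):
    """Hypothetical executor stub: busy for the first few checks, idle afterwards."""
    remaining = [busy_checks]

    def idle():
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return idle


order = []
messages = deque([("a", 1.0, 1), ("b", 5.0, 1), ("a", 2.0, 2), ("c", 0.5, 1)])
run_scheduler(messages, idle_after(3), lambda key, items: order.append(key), lambda: False)
print(order)  # ['b', 'a', 'c']
```

Because the executor is busy while the first three messages arrive, both data items with key "a" are collected into one group and dispatched together, which is the grouping effect the method relies on to avoid triggering a separate cascade per datum.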
The following explains why Pri(D) and Pri(G(D)) are calculated in the manner described in this method.
Before explaining the reasons, we first define the weight Pri(G(D)) of a group G(D). The definition of the weight of a data group mainly takes two factors into account.
Factor one: ITC. A group G(D) with a larger ITC value makes the iterative computation converge faster; processing it first avoids having data groups with small ITC values trigger a large amount of subsequent redundant computation and communication. The ITC value of a group, ITC(G(D)), can be defined as ITC(G(D)) = Σ_{d ∈ G(D)} ITC(d).
Factor two: CTG (cost of processing this group first), the overhead of processing the data of a group first. Once a group of data has been selected and processed, data that arrive at the group afterwards must be processed again, and they again trigger computation and communication in subsequent iterations; this computation and communication overhead is the CTG value of the group. If groups with large CTG values are postponed and groups with small CTG values are processed first, the groups with large CTG values have more opportunity to collect more data, which reduces redundancy overhead. In effect, CTG serves to raise the average amount of data collected per group and the importance of the collected data (data that trigger more work are more important). To calculate the CTG of a group, we introduce two new variables: 1) Iteration Number (IN), the iteration level of the collected data; 2) Completion Ratio (CR), the ratio, at a given iteration level, of the data with key n that have already been processed to all the data with key n that need to be processed, i.e. CR(i, n) = Num_p(i, n) / Num_t(i, n), where Num_p(i, n) and Num_t(i, n) are, respectively, the amount of data already processed and the total amount of data that must be processed to update data object n at iteration level i. Because Num_t(i, n) is hard to know in advance, and Num_t(i, n) is the same for most data objects n, Num_t(i, n) is approximated by a fixed value T, which gives CR(i, n) = Num_p(i, n) / T.
We now give the method for calculating the CTG value of a group G(D), CTG(G(D)). The CTG value of a group G(D) depends on the amount of data that will still arrive at the group and on the amount of computation and communication that each arriving datum will trigger. The amount of data that will arrive at group G(D) in the i-th iteration depends on CR(i, D_key), and the amount of work triggered by a datum D arriving at group G(D) depends on its IN value IN(D). Therefore CTG(G(D)) = -Σ_{i ∈ S} i × CR(i, D_key), where S = {n | D ∈ G(D) and IN(D) = n} and D_key is the key of data D. Since Pri(G(D)) is positively correlated with ITC(G(D)) and negatively correlated with CTG(G(D)), we have:
Pri(G(D)) = t_1 × ITC(G(D)) - t_2 × CTG(G(D))
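For illustration only, the group-level definitions above can be evaluated directly from the data list of a group. The sketch assumes that Num_p(i, D_key) is taken as the number of data items in the group whose IN value is i (an interpretation, not a statement of the invention), and the function names and default parameters are hypothetical.

```python
from collections import Counter


def itc_group(items, sign=1):
    """ITC(G(D)): sum of ITC(d) over the (value, iteration) pairs in the group."""
    return sum(sign * value for value, _ in items)


def ctg_group(items, T=10):
    """CTG(G(D)) = -sum over i in S of i * CR(i, D_key), with CR = Num_p / T."""
    per_level = Counter(iteration for _, iteration in items)  # assumed Num_p(i, D_key)
    return -sum(i * count / T for i, count in per_level.items())


def pri_group(items, t1=0.5, t2=0.5, T=10, sign=1):
    """Pri(G(D)) = t1 * ITC(G(D)) - t2 * CTG(G(D))."""
    return t1 * itc_group(items, sign) - t2 * ctg_group(items, T)


items = [(1.0, 1), (2.0, 2), (0.5, 2)]  # (value, iteration level) pairs of one group
print(itc_group(items), ctg_group(items), pri_group(items))
```

Because CTG is defined with a leading minus sign, subtracting t_2 × CTG adds a positive iteration term, so groups whose data sit at higher iteration levels receive larger weights.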
To obtain the weight Pri(G(D)) of each group G(D) efficiently and in real time, Pri(G(D)) can be calculated as follows:
Pri(G(D)) = Pri(G(D)) + Pri(D)
Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T
Thus, whenever the scheduler receives a datum D, the weight of data group G(D) can be obtained incrementally using the formulas above.
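The link between the group-level definition and the incremental update can be made explicit with a short derivation. It is offered as a sketch and rests on one assumption not stated in the text: that Num_p(i, D_key) equals the number of data items d in G(D) with IN(d) = i, so that summing i × Num_p(i, D_key) over the levels in S equals summing IN(d) over the data in the group.

```latex
\begin{aligned}
\mathrm{Pri}(G(D))
  &= t_1\,\mathrm{ITC}(G(D)) - t_2\,\mathrm{CTG}(G(D)) \\
  &= t_1 \sum_{d \in G(D)} \mathrm{ITC}(d)
     + t_2 \sum_{i \in S} i \cdot \frac{\mathrm{Num}_p(i, D_{key})}{T} \\
  &= \sum_{d \in G(D)} \left( t_1\,\mathrm{ITC}(d) + t_2\,\frac{\mathrm{IN}(d)}{T} \right)
   = \sum_{d \in G(D)} \mathrm{Pri}(d)
\end{aligned}
```

Under that assumption the weight of a group is the sum of the weights of its data, which is why adding Pri(D) on each arrival reproduces Pri(G(D)) without recomputing it from scratch.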
The scheduler of the invention is communicatively connected to a task executor and to a message receiver, and comprises:
First module, for establishing a hash table in which each entry corresponds to one data group and comprises three fields: the first field stores the key of the data group, the second field stores the weight of the data group, and the third field stores the data list of the data group, where the data list holds the values of the data and the iteration level of each datum;
Second module, for receiving data D from the message receiver;
Third module, for calculating the weight Pri(D) of data D from its ITC value and IN value, specifically comprising the following submodules:
First submodule, for calculating the ITC value ITC(D) and the IN value IN(D) of data D, where ITC(D) = ±D and IN(D) is information recorded in data D, namely the number of times the original raw data was processed on its way to becoming data D;
Second submodule, for calculating, from ITC(D) and IN(D), the weight Pri(D) of data D using the equation Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T, where t_1 and t_2 are weighting values expressing the importance of ITC(D) and IN(D) respectively, each a decimal between 0 and 1, and T is a value that adjusts the range of IN(D), taken as an integer greater than 1;
Fourth module, for determining whether the hash table contains a data group G(D) with the same key as data D and, if so, updating the weight and data list of G(D), otherwise creating in the hash table a data group G(D) with the same key as data D and initializing it; the fourth module comprises:
Third submodule, for obtaining the key D_key of data D and applying a hash function to this key to obtain a unique group identifier K;
Fourth submodule, for looking up the hash table using the unique identifier K to determine whether a data group G(D) with key D_key exists; if so, control passes to the fifth submodule; otherwise it passes to the sixth submodule;
Fifth submodule, for inserting data D into the data group G(D) with key D_key, after which control passes to the seventh submodule;
Sixth submodule, for creating a data group G(D) with key D_key and inserting data D into G(D), after which control passes to the eighth submodule;
Seventh submodule, for updating the weight of the data group with key D_key in the hash table using the formula Pri(G(D)) = Pri(G(D)) + Pri(D), after which control passes to the eighth submodule;
Eighth submodule, for setting the weight of data group G(D) to Pri(D), after which control passes to the seventh submodule;
Fifth module, for determining whether the task executor is idle; if so, control passes to the sixth module; otherwise it returns to the second module;
Sixth module, for selecting the data group with the largest weight from the hash table and sending the data of that group to the task executor for processing, after which control passes to the seventh module;
Seventh module, for determining whether the application running in the task executor has finished; if so, the process ends; otherwise control passes to the eighth module;
Eighth module, for determining whether the hash table still contains unprocessed data groups; if so, control returns to the fifth module; otherwise it returns to the second module.
The scheduler of the invention may be stored in a computer-readable medium.
Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (2)

1. A method for reducing redundancy overhead in asynchronous iterative processing, applied in a scheduler that is communicatively connected to a task executor and to a message receiver, characterized in that the method comprises the following steps:
(1) Establish a hash table in which each entry corresponds to one data group and comprises three fields: the first field stores the key of the data group, the second field stores the weight of the data group, and the third field stores the data list of the data group, where the data list holds the values of the data and the iteration level of each datum;
(2) Receive data D from the message receiver;
(3) Calculate the weight Pri(D) of data D from its ITC value and IN value, specifically through the following sub-steps:
(3-1) Calculate the ITC value ITC(D) and the IN value IN(D) of data D, where ITC(D) = ±D and IN(D) is information recorded in data D, namely the number of times the original raw data was processed on its way to becoming data D;
(3-2) From ITC(D) and IN(D), calculate the weight Pri(D) of data D using the equation Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T, where t_1 and t_2 are weighting values expressing the importance of ITC(D) and IN(D) respectively, each a decimal between 0 and 1, and T is a value that adjusts the range of IN(D), taken as an integer greater than 1;
(4) Determine whether the hash table contains a data group G(D) with the same key as data D; if so, update the weight and data list of G(D); otherwise create in the hash table a data group G(D) with the same key as data D and initialize it; this step comprises the following sub-steps:
(4-1) Obtain the key D_key of data D and apply a hash function to this key to obtain a unique group identifier K;
(4-2) Look up the hash table using the unique identifier K to determine whether a data group G(D) with key D_key exists; if so, go to step (4-3); otherwise go to step (4-4);
(4-3) Insert data D into the data group G(D) with key D_key, then go to step (4-5);
(4-4) Create a data group G(D) with key D_key and insert data D into G(D), then go to step (4-6);
(4-5) Update the weight of the data group with key D_key in the hash table using the formula Pri(G(D)) = Pri(G(D)) + Pri(D), then go to step (5);
(4-6) Set the weight of data group G(D) to Pri(D), then go to step (5);
(5) Determine whether the task executor is idle; if so, go to step (6); otherwise return to step (2);
(6) Select the data group with the largest weight from the hash table and send the data of that group to the task executor for processing, then go to step (7);
(7) Determine whether the application running in the task executor has finished; if so, the process ends; otherwise go to step (8);
(8) Determine whether the hash table still contains unprocessed data groups; if so, return to step (5); otherwise return to step (2).
2. A scheduler using the method of claim 1, communicatively connected to a task performer and to a message receiver, characterized in that the scheduler comprises:
First module, for establishing a hash table in which each entry corresponds to one data group and comprises three fields: the first field stores the key value of the data group, the second field stores the weight of the data group, and the third field stores the data list of the data group, the data list comprising the values of the data and the iteration level at which the data resides;
Second module, for receiving data D from the message receiver;
Third module, for calculating the weight Pri(D) of data D from the ITC value and the IN value of data D, specifically comprising the following submodules:
First submodule, for calculating the ITC value ITC(D) and the IN value IN(D) of data D, where ITC(D) = ±D, and IN(D) is information recorded in data D, specifically the number of times the original raw data of data D has been processed in changing into data D;
Second submodule, for calculating the weight Pri(D) of data D from ITC(D) and IN(D) using the following equation: Pri(D) = t_1 × ITC(D) + t_2 × IN(D) / T, where t_1 and t_2 are weighting coefficients representing the importance of ITC(D) and IN(D) respectively, each being a decimal between 0 and 1, and T is a value that adjusts the range of IN(D) and is an integer greater than 1;
Fourth module, for determining whether the hash table contains a data group G(D) having the same key value as data D; if so, updating the weight and the data list of the data group G(D); otherwise, creating in the hash table a data group G(D) with the same key value as data D and initializing it; the fourth module comprises:
Third submodule, for obtaining the key value D_key of data D and applying a hash function to the key value to obtain a unique group identifier K;
Fourth submodule, for querying the hash table with the unique identifier K to determine whether a data group G(D) with key value D_key exists; if so, passing control to the fifth submodule; otherwise, passing control to the sixth submodule;
Fifth submodule, for inserting data D into the data group G(D) with key value D_key, then passing control to the seventh submodule;
Sixth submodule, for creating a data group G(D) with key value D_key and inserting data D into the data group G(D), then passing control to the eighth submodule;
Seventh submodule, for updating the weight of the data group with key value D_key in the hash table using the following formula: Pri(G(D)) = Pri(G(D)) + Pri(D), then passing control to the fifth module;
Eighth submodule, for setting the weight of the data group G(D) to Pri(D), then passing control to the fifth module;
Fifth module, for determining whether the task performer is idle; if so, passing control to the sixth module; otherwise, returning to the second module;
Sixth module, for selecting the data group with the largest weight from the hash table, sending the data of that data group to the task performer for processing, then passing control to the seventh module;
Seventh module, for determining whether the application program running in the task performer has terminated; if so, the process ends; otherwise, passing control to the eighth module;
Eighth module, for determining whether the hash table still contains unprocessed data groups; if so, returning to the fifth module; otherwise, returning to the second module.
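One way to picture how the modules of claim 2 fit together is as methods on a single object. The following Python sketch is illustrative only and makes several assumptions not found in the claims: the class names Scheduler and RecordingExecutor, the executor interface (is_idle and process), the method names receive and dispatch, the default coefficients, and the removal of a group from the table once it has been dispatched. The ITC and IN values are passed in by the caller rather than derived from the data.

```python
class Scheduler:
    """Sketch of the claimed modules as methods on one object."""

    def __init__(self, executor, t1=0.5, t2=0.5, T=10):
        # executor is a hypothetical object exposing is_idle() and process(items)
        self.executor = executor
        self.t1, self.t2, self.T = t1, t2, T
        # First module: hash table mapping key value -> [weight, data list]
        self.table = {}

    def receive(self, key, value, level, itc, in_count):
        """Second to fourth modules: weigh incoming data D and file it under G(D)."""
        pri = self.t1 * itc + self.t2 * in_count / self.T
        entry = self.table.get(key)
        if entry is None:
            # No group with this key value yet: create it with weight Pri(D)
            self.table[key] = [pri, [(value, level)]]
        else:
            # Group exists: accumulate the weight and extend the data list
            entry[0] += pri
            entry[1].append((value, level))

    def dispatch(self):
        """Fifth and sixth modules: give the heaviest group to an idle executor."""
        if not self.table or not self.executor.is_idle():
            return None
        key = max(self.table, key=lambda k: self.table[k][0])
        _weight, items = self.table.pop(key)  # removal on dispatch is an assumption
        self.executor.process(items)
        return key


class RecordingExecutor:
    """Stand-in task performer: always idle, records every batch it receives."""

    def __init__(self):
        self.batches = []

    def is_idle(self):
        return True

    def process(self, items):
        self.batches.append(items)
```

A caller would invoke receive for each message arriving from the message receiver and call dispatch whenever the task performer might be idle; the termination check of the seventh module is left to the surrounding application.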

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310173239.4A CN103309942B (en) 2013-05-10 2013-05-10 A kind of scheduler and reduce the method for redundancy overhead in asynchronous iteration process

Publications (2)

Publication Number Publication Date
CN103309942A (en) 2013-09-18
CN103309942B (en) 2016-04-13

Family

ID=49135160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310173239.4A Active CN103309942B (en) 2013-05-10 2013-05-10 A kind of scheduler and reduce the method for redundancy overhead in asynchronous iteration process

Country Status (1)

Country Link
CN (1) CN103309942B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101018182A (en) * 2007-02-16 2007-08-15 Huawei Technologies Co., Ltd. A bridging method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8332366B2 (en) * 2006-06-02 2012-12-11 International Business Machines Corporation System and method for automatic weight generation for probabilistic matching

Similar Documents

Publication Publication Date Title
CN112601197B (en) Resource optimization method in train-connected network based on non-orthogonal multiple access
CN109885397B (en) Delay optimization load task migration algorithm in edge computing environment
CN103179052B (en) A kind of based on the central virtual resource allocation method and system of the degree of approach
WO2017112077A1 (en) Optimizing skewed joins in big data
CN103812949A (en) Task scheduling and resource allocation method and system for real-time cloud platform
Qiu et al. A packet buffer evaluation method exploiting queueing theory for wireless sensor networks
CN109981744B (en) Data distribution method and device, storage medium and electronic equipment
EP3198494B1 (en) Communication for efficient re-partitioning of data
CN106250240A (en) A kind of optimizing and scheduling task method
CN104820705A (en) Extensible partition method for associated flow graph data
CN104486129B (en) The method and system of application service quality are ensured under distributed environment
CN104809130A (en) Method, equipment and system for data query
CN113033800A (en) Distributed deep learning method and device, parameter server and main working node
CN109548161A (en) A kind of method, apparatus and terminal device of wireless resource scheduling
CN105808346A (en) Task scheduling method and device
CN104469851A (en) Resource distribution method for throughput-delaying balancing in LTE downlink
US8667008B2 (en) Search request control apparatus and search request control method
CN103309942B (en) A kind of scheduler and reduce the method for redundancy overhead in asynchronous iteration process
CN105335313A (en) Basic data transmission method and apparatus
Wang et al. Joint job offloading and resource allocation for distributed deep learning in edge computing
Dakkak et al. Scheduling through backfilling technique for HPC applications in grid computing environment
CN112738225B (en) Edge calculation method based on artificial intelligence
CN105335362A (en) Real-time data processing method and system, and instant processing system
CN104639606B (en) A kind of optimization method of differentiation contrast piecemeal
CN112925831A (en) Big data mining method and big data mining service system based on cloud computing service

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant