CN103384206A - Concurrent processing method and system for mass data - Google Patents

Concurrent processing method and system for mass data

Info

Publication number
CN103384206A
Authority
CN
China
Prior art keywords
subtask
task
data
cycle
execution
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101352263A
Other languages
Chinese (zh)
Other versions
CN103384206B (en
Inventor
陆忠华
王珏
王彦棡
邓笋根
阚圣哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Application filed by Computer Network Information Center of CAS
Priority to CN201210135226.3A
Publication of CN103384206A
Application granted
Publication of CN103384206B
Legal status: Active

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a concurrent processing method for mass data in network management, and in particular to a task scheduling method that increases bandwidth and resource utilization. The method comprises the following steps: tasks to be collected are periodically taken out of a database and inserted into a task collection queue to wait for execution; task descriptions are taken from the task collection queue, the tasks are divided into subtasks according to bandwidth utilization and/or computing resource usage, and the subtasks are mapped to a collection task processing unit to wait for execution; all subtasks are executed to complete the data collection task, the collected data are added to a data cache queue and processed concurrently, and the processed data are added to a main memory database; and the data in the main memory database are periodically synchronized to a physical database.

Description

A parallel processing method and system for mass data
The present invention relates to a method for the parallel processing of mass data in network management, and in particular to a task scheduling method that improves bandwidth and resource utilization.
Background
With the development of network technology and the spread of network applications, users place ever higher demands on the performance of network services. To meet these demands and provide high-quality service, the quality of the network service must be guaranteed. To that end, the operating status of every device in the network must be monitored in real time, so that device faults are discovered promptly and appropriate measures can be taken to keep the network stable.
To monitor the operating status of network devices, the operating parameters of each device in the network must be collected in real time and sent to a master computer, which analyzes them to determine whether a device has failed.
In the prior art, the master computer usually distributes collection tasks and collectors gather the operating parameters of the network devices in real time. Each task handles multiple collection objects (devices to be collected), and each collection object has multiple collection indicators (different parameters). As networks grow, network management has to collect from thousands of devices, each with multiple indicators, so the volume of data to be collected is massive. If many collection tasks run at the same moment, the instantaneous bandwidth usage becomes excessive. If too few collection subtasks run in parallel, bandwidth usage falls but the computing resources of the collectors are wasted. The technical problem addressed by the present invention is therefore how to avoid occupying too much bandwidth while still making full use of the collectors' computing resources.
Summary of the invention
The object of the present invention is to schedule collection tasks sensibly so that network bandwidth is used effectively and computing resources are fully used for data collection.
To achieve this object, the present invention proposes the following solution:
According to a first aspect, an embodiment of the present invention proposes a parallel processing system for mass data in network management, characterized by comprising the following modules: a collection task update module, which periodically takes tasks to be collected out of a database and inserts them into a task collection queue to wait for execution; a collection task scheduling module, which takes task descriptions from the task collection queue, divides each task into subtasks according to bandwidth utilization and/or computing resource usage, and maps the subtasks to a collection task processing unit to wait for execution; a collection task processing module, which executes each subtask to complete the data collection task, adds the collected data to a data cache queue, processes the data in parallel, and adds the processed data to a main memory database; and a physical/main memory database management module, which periodically synchronizes the data in the main memory database to a physical database.
According to another aspect, an embodiment of the present invention proposes a parallel processing method for mass data in network management, characterized by comprising the following steps: (1) a collection task update step, in which tasks to be collected are periodically taken out of a database and inserted into a task collection queue to wait for execution; (2) a collection task scheduling step, in which task descriptions are taken from the task collection queue, each task is divided into subtasks according to bandwidth utilization and/or computing resource usage, and the subtasks are mapped to a collection task processing unit to wait for execution; (3) a collection task processing step, in which each subtask is executed to complete the data collection task, the collected data are added to a data cache queue and processed in parallel, and the processed data are added to a main memory database; (4) a physical/main memory database management step, in which the data in the main memory database are periodically synchronized to a physical database.
Brief description of the drawings
Fig. 1 is a diagram of the hardware architecture on which the system of the present invention runs;
Fig. 2 is a structural diagram of the parallel processing system of the present invention;
Fig. 3 shows the collection task update process of the present invention;
Fig. 4 shows the collection task processing process of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Because the volume of collected data is large, the mass data parallel processing system of the present invention can run in parallel on multiple collectors to improve the efficiency of the whole system. The hardware architecture on which it runs is shown in Fig. 1, where:
Master computer: distributes collection tasks, aggregates the collection results and presents them to the user. It contains multiple physical processing units and can execute multiple collection tasks in parallel;
Collector: collects data from the monitored devices. The collected data comprise multiple parameter indicators of a device. A collector contains multiple physical processing units; the collection work for each indicator is assigned as one subtask and is completed by one physical processing unit.
In the present invention, each collection task handles multiple collection objects (devices to be collected), and each collection object has multiple collection indicators (different parameters). The collection work for each indicator is assigned as one subtask and completed by one physical processing unit. The main work of a subtask is to periodically retrieve data from the underlying device over the network.
Embodiment one
As shown in Fig. 2, the parallel processing system for mass data of the present invention comprises: a collection task update module, a collection task scheduling module, a collection task processing module, a physical/main memory database management module, a monitoring module, a system command listening module, and a log management module. The operation of each module is described in turn below:
(1) Collection task update module
The collection task update module periodically takes tasks to be collected out of the database and inserts them into the task collection queue, where they wait for execution.
A further preferred scheme is:
As shown in Fig. 3, after taking tasks out of the physical database, the collection task update module determines whether a running collection task needs to be stopped. If so, it stops the running task according to its task ID and removes that task from the task collection queue. Otherwise, it checks whether a running task needs to be updated; if so, it updates the task and reloads it. Otherwise, it inserts the task to be collected into the task collection queue.
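The stop / update / enqueue decision described above can be sketched as follows. This is only an illustrative sketch: the `TaskRecord` fields, the `running` dictionary and the return strings are hypothetical names, not part of the patent text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical shape of a task row read from the physical database.
    task_id: int
    stop_requested: bool = False
    needs_update: bool = False


def refresh(record, running, queue):
    """Apply one database record to the running set and the collection queue."""
    if record.stop_requested:
        # Stop the running task by ID and drop it from the collection queue.
        running.pop(record.task_id, None)
        for queued in [t for t in queue if t.task_id == record.task_id]:
            queue.remove(queued)
        return "stopped"
    if record.task_id in running and record.needs_update:
        # Update the running task and reload it.
        running[record.task_id] = record
        return "updated"
    # Otherwise the task waits in the collection queue for execution.
    queue.append(record)
    return "queued"
```

For instance, a record with `stop_requested=True` removes the matching entry from both `running` and `queue`, while a record for an unknown task ID is simply appended to `queue`.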
(2) Collection task scheduling module
The collection task scheduling module takes a task description from the task collection queue, divides the task into subtasks according to bandwidth utilization and computing resource usage, and maps the subtasks to physical processing units, where they wait for execution.
A further preferred scheme is:
To prevent a large number of subtasks from piling up at the same point in time and causing excessive instantaneous bandwidth usage, the number of subtasks to execute concurrently is calculated as follows.
Each collection subtask has its own execution period (that is, it repeats its collection work at a fixed interval), and the execution time of each subtask within one period is far smaller than the period, for example 1/20 of the period. Each subtask carries the following information:
PR: the priority of the subtask;
PE: the subtask's own collection period;
NC: the subtask's own collection amount (can be calculated in advance).
For the same reason, the subtasks need to be executed in stages. The task trigger period P_GCD is the greatest common divisor of the collection periods of all subtasks currently being executed.
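As a small illustration of the trigger period, the greatest common divisor of the collection periods can be computed with the standard library. The sketch assumes the periods are whole numbers of seconds; the function name is hypothetical.

```python
from functools import reduce
from math import gcd


def trigger_period(periods_s):
    """Return P_GCD, the GCD of all subtask collection periods (integer seconds assumed)."""
    return reduce(gcd, periods_s)


# Subtasks collecting every 60 s, 90 s and 300 s share a 30 s trigger period.
print(trigger_period([60, 90, 300]))  # 30
```

Every subtask period is a multiple of P_GCD, so each subtask falls due on a trigger boundary.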
The remaining network bandwidth within one collection task trigger period is:
Br = c * B * P_GCD - N_d
where:
c ∈ (0, 1] is a coefficient;
B is the system bandwidth;
P_GCD is the task trigger period;
N_d is the amount of data contained in the subtasks of the execution period about to run (the execution period may be one collection task trigger period or the trigger periods of several collection tasks).
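The formula for Br can be written as a one-line function. The units used here (bandwidth in bytes per second, period in seconds, data amounts in bytes) are an assumption made for the example; the text does not fix them.

```python
def remaining_capacity(c, bandwidth, p_gcd, n_d):
    """Br = c * B * P_GCD - N_d, with the coefficient c restricted to (0, 1]."""
    if not 0 < c <= 1:
        raise ValueError("coefficient c must lie in (0, 1]")
    return c * bandwidth * p_gcd - n_d


# Assumed figures: c = 0.5, B = 1,000,000 B/s, P_GCD = 30 s, N_d = 4,000,000 B.
print(remaining_capacity(0.5, 1_000_000, 30, 4_000_000))  # 11000000.0
```

A negative result would mean the data already committed to the period exceeds the usable share of the bandwidth, so no further subtasks fit.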
The subtasks pending in the trigger period are processed in order of priority. Suppose the current n subtasks have the following collection amounts:
NC_0, NC_1, NC_2, ..., NC_(m-1), NC_m, ..., NC_(n-1)
If SUM(NC_0, ..., NC_(m-1)) <= Br <= SUM(NC_0, ..., NC_m), the first m subtasks are selected, and m is the number of subtasks to execute concurrently.
This module can periodically adjust the number of subtasks to execute concurrently according to the current system bandwidth B.
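The selection rule above amounts to a prefix sum over the collection amounts taken in priority order. The sketch below is one possible reading: it assumes the amounts are already sorted by priority and, where the text leaves ties open, counts a subtask whose cumulative amount exactly equals Br as fitting.

```python
from itertools import accumulate


def count_within_budget(amounts, budget):
    """Return m, the number of leading subtasks whose summed collection amount fits in Br."""
    m = 0
    for running_total in accumulate(amounts):
        if running_total > budget:
            break
        m += 1
    return m


# Prefix sums of [5, 3, 4, 6] are 5, 8, 12, 18; with Br = 10 the first two fit.
print(count_within_budget([5, 3, 4, 6], 10))  # 2
```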
A further optimized scheme is:
If too few collection subtasks execute in parallel, bandwidth usage falls but the computing resources of the collector are wasted. Therefore, on top of making full use of bandwidth, computing resources should also be fully used.
The number of subtasks that can run concurrently within a trigger period is:
NL = P_GCD * N_c / T_a
where:
N_c is the number of physical processing units of the collector;
T_a is the average execution time of a subtask, obtained from historical data.
Within one trigger period, the number of subtasks to execute concurrently is N = min(m, NL). The N highest-priority subtasks in the queue are dispatched for execution, after which the priority PR of the remaining subtasks is increased.
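A minimal sketch of the computing-resource limit and the dispatch step follows. Several details are assumptions rather than statements from the text: NL is rounded down to a whole number, a larger PR value means higher priority, and waiting subtasks are aged by exactly 1.

```python
def compute_limit(p_gcd, n_units, avg_exec_time):
    """NL = P_GCD * N_c / T_a, rounded down (the rounding is an assumption)."""
    return int(p_gcd * n_units // avg_exec_time)


def dispatch(queue, m, nl):
    """Pick the N = min(m, NL) highest-priority subtasks and age the rest.

    queue is a list of dicts with 'name' and 'priority' keys (hypothetical shape).
    """
    n = min(m, nl)
    ordered = sorted(queue, key=lambda t: t["priority"], reverse=True)
    selected, waiting = ordered[:n], ordered[n:]
    for task in waiting:
        task["priority"] += 1  # aging step size of 1 is an assumption
    return selected, waiting


# 30 s period, 4 processing units, 1.5 s average execution time.
print(compute_limit(30, 4, 1.5))  # 80
```

Raising the priority of subtasks that were not dispatched keeps low-priority subtasks from waiting indefinitely across trigger periods.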
(3) Collection task processing module
The collection task processing module executes each subtask to complete the data collection task, adds the collected data to the data cache queue, processes the data in parallel, and adds the processed data to the main memory database.
Because the volume of collected data is large, a cache queue is used to relieve database connection pressure; data are taken from the data cache queue and processed in parallel.
A further preferred scheme is:
As shown in Fig. 4, before collected data are added to the data cache queue, it is determined whether the collected data fall within a set range. If so, the collected data are added to the data cache queue; otherwise, alarm information is generated and added to the alarm cache queue, and an alarm notification is sent.
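The range check of Fig. 4 can be sketched with two standard queues. The function name and the shape of the alarm record are hypothetical, and actually sending the alarm notification is left out of the sketch.

```python
from queue import Queue


def route_sample(value, low, high, data_queue, alarm_queue):
    """Queue an in-range sample for processing; otherwise queue an alarm record."""
    if low <= value <= high:
        data_queue.put(value)
        return True
    # Out of range: record the alarm (notification delivery is not modeled here).
    alarm_queue.put({"value": value, "range": (low, high)})
    return False


data_q, alarm_q = Queue(), Queue()
route_sample(42, 0, 100, data_q, alarm_q)   # goes to the data cache queue
route_sample(150, 0, 100, data_q, alarm_q)  # goes to the alarm cache queue
```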
(4) Physical/main memory database management module
The physical/main memory database management module periodically synchronizes the data in the main memory database to the physical database.
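The text does not name a database product. Purely as an illustration, the sketch below uses SQLite from the Python standard library as a stand-in: an in-memory connection plays the main memory database and a second connection plays the physical database. Calling the function on a timer is left to the caller.

```python
import sqlite3


def sync_to_physical(memory_db, physical_db):
    """Copy the whole main memory database into the physical database."""
    memory_db.commit()               # make pending writes part of the snapshot
    memory_db.backup(physical_db)    # sqlite3.Connection.backup, Python 3.7+


memory_db = sqlite3.connect(":memory:")
# A file path such as "collected.db" would be used in practice (hypothetical name).
physical_db = sqlite3.connect(":memory:")
memory_db.execute("CREATE TABLE samples (device TEXT, value REAL)")
memory_db.execute("INSERT INTO samples VALUES ('router-1', 0.42)")
sync_to_physical(memory_db, physical_db)
```

A full copy is the simplest form of synchronization; a production system might copy only rows changed since the previous run.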
(5) Monitoring module
The monitoring module establishes heartbeats with the other modules, monitors them, and periodically writes its own state and the states of the other modules to the physical database.
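One way to picture heartbeat monitoring is a table of last-seen timestamps checked against a timeout. The class name, the timeout value and the "alive"/"stale" labels are all assumptions for the sketch; writing the resulting states to the physical database is not shown.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat of each module and report which ones are stale."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, module, now=None):
        # Record a heartbeat; `now` can be injected to make the logic testable.
        self.last_seen[module] = time.monotonic() if now is None else now

    def status(self, now=None):
        now = time.monotonic() if now is None else now
        return {
            module: "alive" if now - seen <= self.timeout_s else "stale"
            for module, seen in self.last_seen.items()
        }
```

Each module would call `beat` periodically, and the monitor would call `status` on its own schedule before persisting the result.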
(6) Listening module
The listening module shuts down all modules after receiving a shutdown command sent from the front end.
(7) Log management module
The log management module records the system's daily events.
A further preferred scheme is:
The parallel processing system for mass data further comprises:
(8) Manual intervention interface module
The collection task scheduling module returns the task execution status, bandwidth and underlying resource utilization to the collection task update module, which feeds them back to the user. Through the manual intervention interface module, the user operates the collection task update module to selectively pause, cancel or resume tasks according to actual operating conditions.
The parallel processing system of this embodiment runs on the master computer. All modules run on the master computer; only the collection tasks themselves run on the collectors.
Embodiment two
A parallel processing method for mass data comprises: a collection task update step, a collection task scheduling step, a collection task processing step, a physical/main memory database management step, a monitoring step, a system command listening step, and a log management step. The execution of each step is described in turn below:
(1) Collection task update step
Tasks to be collected are periodically taken out of the database and inserted into the task collection queue, where they wait for execution.
A further preferred scheme is:
After tasks are taken out of the physical database, it is determined whether a running collection task needs to be stopped. If so, the running task is stopped according to its task ID and removed from the task collection queue. Otherwise, it is checked whether a running task needs to be updated; if so, the task is updated and reloaded. Otherwise, the task to be collected is inserted into the task collection queue.
(2) Collection task scheduling step
A task description is taken from the task collection queue, the task is divided into subtasks according to bandwidth utilization and computing resource usage, and the subtasks are mapped to physical processing units, where they wait for execution.
A further preferred scheme is:
To prevent a large number of subtasks from piling up at the same point in time and causing excessive instantaneous bandwidth usage, the number of subtasks to execute concurrently is calculated as follows.
Each collection subtask has its own execution period (that is, it repeats its collection work at a fixed interval), and the execution time of each subtask within one period is far smaller than the period (for example, 1/20 of the period). Each subtask carries the following information:
PR: the priority of the subtask;
PE: the subtask's own collection period;
NC: the subtask's own collection amount (can be calculated in advance).
For the same reason, the subtasks need to be executed in stages. The task trigger period P_GCD is the greatest common divisor of the collection periods of all subtasks currently being executed.
The remaining network bandwidth within one task trigger period is:
Br = c * B * P_GCD - N_d
where:
c ∈ (0, 1] is a coefficient;
B is the system bandwidth;
P_GCD is the task trigger period;
N_d is the amount of data contained in the tasks of the execution period about to run.
The subtasks pending in the trigger period are processed in order of priority. Suppose the current n subtasks have the following collection amounts:
NC_0, NC_1, NC_2, ..., NC_(m-1), NC_m, ..., NC_(n-1)
If SUM(NC_0, ..., NC_(m-1)) <= Br <= SUM(NC_0, ..., NC_m), the first m subtasks are selected, and m is the number of subtasks to process in parallel.
The number of subtasks to execute in parallel can be adjusted periodically according to the current system bandwidth B.
A further optimized scheme is:
If too few collection subtasks execute in parallel, bandwidth usage falls but the computing resources of the collector are wasted. Therefore, on top of making full use of bandwidth, computing resources should also be fully used.
The number of subtasks that can run concurrently within a trigger period is:
NL = P_GCD * N_c / T_a
where:
N_c is the number of physical processing units of the collector;
T_a is the average execution time of a subtask, obtained from historical data.
Within one trigger period, the number of subtasks to execute concurrently is N = min(m, NL). The N highest-priority subtasks in the queue are dispatched for execution, and the priority PR of the remaining subtasks is increased.
(3) Collection task processing step
Each subtask is executed to complete the data collection task, the collected data are added to the data cache queue and processed in parallel, and the processed data are added to the main memory database.
Because the volume of collected data is large, a cache queue is used to relieve database connection pressure; data are taken from the data cache queue and processed in parallel.
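The cache-queue pattern in this step might look like the following sketch, which drains the queue in a batch, transforms the items on a thread pool, and hands the results to a single writer. The function names, the batch-at-a-time strategy and the list standing in for the main memory database are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue


def drain(queue):
    """Remove and return everything currently waiting in the cache queue."""
    items = []
    while True:
        try:
            items.append(queue.get_nowait())
        except Empty:
            return items


def process_batch(queue, transform, store, workers=4):
    """Process one batch in parallel, then append the results to the store."""
    batch = drain(queue)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform, batch))  # map keeps input order
    store.extend(results)  # a single writer touches the store
    return len(results)
```

Buffering in the queue means the database sees one batched write per round instead of one connection per collected sample.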
A further preferred scheme is:
As shown in Fig. 4, before collected data are added to the data cache queue, it is determined whether the collected data fall within a set range. If so, the collected data are added to the data cache queue; otherwise, alarm information is generated and added to the alarm cache queue, and an alarm notification is sent.
(4) Physical/main memory database management step
The data in the main memory database are periodically synchronized to the physical database.
(5) Monitoring step
Heartbeats are established, each process is monitored, and the state of each process is periodically written to the physical database.
(6) Listening step
After a shutdown command sent from the front end is received, all processes are shut down.
(7) Log management step
The system's daily events are recorded.
A further preferred scheme is:
The parallel processing method for mass data further comprises:
(8) Manual intervention step
The collection task execution status, bandwidth and underlying resource utilization are fed back to the user. The user updates collection tasks through the manual intervention interface, selectively pausing, cancelling or resuming tasks according to actual operating conditions.
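The pause, cancel and resume operations can be modeled as a small state machine. The state names and the set of allowed transitions below are assumptions chosen for illustration; the text only names the three operations.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELLED = "cancelled"


# Allowed (command, current state) -> next state transitions (assumed).
ALLOWED = {
    ("pause", State.RUNNING): State.PAUSED,
    ("resume", State.PAUSED): State.RUNNING,
    ("cancel", State.RUNNING): State.CANCELLED,
    ("cancel", State.PAUSED): State.CANCELLED,
}


def apply_command(state, command):
    """Return the next task state, or raise if the command is not valid now."""
    try:
        return ALLOWED[(command, state)]
    except KeyError:
        raise ValueError(f"cannot {command} a task that is {state.value}") from None
```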
Embodiment three
A data collection task scheduling system in a network management system, characterized by comprising:
(1) a task management module, which takes a task description from the task collection queue;
(2) a subtask division module, which divides the task into subtasks according to bandwidth utilization, wherein: each collection subtask has its own execution period, that is, it repeats its collection work at a fixed interval, and the execution time of each subtask within one period is far smaller than the period (for example, 1/20 of the period); each subtask carries the following information:
PR: the priority of the subtask;
PE: the subtask's own collection period;
NC: the subtask's own collection amount, which can be calculated in advance;
(3) a mapping module, which maps the subtasks to physical processing units, where they wait for execution;
(4) a determination module, which determines the number of subtasks to execute concurrently according to the following steps:
(a) calculate the task trigger period P_GCD, which is the greatest common divisor of the collection periods of all subtasks currently being executed;
(b) calculate the remaining network bandwidth within one trigger period as follows:
Br = c * B * P_GCD - N_d
where:
c ∈ (0, 1] is a coefficient;
B is the system bandwidth;
P_GCD is the task trigger period;
N_d is the amount of data contained in the tasks of the execution period about to run;
(c) from the remaining network bandwidth within one trigger period obtained in step (b), determine the number of subtasks to process concurrently, as follows:
(i) arrange the subtasks pending in the trigger period in order of priority;
(ii) the collection amounts of the current n pending subtasks are: NC_0, NC_1, NC_2, ..., NC_(m-1), NC_m, ..., NC_(n-1);
if SUM(NC_0, ..., NC_(m-1)) <= Br <= SUM(NC_0, ..., NC_m), the first m subtasks are selected;
(5) an execution module, which executes the determined number of subtasks in parallel.
A further preferred scheme is to periodically adjust the number of subtasks executed in parallel according to the current bandwidth B.
When collection subtasks are processed in parallel, the computing resource utilization of the collector should be considered in addition to the system bandwidth. To make full use of computing resources, a further preferred scheme for this embodiment is:
The execution module executes the subtasks according to the following steps:
(i) from the computing resources of the collector, determine the number of subtasks that can be executed in parallel within the trigger period:
NL = P_GCD * N_c / T_a
where:
N_c is the number of physical processing units of the collector;
T_a is the average execution time of a subtask, obtained from historical data;
(ii) from bandwidth utilization and computing resources, determine that the number of subtasks to execute concurrently within one trigger period is N = min(m, NL);
(iii) dispatch the N highest-priority subtasks in the queue for execution, and increase the priority PR of the remaining subtasks.
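Read end to end, steps (a) to (c) and (i) to (iii) of this embodiment form one scheduling round. The sketch below strings them together under stated assumptions: integer periods in seconds, a larger PR meaning higher priority, NL rounded down, and an aging increment of 1. The `Subtask` class and `schedule_round` function are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd


@dataclass
class Subtask:
    name: str
    pr: int  # priority; larger runs first (assumption)
    pe: int  # collection period in seconds
    nc: int  # collection amount, same unit as bandwidth * seconds


def schedule_round(subtasks, c, bandwidth, n_d, n_units, avg_exec_s):
    """Return (subtasks to run now, subtasks left waiting) for one trigger period."""
    p_gcd = reduce(gcd, (t.pe for t in subtasks))        # step (a)
    br = c * bandwidth * p_gcd - n_d                     # step (b)
    ordered = sorted(subtasks, key=lambda t: t.pr, reverse=True)
    m, used = 0, 0
    for task in ordered:                                 # step (c)
        if used + task.nc > br:
            break
        used += task.nc
        m += 1
    nl = int(p_gcd * n_units // avg_exec_s)              # step (i), rounded down
    n = min(m, nl)                                       # step (ii)
    for task in ordered[n:]:                             # step (iii)
        task.pr += 1
    return ordered[:n], ordered[n:]
```

With periods of 60, 90 and 30 seconds, c = 0.5, a bandwidth of 100 units per second and N_d = 500, Br works out to 1000, so the bandwidth rule admits two subtasks of sizes 400 and 500; a single processing unit with a 20 second average execution time then caps the round at one subtask.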
Embodiment four
A data collection task scheduling method in a network management system, characterized by comprising:
(1) taking a task description from the task collection queue;
(2) dividing the task into subtasks according to bandwidth utilization, wherein: each collection subtask has its own execution period, that is, it repeats its collection work at a fixed interval, and the execution time of each subtask within one period is far smaller than the period; each subtask carries the following information:
PR: the priority of the subtask;
PE: the subtask's own collection period;
NC: the subtask's own collection amount, which can be calculated in advance;
(3) mapping the subtasks to physical processing units, where they wait for execution;
(4) determining the number of subtasks to execute concurrently according to the following steps:
(a) calculate the task trigger period P_GCD, which is the greatest common divisor of the collection periods of all subtasks currently being executed;
(b) calculate the remaining network bandwidth within one trigger period as follows:
Br = c * B * P_GCD - N_d
where:
c ∈ (0, 1] is a coefficient;
B is the system bandwidth;
P_GCD is the task trigger period;
N_d is the amount of data contained in the tasks of the execution period about to run;
(c) from the remaining network bandwidth within one trigger period obtained in step (b), determine the number of subtasks to process concurrently, as follows:
(i) arrange the subtasks pending in the trigger period in order of priority;
(ii) the collection amounts of the current n pending subtasks are: NC_0, NC_1, NC_2, ..., NC_(m-1), NC_m, ..., NC_(n-1);
if SUM(NC_0, ..., NC_(m-1)) <= Br <= SUM(NC_0, ..., NC_m), the first m subtasks are selected;
(d) execute the determined number of subtasks in parallel.
A further preferred scheme is to periodically adjust the number of subtasks executed in parallel according to the current bandwidth B.
When collection subtasks are processed in parallel, the computing resource utilization of the collector should be considered in addition to the system bandwidth. To make full use of computing resources, a further preferred scheme for this embodiment is:
Step (d) specifically comprises the following steps:
(i) from the computing resources of the collector, determine the number of subtasks that can be executed in parallel within the trigger period:
NL = P_GCD * N_c / T_a
where:
N_c is the number of physical processing units of the collector;
T_a is the average execution time of a subtask, obtained from historical data;
(ii) from bandwidth utilization and computing resources, determine that the number of subtasks to execute concurrently within one trigger period is N = min(m, NL);
(iii) dispatch the N highest-priority subtasks in the queue for execution, and increase the priority PR of the remaining subtasks.
With the technical solution disclosed by the present invention, collection tasks can be scheduled sensibly so that network bandwidth is used effectively and the underlying computing resources are used for analysis and processing, which solves the technical problem of the prior art.
The embodiments above further describe the objects, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (16)

  1. A parallel processing system for mass data in network management, characterized by comprising the following modules:
    a collection task update module, configured to periodically take tasks to be collected out of a database and insert them into a task collection queue to wait for execution;
    a collection task scheduling module, configured to take task descriptions from the task collection queue, divide each task into subtasks according to bandwidth utilization and/or computing resource usage, and map the subtasks to a collection task processing unit to wait for execution;
    a collection task processing module, configured to execute each subtask to complete the data collection task, add the collected data to a data cache queue, process the data in parallel, and add the processed data to a main memory database;
    a physical/main memory database management module, configured to periodically synchronize the data in the main memory database to a physical database.
  2. The system of claim 1, characterized in that:
    after taking tasks out of the physical database, the collection task update module determines whether a running collection task needs to be stopped; if so, it stops the running task according to its task ID or removes the task from the task collection queue; otherwise, it checks whether a running task needs to be updated and, if so, updates the task and reloads it; otherwise, it inserts the task to be collected into the task collection queue.
  3. The system of claim 1 or 2, characterized in that the collection task scheduling module comprises:
    a subtask division module, which divides a task into collection subtasks according to bandwidth utilization, each subtask carrying the following information: priority, collection period and collection amount; wherein the collection period is the execution period of the subtask, that is, the subtask repeats its collection work at a fixed interval, and the execution time of each subtask within one period is much smaller than the period;
    a determination module, which determines the number of subtasks to execute concurrently according to the following steps:
    (a) determine the trigger period of the multiple collection tasks, which is the greatest common divisor of the collection periods of all subtasks currently being executed;
    (b) determine the remaining network bandwidth within one task trigger period;
    (c) from the remaining network bandwidth, determine the number of subtasks to process concurrently;
    a mapping module, which maps the subtasks selected for concurrent execution to the collection task processing module to wait for execution.
  4. The system of claim 3, characterized in that step (b) determines the remaining network bandwidth as follows: the product of the system bandwidth and the task trigger period is multiplied by a coefficient, and the amount of data contained in the tasks of the execution period about to run is subtracted from the result, giving the remaining network bandwidth within one task trigger period; wherein the coefficient is a constant in (0, 1].
  5. The system as claimed in claim 4, characterized in that step (c) determines the number of subtasks to be processed concurrently as follows:
    (i) within the trigger period, sorting the currently pending subtasks in priority order;
    (ii) obtaining the collection volume of each currently pending subtask and calculating the sum of the collection volumes of the first n subtasks; comparing these sums with the remaining network bandwidth within the trigger period calculated in step (b); if the remaining network bandwidth is greater than the sum of the collection volumes of the first m currently pending subtasks and less than the sum of the collection volumes of the first m+1 currently pending subtasks, the first m subtasks are taken, and the number of subtasks to be executed concurrently is m; wherein m<n.
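Steps (i) and (ii) can be read as a prefix-sum search over the priority-sorted subtasks. The sketch below makes two assumptions the claim leaves open: a larger number means a higher priority, and a prefix whose sum exactly equals the remaining bandwidth still fits. The tuple layout and function name are hypothetical.

```python
from itertools import accumulate


def select_by_bandwidth(subtasks, remaining):
    """Return the longest priority-ordered prefix whose total volume fits.

    subtasks  -- list of (priority, collection_volume) tuples
    remaining -- remaining network bandwidth for the trigger period
    """
    # Step (i): highest priority first (assumed: larger number = higher priority).
    ordered = sorted(subtasks, key=lambda s: s[0], reverse=True)
    # Step (ii): walk the running sums until the next subtask would not fit.
    m = 0
    for total in accumulate(volume for _, volume in ordered):
        if total > remaining:
            break
        m += 1
    return ordered[:m]


chosen = select_by_bandwidth([(1, 400), (3, 300), (2, 500)], remaining=900)
print(len(chosen))  # prints 2
```

If every pending subtask fits, the sketch returns all of them; the claim's condition m<n does not cover that case, so treat it as an extension.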
  6. The system as claimed in claim 5, characterized in that the determination module further determines the number of concurrently executed subtasks according to the following steps:
    (iii) obtaining the number of physical processing units of the acquisition machine;
    (iv) according to the result obtained in step (i), determining the number of subtasks that can be executed in parallel within the trigger period, computed as: the product of the remaining network bandwidth obtained in step (b) and the number of physical processing units of the acquisition machine obtained in step (iii), divided by the average execution time of a subtask;
    (v) determining the number of subtasks to execute concurrently within one trigger period as the smaller of the number of subtasks requiring concurrent execution obtained in step (ii) and the number of subtasks that can be executed concurrently within the trigger period obtained in step (iv);
    (vi) according to the number of subtasks to execute concurrently within one trigger period obtained in step (v), designating that number of highest-priority subtasks in the task queue as the subtasks to be executed concurrently; after execution, the priority of the other subtasks is increased.
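Steps (v) and (vi) combine the two limits and then age the subtasks that were passed over. In the sketch below the two counts are taken as inputs, the priority increment `aging_step` is an assumed parameter (the claim only says the priority increases), and the dict layout is hypothetical:

```python
def pick_and_age(queue, count_by_bandwidth, count_by_compute, aging_step=1):
    """Select subtasks for this trigger period and raise the priority of the rest.

    queue -- list of dicts with 'id' and 'priority' (larger = higher, assumed)
    Returns (selected, waiting); waiting holds copies with increased priority.
    """
    # Step (v): the smaller of the bandwidth-bound and compute-bound counts.
    k = max(0, min(count_by_bandwidth, count_by_compute))
    # Step (vi): take the k highest-priority subtasks.
    ordered = sorted(queue, key=lambda t: t["priority"], reverse=True)
    selected, skipped = ordered[:k], ordered[k:]
    # Raise the priority of the subtasks that were not selected.
    waiting = [{**t, "priority": t["priority"] + aging_step} for t in skipped]
    return selected, waiting


queue = [{"id": "a", "priority": 5}, {"id": "b", "priority": 2}, {"id": "c", "priority": 9}]
selected, waiting = pick_and_age(queue, count_by_bandwidth=2, count_by_compute=1)
print([t["id"] for t in selected])  # prints ['c']
```

Raising the priority of skipped subtasks is a common way to prevent starvation: a low-priority subtask eventually outranks newly arrived ones.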
  7. The system as claimed in claim 6, characterized in that the acquisition task processing module is further configured to:
    Before the collected data is added to the data cache queue, judge whether the collected data is within a set range; if so, add the collected data to the data cache queue; otherwise, generate alarm information, add it to the alarm cache queue, and send an alarm notification.
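The range check that routes a sample either to the data cache queue or to the alarm cache queue might look like this. Inclusive bounds, the alarm message format, and the `notify` callback are all assumptions made for illustration:

```python
from queue import Queue


def route_sample(value, low, high, data_queue, alarm_queue, notify=print):
    """Queue an in-range sample; otherwise record an alarm and send a notification."""
    if low <= value <= high:  # inclusive bounds are an assumption
        data_queue.put(value)
        return True
    alarm = f"value {value} outside [{low}, {high}]"
    alarm_queue.put(alarm)
    notify(alarm)  # stand-in for the alarm notification channel
    return False


data_q, alarm_q = Queue(), Queue()
route_sample(50, 0, 100, data_q, alarm_q)
route_sample(150, 0, 100, data_q, alarm_q)
print(data_q.qsize(), alarm_q.qsize())
```

`queue.Queue` is thread-safe, which suits a setting where several acquisition workers feed the same two queues.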
  8. The system as claimed in claim 7, characterized by further comprising:
    A manual intervention interface module: the acquisition task scheduling module returns the task execution status, bandwidth, and underlying resource utilization to the acquisition task update module, which feeds them back to the user; through the manual intervention interface module, the user operates the acquisition task update module to selectively suspend, cancel, or resume tasks according to actual operating conditions.
  9. A method for parallel processing of mass data in network management, characterized by comprising the following steps:
    (1) an acquisition task update step: periodically fetching tasks to be collected from the database and inserting them into the task acquisition queue to wait for execution;
    (2) an acquisition task scheduling step: taking a task description from the task acquisition queue, dividing the task into subtasks according to bandwidth resource utilization and/or computing resource usage, and mapping the subtasks to acquisition task processing units, where they wait to be executed;
    (3) an acquisition task processing step: executing each subtask to complete the data acquisition task, adding the collected data to the data cache queue, processing the data in parallel, and adding the processed data to the in-memory database;
    (4) a physical/in-memory database management step: periodically synchronizing the data in the in-memory database to the physical database.
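As one possible illustration of step (4), the sketch below copies an in-memory SQLite database to an on-disk file using Python's `sqlite3.Connection.backup`. The patent does not name a database product; SQLite, the `samples` table, and the file path are assumptions chosen to keep the example self-contained.

```python
import sqlite3


def sync_to_disk(memory_db, disk_path):
    """Copy the whole in-memory database into the on-disk database file."""
    disk_db = sqlite3.connect(disk_path)
    try:
        memory_db.backup(disk_db)  # full copy; overwrites the target contents
    finally:
        disk_db.close()


if __name__ == "__main__":
    mem = sqlite3.connect(":memory:")
    mem.execute("CREATE TABLE samples (device TEXT, value REAL)")
    mem.execute("INSERT INTO samples VALUES ('router-1', 42.0)")
    mem.commit()
    sync_to_disk(mem, "physical.db")  # hypothetical path
```

A periodic synchronization would call `sync_to_disk` from a timer or scheduler loop; a production system would more likely write only the changed rows rather than copy the full database each time.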
  10. The method as claimed in claim 9, characterized in that:
    After tasks are fetched from the physical database, it is judged whether there is a running acquisition task that needs to be stopped; if so, the executing task is stopped according to its task ID or the task is removed from the task acquisition queue; otherwise, it is checked whether an executing task needs to be updated; if so, the task is updated and reloaded; otherwise, the task to be collected is inserted into the task acquisition queue.
  11. The method as claimed in claim 9 or 10, characterized in that the acquisition task scheduling step specifically comprises:
    An obtaining step, which takes a task description from the task acquisition queue;
    A subtask division step, which divides an acquisition task into subtasks according to bandwidth resource utilization, each subtask comprising the following information: priority, collection period, and collection volume; wherein the collection period is the execution period of the subtask, i.e. the acquisition work is repeated at a fixed interval, and the execution time of each subtask within one period is much shorter than the period;
    A determining step, which determines the number of subtasks to execute concurrently and comprises:
    (a) determining the trigger period of the plurality of acquisition tasks, which is the greatest common divisor of the collection periods of all subtasks currently being executed;
    (b) determining the remaining network bandwidth within one task trigger period;
    (c) determining, according to the remaining network bandwidth, the number of subtasks to be processed concurrently;
    A mapping step, which maps the subtasks determined for concurrent execution to physical processing units, where they wait to be executed.
  12. The method as claimed in claim 11, characterized in that step (b) determines the remaining network bandwidth as follows: the product of the system bandwidth and the task trigger period is multiplied by a coefficient, and the data volume of the tasks about to be executed in that period is subtracted from the result, giving the remaining network bandwidth within one task trigger period; wherein the coefficient is a constant in the interval (0, 1].
  13. The method as claimed in claim 12, characterized in that step (c) determines the number of subtasks to be processed concurrently as follows:
    (i) within the trigger period, sorting the currently pending subtasks in priority order;
    (ii) obtaining the collection volume of each currently pending subtask and calculating the sum of the collection volumes of the first n subtasks; comparing these sums with the remaining network bandwidth within the trigger period calculated in step (b); if the remaining network bandwidth is greater than the sum of the collection volumes of the first m currently pending subtasks and less than the sum of the collection volumes of the first m+1 currently pending subtasks, the first m subtasks are taken, and the number of subtasks to be executed concurrently is m; wherein m<n.
  14. The method as claimed in claim 13, characterized in that the determining step further determines the number of concurrently executed subtasks according to the following steps:
    (iii) obtaining the number of physical processing units of the acquisition machine;
    (iv) according to the result obtained in step (i), determining the number of subtasks that can be executed in parallel within the trigger period, computed as: the product of the remaining network bandwidth obtained in step (b) and the number of physical processing units of the acquisition machine obtained in step (iii), divided by the average execution time of a subtask;
    (v) determining the number of subtasks to execute concurrently within one trigger period as the smaller of the number of subtasks requiring concurrent execution obtained in step (ii) and the number of subtasks that can be executed concurrently within the trigger period obtained in step (iv);
    (vi) according to the number of subtasks to execute concurrently within one trigger period obtained in step (v), designating that number of highest-priority subtasks in the task queue as the subtasks to be executed concurrently; after execution, the priority of the other subtasks is increased.
  15. The method as claimed in claim 14, characterized in that the acquisition task processing step further comprises:
    Before the collected data is added to the data cache queue, judging whether the collected data is within a set range; if so, adding the collected data to the data cache queue; otherwise, generating alarm information, adding it to the alarm cache queue, and sending an alarm notification.
  16. The method as claimed in claim 15, characterized by further comprising:
    A manual intervention interface step: the task execution status, bandwidth, and underlying resource utilization are returned and fed back to the user, and the user selectively suspends, cancels, or resumes tasks through the manual intervention interface according to actual operating conditions.
CN201210135226.3A 2012-05-02 2012-05-02 Parallel processing method and system for mass data Active CN103384206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210135226.3A CN103384206B (en) 2012-05-02 2012-05-02 Parallel processing method and system for mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210135226.3A CN103384206B (en) 2012-05-02 2012-05-02 Parallel processing method and system for mass data

Publications (2)

Publication Number Publication Date
CN103384206A true CN103384206A (en) 2013-11-06
CN103384206B CN103384206B (en) 2016-05-25

Family

ID=49491907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210135226.3A Active CN103384206B (en) 2012-05-02 2012-05-02 Parallel processing method and system for mass data

Country Status (1)

Country Link
CN (1) CN103384206B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577263A (en) * 2013-11-07 2014-02-12 广东电网公司佛山供电局 Power quality data real-time collection method and device
CN104199930A (en) * 2014-09-04 2014-12-10 江苏百联软件有限公司 System and method for acquiring and processing data
CN104408054A (en) * 2014-10-29 2015-03-11 深圳市金证科技股份有限公司 Database management system
CN105281962A (en) * 2015-12-03 2016-01-27 成都广达新网科技股份有限公司 System for achieving network management performance collection based on parallel pipelines and working method thereof
CN105388893A (en) * 2015-12-25 2016-03-09 安徽江淮汽车股份有限公司 CAN communication data monitoring method and system based on OBD interface
CN105550274A (en) * 2015-12-10 2016-05-04 曙光信息产业(北京)有限公司 Method and device for querying double-transcript parallel database
CN105610633A (en) * 2016-02-23 2016-05-25 烽火通信科技股份有限公司 Self-sampling system and method for real-time performance in communication equipment
CN105868628A (en) * 2016-03-24 2016-08-17 中国科学院信息工程研究所 An automatic sample behavior collection method and a device and a system therefor
CN106375103A (en) * 2015-07-23 2017-02-01 杭州海康威视数字技术股份有限公司 Alarming data acquisition and transmission method
CN106506282A (en) * 2016-11-30 2017-03-15 国云科技股份有限公司 A kind of monitoring method for improving cloud platform monitoring performance and scale
CN106649140A (en) * 2016-12-29 2017-05-10 深圳前海弘稼科技有限公司 Data processing method, apparatus and system
CN108629420A (en) * 2017-03-22 2018-10-09 埃森哲环球解决方案有限公司 Multimode quantum optimization engine
CN109298917A (en) * 2017-07-25 2019-02-01 沈阳高精数控智能技术股份有限公司 A kind of self-adapting dispatching method suitable for real-time system hybrid task
CN111177782A (en) * 2019-12-30 2020-05-19 智慧神州(北京)科技有限公司 Method and device for extracting distributed data based on big data and storage medium
CN111487920A (en) * 2020-05-26 2020-08-04 上海威派格智慧水务股份有限公司 Data acquisition and processing system
CN112307060A (en) * 2019-08-01 2021-02-02 北京京东振世信息技术有限公司 Method and device for processing picking task
CN113190335A (en) * 2021-05-07 2021-07-30 安徽南瑞中天电力电子有限公司 Multi-task scheduling and collecting method of power collecting terminal and power collecting system
CN113392252A (en) * 2021-06-01 2021-09-14 上海徐毓智能科技有限公司 Data processing method and device
CN113965481A (en) * 2021-10-11 2022-01-21 山东星维九州安全技术有限公司 Network asset detection multitask scheduling optimization method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040008866A1 (en) * 2001-03-05 2004-01-15 Rhoads Geoffrey B. Geographic information systems using digital watermarks
CN1756190A (en) * 2004-09-30 2006-04-05 北京航空航天大学 Distributed performance data acquisition method
CN101141315A (en) * 2007-10-11 2008-03-12 上海交通大学 Network resource scheduling simulation system
CN102375837A (en) * 2010-08-19 2012-03-14 中国移动通信集团公司 Data acquiring system and method

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577263A (en) * 2013-11-07 2014-02-12 广东电网公司佛山供电局 Power quality data real-time collection method and device
CN104199930A (en) * 2014-09-04 2014-12-10 江苏百联软件有限公司 System and method for acquiring and processing data
CN104199930B (en) * 2014-09-04 2018-07-17 江苏百联软件有限公司 Data acquire and the system and method for processing
CN104408054A (en) * 2014-10-29 2015-03-11 深圳市金证科技股份有限公司 Database management system
CN104408054B (en) * 2014-10-29 2017-10-31 深圳市金证科技股份有限公司 A kind of data base management system
CN106375103B (en) * 2015-07-23 2020-02-21 杭州海康威视数字技术股份有限公司 Alarm data acquisition and transmission method
CN106375103A (en) * 2015-07-23 2017-02-01 杭州海康威视数字技术股份有限公司 Alarming data acquisition and transmission method
CN105281962A (en) * 2015-12-03 2016-01-27 成都广达新网科技股份有限公司 System for achieving network management performance collection based on parallel pipelines and working method thereof
CN105281962B (en) * 2015-12-03 2018-08-28 成都广达新网科技股份有限公司 One kind realizing network management performance acquisition system and its working method based on parallel pipeline
CN105550274A (en) * 2015-12-10 2016-05-04 曙光信息产业(北京)有限公司 Method and device for querying double-transcript parallel database
CN105550274B (en) * 2015-12-10 2019-01-25 曙光信息产业(北京)有限公司 The querying method and device of this parallel database of two-pack
CN105388893A (en) * 2015-12-25 2016-03-09 安徽江淮汽车股份有限公司 CAN communication data monitoring method and system based on OBD interface
CN105388893B (en) * 2015-12-25 2018-02-13 安徽江淮汽车集团股份有限公司 A kind of CAN communication data monitoring method and system based on OBD interfaces
CN105610633A (en) * 2016-02-23 2016-05-25 烽火通信科技股份有限公司 Self-sampling system and method for real-time performance in communication equipment
CN105868628A (en) * 2016-03-24 2016-08-17 中国科学院信息工程研究所 An automatic sample behavior collection method and a device and a system therefor
CN106506282A (en) * 2016-11-30 2017-03-15 国云科技股份有限公司 A kind of monitoring method for improving cloud platform monitoring performance and scale
CN106649140A (en) * 2016-12-29 2017-05-10 深圳前海弘稼科技有限公司 Data processing method, apparatus and system
CN108629420B (en) * 2017-03-22 2022-03-11 埃森哲环球解决方案有限公司 Method for solving optimization task and system of multiple computing resources
CN108629420A (en) * 2017-03-22 2018-10-09 埃森哲环球解决方案有限公司 Multimode quantum optimization engine
CN109298917A (en) * 2017-07-25 2019-02-01 沈阳高精数控智能技术股份有限公司 A kind of self-adapting dispatching method suitable for real-time system hybrid task
CN109298917B (en) * 2017-07-25 2020-10-30 沈阳高精数控智能技术股份有限公司 Self-adaptive scheduling method suitable for real-time system mixed task
CN112307060A (en) * 2019-08-01 2021-02-02 北京京东振世信息技术有限公司 Method and device for processing picking task
CN112307060B (en) * 2019-08-01 2024-04-23 北京京东振世信息技术有限公司 Method and device for processing picking task
CN111177782A (en) * 2019-12-30 2020-05-19 智慧神州(北京)科技有限公司 Method and device for extracting distributed data based on big data and storage medium
CN111487920A (en) * 2020-05-26 2020-08-04 上海威派格智慧水务股份有限公司 Data acquisition and processing system
CN113190335A (en) * 2021-05-07 2021-07-30 安徽南瑞中天电力电子有限公司 Multi-task scheduling and collecting method of power collecting terminal and power collecting system
CN113190335B (en) * 2021-05-07 2023-05-26 安徽南瑞中天电力电子有限公司 Multi-task scheduling and collecting method of power collecting terminal and power collecting system
CN113392252B (en) * 2021-06-01 2023-01-17 上海徐毓智能科技有限公司 Data processing method and device
CN113392252A (en) * 2021-06-01 2021-09-14 上海徐毓智能科技有限公司 Data processing method and device
CN113965481A (en) * 2021-10-11 2022-01-21 山东星维九州安全技术有限公司 Network asset detection multitask scheduling optimization method
CN113965481B (en) * 2021-10-11 2024-06-07 山东星维九州安全技术有限公司 Network asset detection multitask scheduling optimization method

Also Published As

Publication number Publication date
CN103384206B (en) 2016-05-25

Similar Documents

Publication Publication Date Title
CN103384206B (en) Parallel processing method and system for mass data
CN110262899A (en) Monitor component elastic telescopic method, apparatus and controlled terminal based on Kubernetes cluster
CN103164279B (en) Cloud computing resources distribution method and system
CN103019853A (en) Method and device for dispatching job task
CN107925612A (en) Network monitoring system, network monitoring method and program
US20120297393A1 (en) Data Collecting Method, Data Collecting Apparatus and Network Management Device
CN107193909A (en) Data processing method and system
CN108845878A (en) The big data processing method and processing device calculated based on serverless backup
CN110912773A (en) Cluster monitoring system and monitoring method for multiple public cloud computing platforms
CN109962856B (en) Resource allocation method, device and computer readable storage medium
CN103098014A (en) Storage system
US9600791B2 (en) Managing a network system
CN105373432B (en) A kind of cloud computing resource scheduling method based on virtual resource status predication
US8856799B2 (en) Managing resources for maintenance tasks in computing systems
Birje et al. Cloud monitoring system: basics, phases and challenges
CN110532086A (en) Resource multiplexing method, equipment, system and storage medium
CN107430526B (en) Method and node for scheduling data processing
CN103945005A (en) Multiple evaluation indexes based dynamic load balancing framework
WO2023221846A1 (en) Computing cluster and data acquisition method and device thereof, and storage medium
CN103309843B (en) The collocation method of server and system
CN103384205B (en) A kind of mass alarm data parallel acquisition system, device and method
CN114244718A (en) Power transmission line communication network equipment management system
CN111339466A (en) Interface management method and device, electronic equipment and readable storage medium
CN103324538A (en) Method for designing dislocated scattered cluster environment distributed concurrent processes
WO2024082861A1 (en) Cloud storage scheduling system applied to video monitoring

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant