CN103384206B - Parallel processing method and system for mass data - Google Patents

Parallel processing method and system for mass data

Info

Publication number
CN103384206B
CN103384206B (grant publication); application CN201210135226.3A; publication CN103384206A
Authority
CN
China
Prior art keywords
task
subtask
data
cycle
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210135226.3A
Other languages
Chinese (zh)
Other versions
CN103384206A (en)
Inventor
陆忠华
王珏
王彦棡
邓笋根
阚圣哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS filed Critical Computer Network Information Center of CAS
Priority to CN201210135226.3A
Publication of CN103384206A
Application granted
Publication of CN103384206B
Legal status: Active (granted)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a parallel processing method for mass data in network management, and in particular to a task scheduling method that improves bandwidth and resource utilization. The method comprises: periodically retrieving tasks to be collected from a database, inserting them into a task collection queue, and waiting for the acquisition tasks to execute; taking task descriptions out of the task collection queue, dividing each task into subtasks according to bandwidth usage and/or computational resource usage, mapping the subtasks to acquisition task processing units, and waiting for the subtasks to execute; executing each subtask to collect data, adding the collected data to a data buffer queue, processing the data in parallel, and adding the processed data to an in-memory database; and periodically synchronizing the data in the in-memory database to a physical database.

Description

Parallel processing method and system for mass data
The present invention relates to a parallel processing method for mass data in network management, and in particular to a task scheduling method that improves bandwidth and resource utilization.
Background art
With the development of network technology and the popularization of network applications, users place ever higher demands on the performance of network services. To better meet user needs and provide high-quality service, the quality of the network service must be guaranteed. To this end, the operating status of every device in the network must be monitored in real time so that device failures are discovered promptly and appropriate measures can be taken to keep the network stable.
To monitor the operating status of network devices, the operational parameters of each device in the network must be collected in real time and sent to a master control computer, which analyzes the parameters and determines whether a device has failed.
In the prior art, acquisition tasks are usually distributed by the master control computer, and collectors collect the operational parameters of the network devices in real time. Each task must process multiple acquisition targets (devices to be collected), and each target includes multiple acquisition indices (different parameters). As networks grow ever larger, network management must collect data from thousands of devices, each with multiple acquisition indices, so the volume of data to be collected is massive. If many acquisition tasks execute at the same time, the instantaneous bandwidth they occupy becomes excessive; if too few acquisition tasks execute in parallel, bandwidth usage is reduced but the computational resources of the collectors are wasted. How to avoid occupying too much bandwidth while fully utilizing the collectors' computational resources is therefore the technical problem the present invention aims to solve.
Summary of the invention
The object of the present invention is to schedule acquisition tasks rationally so as to use the network bandwidth effectively while fully utilizing computational resources for data acquisition.
To achieve the above object, the present invention proposes the following solutions:
According to a first aspect, an embodiment of the present invention provides a parallel processing system for mass data in network management, characterized by comprising the following modules: an acquisition task update module, which periodically retrieves tasks to be collected from a database, inserts them into a task collection queue, and waits for the acquisition tasks to execute; an acquisition task scheduling module, which takes task descriptions out of the task collection queue, divides each task into subtasks according to bandwidth usage and/or computational resource usage, maps the subtasks to acquisition task processing units, and waits for the subtasks to execute; an acquisition task processing module, which executes each subtask to collect data, adds the collected data to a data buffer queue, processes the data in parallel, and adds the processed data to an in-memory database; and a physical/in-memory database synchronization module, which periodically synchronizes the data in the in-memory database to the physical database.
According to another aspect, an embodiment of the present invention provides a parallel processing method for mass data in network management, characterized by comprising the following steps: (1) an acquisition task update step: periodically retrieving tasks to be collected from a database, inserting them into a task collection queue, and waiting for the acquisition tasks to execute; (2) an acquisition task scheduling step: taking task descriptions out of the task collection queue, dividing each task into subtasks according to bandwidth usage and/or computational resource usage, mapping the subtasks to acquisition task processing units, and waiting for the subtasks to execute; (3) an acquisition task processing step: executing each subtask to collect data, adding the collected data to a data buffer queue, processing the data in parallel, and adding the processed data to an in-memory database; (4) a physical/in-memory database synchronization step: periodically synchronizing the data in the in-memory database to the physical database.
Brief description of the drawings
Fig. 1 is a hardware architecture diagram of the system of the present invention;
Fig. 2 is a structural diagram of the parallel processing system of the present invention;
Fig. 3 shows the acquisition task update process of the present invention;
Fig. 4 shows the acquisition task processing procedure of the present invention.
Detailed description of the invention
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Because the volume of collected data is large, the mass-data parallel processing system of the present invention can run in parallel on multiple collectors in order to improve the execution efficiency of the whole system. The hardware architecture is shown in Fig. 1, where:
Master control computer --- used to distribute acquisition tasks, gather the collection results, and present them to the user; it comprises multiple physical processing units and can execute multiple acquisition tasks in parallel;
Collector --- used to collect data from the monitored devices; the collected data covers multiple parameter indices of each device. A collector comprises multiple physical processing units; the collection work for each index is assigned as one subtask and completed by one physical processing unit.
In the present invention, each acquisition task must process multiple acquisition targets (devices to be collected), and each target includes multiple acquisition indices (different parameters). The collection work for each index is assigned as one subtask and completed by one physical processing unit. The main job of a subtask is to periodically collect data from the underlying device over the network.
Embodiment 1
As shown in Fig. 2, the parallel processing system for mass data of the present invention comprises: an acquisition task update module, an acquisition task scheduling module, an acquisition task processing module, a physical/in-memory database synchronization module, a monitoring module, a command listening module, and a log management module. The workflow of each module is described below.
(1) Acquisition task update module
The acquisition task update module periodically retrieves tasks to be collected from the database, inserts them into the task collection queue, and waits for the acquisition tasks to execute.
A further preferred scheme is as follows:
As shown in Fig. 3, after retrieving a task from the physical database, the acquisition task update module judges whether a running acquisition task needs to be stopped; if so, it stops the task identified by the task ID and removes that task from the task collection queue. Otherwise it checks whether a running task needs to be updated; if so, it updates the task and reloads it. Otherwise it inserts the task to be collected into the task collection queue.
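For illustration only, the following Python sketch outlines such an update loop; the objects and method names (db.fetch_tasks, task_queue, running_tasks, and so on) are hypothetical and do not appear in the patent.

```python
import time

def update_acquisition_tasks(db, task_queue, running_tasks, interval_s=30):
    """Periodically reconcile the physical database with the task collection queue
    (illustrative sketch; db, task_queue and running_tasks are assumed interfaces)."""
    while True:
        for record in db.fetch_tasks():
            if record.stop_requested:
                # Stop the running task by its ID and drop it from the collection queue.
                running_tasks.stop(record.task_id)
                task_queue.remove(record.task_id)
            elif record.needs_update:
                # Update the task definition and reload it.
                running_tasks.reload(record.task_id, record)
            else:
                # Task to be collected: enqueue it to await execution.
                task_queue.put(record)
        time.sleep(interval_s)
```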
(2) Acquisition task scheduling module
The acquisition task scheduling module takes task descriptions out of the task collection queue, divides each task into subtasks according to bandwidth usage and computational resource usage, maps the subtasks to physical processing units, and waits for the subtasks to execute.
A further preferred scheme is as follows:
To prevent a large number of subtasks from piling up at the same point in time and causing excessive instantaneous bandwidth, the number of concurrently executed subtasks is calculated as follows.
Each collection subtask has its own execution cycle (the collection work is repeated at a fixed interval), and the execution time of a subtask within one cycle is much smaller than the cycle length, for example 1/20 of it. Each subtask carries the following information:
PR --- the priority of the subtask;
PE --- the subtask's own collection period;
NC --- the subtask's own collection volume (which can be calculated in advance).
To prevent subtasks from piling up at the same point in time and causing excessive instantaneous bandwidth, the subtasks must be executed in stages. The task trigger cycle P_GCD is the greatest common divisor of the collection periods of all currently executing subtasks.
The remaining network bandwidth within one task trigger cycle is:
B_r = c * B * P_GCD - N_d
where: c ∈ (0, 1] --- a coefficient;
B --- the system bandwidth;
P_GCD --- the task trigger cycle;
N_d --- the data volume of the subtasks due to execute in the current execution cycle (an execution cycle may be one task trigger cycle or several trigger cycles).
The pending subtasks within a trigger cycle are processed in order of priority. Suppose there are currently n subtasks whose collection volumes are, respectively:
NC_0, NC_1, NC_2, …, NC_{m-1}, NC_m, …, NC_{n-1}
If SUM(NC_0, …, NC_{m-1}) <= B_r <= SUM(NC_0, …, NC_m), the first m subtasks are selected; m is the number of subtasks that need to execute concurrently.
The module can periodically adjust the number of concurrently executed subtasks according to the current system bandwidth B.
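A minimal Python sketch of this bandwidth-constrained selection is given below; the (priority, volume) tuple layout and the coefficient value c = 0.8 are illustrative assumptions, not part of the patent.

```python
def select_by_bandwidth(subtasks, system_bandwidth, p_gcd, pending_volume, c=0.8):
    """Pick the largest priority-ordered prefix of subtasks whose total collection
    volume fits the remaining bandwidth B_r = c * B * P_GCD - N_d.
    `subtasks` is a list of (priority, collection_volume) pairs (assumed layout)."""
    b_r = c * system_bandwidth * p_gcd - pending_volume
    ordered = sorted(subtasks, key=lambda s: s[0], reverse=True)  # highest priority first
    selected, used = [], 0.0
    for priority, volume in ordered:
        if used + volume > b_r:
            break                      # the (m+1)-th subtask would exceed B_r
        selected.append((priority, volume))
        used += volume
    return selected                    # the first m subtasks that fit within B_r
```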
A further optimized scheme is as follows:
If too few collection subtasks execute in parallel, bandwidth usage is reduced but the collector's computational resources are wasted. Therefore, on the basis of fully utilizing bandwidth, the goal of fully utilizing computational resources must also be met.
The number of subtasks that can run concurrently within one trigger cycle is:
N_L = P_GCD * N_c / T_a
where:
N_c --- the number of physical processing units of the collector;
T_a --- the average execution time of a subtask, obtained from historical data.
The number of subtasks that need to execute concurrently within one trigger cycle is therefore N = min(m, N_L). The N highest-priority subtasks in the queue are dispatched for execution, and the priority PR of the remaining subtasks is increased.
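The sketch below, a hypothetical continuation of the previous one, caps the bandwidth-admitted count m by the compute-resource bound N_L and ages the deferred subtasks; the dict layout and the aging step are assumptions.

```python
def combine_with_compute_limit(ordered_subtasks, m, p_gcd, n_units, avg_exec_time,
                               aging_step=1):
    """Dispatch N = min(m, N_L) subtasks, where N_L = P_GCD * N_c / T_a, and raise
    the priority of subtasks deferred to a later trigger cycle.
    `ordered_subtasks` is a priority-sorted list of dicts with a 'priority' key."""
    n_l = int(p_gcd * n_units / avg_exec_time)   # compute-resource bound N_L
    n = min(m, n_l)
    dispatched = ordered_subtasks[:n]            # the N highest-priority subtasks
    for subtask in ordered_subtasks[n:]:
        subtask['priority'] += aging_step        # deferred subtasks gain priority
    return dispatched
```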
(3) Acquisition task processing module
The acquisition task processing module executes each subtask to collect data, adds the collected data to the data buffer queue, processes the data in parallel, and adds the processed data to the in-memory database.
Because the volume of collected data is large, a buffer queue is used to relieve the write pressure on the database; data is taken out of the data buffer queue and processed in parallel.
A further preferred scheme is as follows:
As shown in Fig. 4, before the collected data is added to the data buffer queue, it is checked whether the data lies within the configured range. If so, the data is added to the data buffer queue; otherwise an alarm message is generated and added to the alarm buffer queue, and an alarm notification is sent.
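A small Python sketch of this check-and-enqueue step follows; the queue objects, the [low, high] range, and the notify callback are illustrative assumptions.

```python
import queue

data_buffer = queue.Queue()    # buffer between collection and parallel processing
alarm_buffer = queue.Queue()   # buffer for generated alarm messages

def accept_sample(index_name, value, low, high, notify):
    """Enqueue a collected sample if it lies within [low, high]; otherwise raise an alarm."""
    if low <= value <= high:
        data_buffer.put((index_name, value))
    else:
        alarm = {'index': index_name, 'value': value, 'range': (low, high)}
        alarm_buffer.put(alarm)    # keep alarms in their own buffer queue
        notify(alarm)              # send the alarm notification
```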
(4) Physical/in-memory database synchronization module
The physical/in-memory database synchronization module periodically synchronizes the data in the in-memory database to the physical database.
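A minimal sketch of such a periodic flush, assuming placeholder interfaces (drain_new_rows, bulk_insert) for the in-memory store and the physical database:

```python
import threading

def start_sync(memory_db, physical_db, interval_s=60):
    """Every interval_s seconds, copy rows processed since the last flush from the
    in-memory database to the physical database (hypothetical interfaces)."""
    def flush():
        rows = memory_db.drain_new_rows()
        if rows:
            physical_db.bulk_insert(rows)
        threading.Timer(interval_s, flush).start()   # schedule the next flush
    flush()
```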
(5) Monitoring module
The monitoring module establishes heartbeats with the other modules, monitors them, and periodically writes its own state and the states of the other modules to the physical database.
(6) Command listening module
The command listening module closes all modules after receiving a shutdown command from the foreground.
(7) Log management module
The log management module records system events every day.
A further preferred scheme is as follows:
The parallel processing system for mass data may further comprise:
(8) Manual intervention interface module
The acquisition task scheduling module returns the task execution status, bandwidth, and underlying resource utilization to the acquisition task update module, which in turn feeds them back to the user. Through the manual intervention interface module, the user operates the acquisition task update module and selectively pauses, cancels, or resumes tasks according to the actual operating situation.
The parallel processing system in this embodiment runs on the master control computer. All modules run on the master control computer; only the acquisition tasks run on the collectors.
Embodiment 2
A parallel processing method for mass data comprises: an acquisition task update step, an acquisition task scheduling step, an acquisition task processing step, a physical/in-memory database synchronization step, a monitoring step, a command listening step, and a log management step. The implementation of each step is described below.
(1) Acquisition task update step
Tasks to be collected are periodically retrieved from the database and inserted into the task collection queue, where they wait for execution.
A further preferred scheme is as follows:
After a task is retrieved from the physical database, it is judged whether a running acquisition task needs to be stopped; if so, the task identified by the task ID is stopped and removed from the task collection queue. Otherwise it is checked whether a running task needs to be updated; if so, the task is updated and reloaded. Otherwise the task to be collected is inserted into the task collection queue.
(2) Acquisition task scheduling step
A task description is taken out of the task collection queue, the task is divided into subtasks according to bandwidth usage and computational resource usage, and the subtasks are mapped to physical processing units, where they wait for execution.
A further preferred scheme is as follows:
To prevent a large number of subtasks from piling up at the same point in time and causing excessive instantaneous bandwidth, the number of concurrently executed subtasks is calculated as follows.
Each collection subtask has its own execution cycle (the collection work is repeated at a fixed interval), and the execution time of a subtask within one cycle is much smaller than the cycle length (for example 1/20 of it). Each subtask carries the following information:
PR --- the priority of the subtask;
PE --- the subtask's own collection period;
NC --- the subtask's own collection volume (which can be calculated in advance).
To prevent subtasks from piling up at the same point in time and causing excessive instantaneous bandwidth, the subtasks must be executed in stages. The task trigger cycle P_GCD is the greatest common divisor of the collection periods of all currently executing subtasks.
The remaining network bandwidth within one task trigger cycle is:
B_r = c * B * P_GCD - N_d
where: c ∈ (0, 1] --- a coefficient;
B --- the system bandwidth;
P_GCD --- the task trigger cycle;
N_d --- the data volume of the tasks due to execute in the execution cycle.
The pending subtasks within a trigger cycle are processed in order of priority. Suppose there are currently n subtasks whose collection volumes are, respectively:
NC_0, NC_1, NC_2, …, NC_{m-1}, NC_m, …, NC_{n-1}
If SUM(NC_0, …, NC_{m-1}) <= B_r <= SUM(NC_0, …, NC_m), the first m subtasks are selected; m is the number of subtasks that need to be processed in parallel.
The number of subtasks executed in parallel can be adjusted periodically according to the current system bandwidth B.
A further optimized scheme is as follows:
If too few collection subtasks execute in parallel, bandwidth usage is reduced but the collector's computational resources are wasted. Therefore, on the basis of fully utilizing bandwidth, the goal of fully utilizing computational resources must also be met.
The number of subtasks that can run concurrently within one trigger cycle is:
N_L = P_GCD * N_c / T_a
where:
N_c --- the number of physical processing units of the collector;
T_a --- the average execution time of a subtask, obtained from historical data.
The number of subtasks that need to execute concurrently within one trigger cycle is therefore N = min(m, N_L). The N highest-priority subtasks in the queue are dispatched for execution, and the priority PR of the remaining subtasks is increased.
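As a purely illustrative worked example (the numbers below are hypothetical, not from the patent), this is how the two bounds combine:

```python
# Hypothetical numbers: P_GCD = 60 s, B = 10 MB/s, c = 0.8, N_d = 200 MB,
# 8 physical processing units per collector, average subtask execution time 3 s.
p_gcd, bandwidth, c, n_d = 60, 10.0, 0.8, 200.0
n_units, t_a = 8, 3.0

b_r = c * bandwidth * p_gcd - n_d    # remaining bandwidth budget: 280 MB
n_l = int(p_gcd * n_units / t_a)     # compute-resource bound N_L: 160 subtasks
m = 120                              # assume the bandwidth prefix admits 120 subtasks
print(min(m, n_l))                   # N = min(120, 160) = 120 subtasks this cycle
```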
(3) Acquisition task processing step
Each subtask is executed to collect data; the collected data is added to the data buffer queue and processed in parallel, and the processed data is added to the in-memory database.
Because the volume of collected data is large, a buffer queue is used to relieve the write pressure on the database; data is taken out of the data buffer queue and processed in parallel.
A further preferred scheme is as follows:
As shown in Fig. 4, before the collected data is added to the data buffer queue, it is checked whether the data lies within the configured range. If so, the data is added to the data buffer queue; otherwise an alarm message is generated and added to the alarm buffer queue, and an alarm notification is sent.
(4) Physical/in-memory database synchronization step
The data in the in-memory database is periodically synchronized to the physical database.
(5) Monitoring step
Heartbeats are established, each process is monitored, and the state of each process is periodically written to the physical database.
(6) Command listening step
All processes are closed after a shutdown command sent by the foreground is received.
(7) Log management step
System events are recorded every day.
A further preferred scheme is as follows:
The parallel processing method for mass data may further comprise:
(8) Manual intervention step
The task execution status, bandwidth, and underlying resource utilization are returned and fed back to the user, and the user updates the acquisition tasks through the manual intervention interface, selectively pausing, cancelling, or resuming tasks according to the actual operating situation.
Embodiment 3
A data acquisition task scheduling system in a network management system, characterized in that it comprises:
(1) a task management module, which takes task descriptions out of the task collection queue;
(2) a subtask division module, which divides a task into subtasks according to bandwidth usage, where each collection subtask has its own execution cycle, the collection work is repeated at a fixed interval, the execution time of a subtask within one cycle is much smaller than the cycle length (for example 1/20 of it), and each subtask carries the following information:
PR --- the priority of the subtask;
PE --- the subtask's own collection period;
NC --- the subtask's own collection volume, which can be calculated in advance;
(3) a mapping module, which maps the subtasks to physical processing units, where they wait for execution;
(4) a determination module, which determines the number of concurrently executed subtasks according to the following steps:
(a) calculate the task trigger cycle P_GCD, which is the greatest common divisor of the collection periods of all currently executing subtasks;
(b) calculate the remaining network bandwidth within the trigger cycle as follows:
B_r = c * B * P_GCD - N_d
where: c ∈ (0, 1] --- a coefficient;
B --- the system bandwidth;
P_GCD --- the task trigger cycle;
N_d --- the data volume of the tasks due to execute in the execution cycle;
(c) according to the remaining network bandwidth within the trigger cycle obtained in step (b), determine the number of subtasks to process concurrently, as follows:
(i) within the trigger cycle, arrange the pending subtasks in order of priority;
(ii) the collection volumes of the n currently pending subtasks are, respectively: NC_0, NC_1, NC_2, …, NC_{m-1}, NC_m, …, NC_{n-1};
if SUM(NC_0, …, NC_{m-1}) <= B_r <= SUM(NC_0, …, NC_m), the first m subtasks are selected;
(5) an execution module, which executes the determined number of subtasks in parallel.
A further preferred scheme is that the number of subtasks executed in parallel is adjusted periodically according to the current bandwidth B.
When collection subtasks are processed in parallel, not only the system bandwidth but also the utilization of the collector's computational resources should be considered. To achieve the goal of fully utilizing computational resources, a further preferred scheme of this embodiment is:
The execution module executes the subtasks according to the following steps:
(i) according to the computational resource situation of the collector, determine the number of subtasks that can execute in parallel within the trigger cycle:
N_L = P_GCD * N_c / T_a
where:
N_c --- the number of physical processing units of the collector;
T_a --- the average execution time of a subtask, obtained from historical data;
(ii) according to the bandwidth usage and the computational resource situation, determine that the number of subtasks executed concurrently within one trigger cycle is N = min(m, N_L);
(iii) dispatch the N highest-priority subtasks in the queue for execution, and increase the priority PR of the remaining subtasks.
Embodiment 4
A data acquisition task scheduling method in a network management system, characterized in that it comprises:
(1) taking a task description out of the task collection queue;
(2) dividing the task into subtasks according to bandwidth usage, where each collection subtask has its own execution cycle, the collection work is repeated at a fixed interval, the execution time of a subtask within one cycle is much smaller than the cycle length, and each subtask carries the following information:
PR --- the priority of the subtask;
PE --- the subtask's own collection period;
NC --- the subtask's own collection volume, which can be calculated in advance;
(3) mapping the subtasks to physical processing units, where they wait for execution;
(4) determining the number of concurrently executed subtasks according to the following steps:
(a) calculate the task trigger cycle P_GCD, which is the greatest common divisor of the collection periods of all currently executing subtasks;
(b) calculate the remaining network bandwidth within the trigger cycle as follows:
B_r = c * B * P_GCD - N_d
where: c ∈ (0, 1] --- a coefficient;
B --- the system bandwidth;
P_GCD --- the task trigger cycle;
N_d --- the data volume of the tasks due to execute in the execution cycle;
(c) according to the remaining network bandwidth within the trigger cycle obtained in step (b), determine the number of subtasks to process concurrently, as follows:
(i) within the trigger cycle, arrange the pending subtasks in order of priority;
(ii) the collection volumes of the n currently pending subtasks are, respectively: NC_0, NC_1, NC_2, …, NC_{m-1}, NC_m, …, NC_{n-1};
if SUM(NC_0, …, NC_{m-1}) <= B_r <= SUM(NC_0, …, NC_m), the first m subtasks are selected;
(d) executing the determined number of subtasks in parallel.
A further preferred scheme is that the number of subtasks executed in parallel is adjusted periodically according to the current bandwidth B.
When collection subtasks are processed in parallel, not only the system bandwidth but also the utilization of the collector's computational resources should be considered. To achieve the goal of fully utilizing computational resources, a further preferred scheme of this embodiment is:
Step (d) specifically comprises the following steps:
(i) according to the computational resource situation of the collector, determine the number of subtasks that can execute in parallel within the trigger cycle:
N_L = P_GCD * N_c / T_a
where:
N_c --- the number of physical processing units of the collector;
T_a --- the average execution time of a subtask, obtained from historical data;
(ii) according to the bandwidth usage and the computational resource situation, determine that the number of subtasks executed concurrently within one trigger cycle is N = min(m, N_L);
(iii) dispatch the N highest-priority subtasks in the queue for execution, and increase the priority PR of the remaining subtasks.
Through the technical solution disclosed by the present invention, acquisition tasks can be scheduled rationally so that the network bandwidth is used effectively and the underlying computational resources are used for analysis and processing, thereby solving the technical problems existing in the prior art.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

  1. A parallel processing system for mass data in network management, characterized by comprising the following modules:
    an acquisition task update module, which periodically retrieves tasks to be collected from a database, inserts them into a task collection queue, and waits for the acquisition tasks to execute;
    an acquisition task scheduling module, which takes task descriptions out of the task collection queue, divides each task into subtasks according to bandwidth usage and/or computational resource usage, maps the subtasks to acquisition task processing units, and waits for the subtasks to execute;
    an acquisition task processing module, which executes each subtask to collect data, adds the collected data to a data buffer queue, processes the data in parallel, and adds the processed data to an in-memory database;
    a physical/in-memory database synchronization module, which periodically synchronizes the data in the in-memory database to the physical database;
    wherein the acquisition task scheduling module comprises:
    a subtask division module, which divides a collection task into subtasks according to bandwidth usage, each subtask carrying the following information: priority, collection period, and collection volume, where the collection period is the execution cycle of the subtask, the collection work is repeated at a fixed interval, and the execution time of a subtask within one cycle is much smaller than the cycle length;
    a determination module, which determines the number of concurrently executed subtasks according to the following steps:
    (a) determine the trigger cycle of the multiple acquisition tasks, which is the greatest common divisor of the collection periods of all currently executing subtasks;
    (b) determine the remaining network bandwidth within one task trigger cycle;
    (c) determine, according to the remaining network bandwidth, the number of subtasks to process concurrently;
    a mapping module, which maps the subtasks determined for concurrent execution to the acquisition task processing module, where they wait for execution.
  2. The system as claimed in claim 1, characterized in that:
    after retrieving a task from the physical database, the acquisition task update module judges whether a running acquisition task needs to be stopped; if so, it stops the task identified by the task ID or removes that task from the task collection queue; otherwise it checks whether a running task needs to be updated; if so, it updates the task and reloads it; otherwise it inserts the task to be collected into the task collection queue.
  3. The system as claimed in claim 1, characterized in that the method of determining the remaining network bandwidth in step (b) is: the product of the system bandwidth and the task trigger cycle is multiplied by a coefficient, and the data volume of the subtasks due to execute in the execution cycle is then subtracted; the result is the remaining network bandwidth within one task trigger cycle; the coefficient is a constant in (0, 1].
  4. The system as claimed in claim 3, characterized in that the method of determining the number of concurrently processed subtasks in step (c) is:
    (i) within the trigger cycle, arrange the currently pending subtasks in order of priority;
    (ii) obtain the collection volume of each currently pending subtask, compute the sum of the collection volumes of the first n subtasks, and compare it with the remaining network bandwidth within the trigger cycle calculated in step (b); if the remaining network bandwidth is greater than the sum of the collection volumes of the first m currently pending subtasks and less than the sum of the collection volumes of the first m+1 currently pending subtasks, the first m subtasks are selected, and the number of subtasks that need to execute concurrently is m, where m < n.
  5. The system as claimed in claim 4, characterized in that the determination module further determines the number of concurrently executed subtasks according to the following steps:
    (iii) obtain the number of physical processing units of the collector;
    (iv) according to the result obtained in step (i), determine the number of subtasks that can execute in parallel within the trigger cycle, calculated as: the product of the remaining network bandwidth obtained in step (b) and the number of physical processing units of the collector obtained in step (iii), divided by the average execution time of a subtask;
    (v) determine the number of subtasks executed concurrently within one trigger cycle, which is the smaller of the number of subtasks that need to execute concurrently obtained in step (ii) and the number of subtasks that can execute concurrently within the trigger cycle obtained in step (iv);
    (vi) according to the number of concurrently executed subtasks within one trigger cycle obtained in step (v), select the corresponding number of highest-priority subtasks in the task queue as the subtasks to execute concurrently; after they are executed, the priority of the remaining subtasks is increased.
  6. The system as claimed in claim 5, characterized in that the acquisition task processing module is further configured to:
    before adding collected data to the data buffer queue, judge whether the collected data lies within the configured range; if so, add the collected data to the data buffer queue; otherwise generate an alarm message, add it to the alarm buffer queue, and send an alarm notification.
  7. The system as claimed in claim 6, characterized by further comprising:
    a manual intervention interface module, wherein the acquisition task scheduling module returns the task execution status, bandwidth, and underlying resource utilization to the acquisition task update module, which in turn feeds them back to the user; through the manual intervention interface module, the user operates the acquisition task update module and selectively pauses, cancels, or resumes tasks according to the actual operating situation.
  8. A parallel processing method for mass data in network management, characterized by comprising the following steps:
    (1) an acquisition task update step: periodically retrieving tasks to be collected from a database, inserting them into a task collection queue, and waiting for the acquisition tasks to execute;
    (2) an acquisition task scheduling step: taking task descriptions out of the task collection queue, dividing each task into subtasks according to bandwidth usage and/or computational resource usage, mapping the subtasks to acquisition task processing units, and waiting for the subtasks to execute;
    (3) an acquisition task processing step: executing each subtask to collect data, adding the collected data to a data buffer queue, processing the data in parallel, and adding the processed data to an in-memory database;
    (4) a physical/in-memory database synchronization step: periodically synchronizing the data in the in-memory database to the physical database;
    wherein the acquisition task scheduling step specifically comprises:
    a retrieval step: taking a task description out of the task collection queue;
    a subtask division step: dividing the collection task into subtasks according to bandwidth usage, each subtask carrying the following information: priority, collection period, and collection volume, where the collection period is the execution cycle of the subtask, the collection work is repeated at a fixed interval, and the execution time of a subtask within one cycle is much smaller than the cycle length;
    a determination step: determining the number of concurrently executed subtasks, comprising:
    (a) determine the trigger cycle of the multiple acquisition tasks, which is the greatest common divisor of the collection periods of all currently executing subtasks;
    (b) determine the remaining network bandwidth within one task trigger cycle;
    (c) determine, according to the remaining network bandwidth, the number of subtasks to process concurrently;
    a mapping step: mapping the subtasks determined for concurrent execution to physical processing units, where they wait for execution.
  9. The method as claimed in claim 8, characterized in that:
    after a task is retrieved from the physical database, it is judged whether a running acquisition task needs to be stopped; if so, the task identified by the task ID is stopped or removed from the task collection queue; otherwise it is checked whether a running task needs to be updated; if so, the task is updated and reloaded; otherwise the task to be collected is inserted into the task collection queue.
  10. The method as claimed in claim 9, characterized in that the method of determining the remaining network bandwidth in step (b) is: the product of the system bandwidth and the task trigger cycle is multiplied by a coefficient, and the data volume of the subtasks due to execute in the execution cycle is then subtracted; the result is the remaining network bandwidth within one task trigger cycle; the coefficient is a constant in (0, 1].
  11. The method as claimed in claim 10, characterized in that the method of determining the number of concurrently processed subtasks in step (c) is:
    (i) within the trigger cycle, arrange the currently pending subtasks in order of priority;
    (ii) obtain the collection volume of each currently pending subtask, compute the sum of the collection volumes of the first n subtasks, and compare it with the remaining network bandwidth within the trigger cycle calculated in step (b); if the remaining network bandwidth is greater than the sum of the collection volumes of the first m currently pending subtasks and less than the sum of the collection volumes of the first m+1 currently pending subtasks, the first m subtasks are selected, and the number of subtasks that need to execute concurrently is m, where m < n.
  12. The method as claimed in claim 11, characterized in that the determination step specifically determines the number of concurrently executed subtasks according to the following steps:
    (iii) obtain the number of physical processing units of the collector;
    (iv) according to the result obtained in step (i), determine the number of subtasks that can execute in parallel within the trigger cycle, calculated as: the product of the remaining network bandwidth obtained in step (b) and the number of physical processing units of the collector obtained in step (iii), divided by the average execution time of a subtask;
    (v) determine the number of subtasks executed concurrently within one trigger cycle, which is the smaller of the number of subtasks that need to execute concurrently obtained in step (ii) and the number of subtasks that can execute concurrently within the trigger cycle obtained in step (iv);
    (vi) according to the number of concurrently executed subtasks within one trigger cycle obtained in step (v), select the corresponding number of highest-priority subtasks in the task queue as the subtasks to execute concurrently; after they are executed, the priority of the remaining subtasks is increased.
  13. The method as claimed in claim 12, characterized in that the acquisition task processing step further comprises:
    before adding collected data to the data buffer queue, judging whether the collected data lies within the configured range; if so, adding the collected data to the data buffer queue; otherwise generating an alarm message, adding it to the alarm buffer queue, and sending an alarm notification.
  14. The method as claimed in claim 13, characterized by further comprising:
    a manual intervention interface step: the task execution status, bandwidth, and underlying resource utilization are returned and fed back to the user, and the user selectively pauses, cancels, or resumes tasks through the manual intervention interface according to the actual operating situation.
CN201210135226.3A 2012-05-02 2012-05-02 Parallel processing method and system for mass data Active CN103384206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210135226.3A CN103384206B (en) 2012-05-02 2012-05-02 Parallel processing method and system for mass data

Publications (2)

Publication Number Publication Date
CN103384206A CN103384206A (en) 2013-11-06
CN103384206B true CN103384206B (en) 2016-05-25



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1756190A (en) * 2004-09-30 2006-04-05 北京航空航天大学 Distributed performance data acquisition method
CN101141315A (en) * 2007-10-11 2008-03-12 上海交通大学 Network resource scheduling simulation system
CN102375837A (en) * 2010-08-19 2012-03-14 中国移动通信集团公司 Data acquiring system and method



