CN102209118A

CN102209118A - Distributed mass data gathering method

Info

Publication number: CN102209118A
Application number: CN2011101541171A
Authority: CN
Inventors: 周关力
Original assignee: Chengdu Qinzhi Digital Technology Co Ltd
Current assignee: Chengdu Qinzhi Digital Technology Co Ltd
Priority date: 2011-06-10
Filing date: 2011-06-10
Publication date: 2011-10-05

Abstract

The invention discloses a distributed mass data gathering method. The method comprises the following steps: A, a user configures acquirer connection information on a centre machine; B, a device administrator configures multiple groups of gathering rule files on the centre machine according to the differences in data structure among the acquirers; C, the user configures corresponding gathering rule files for the acquirers on the centre machine; D, the centre machine starts an acquiring timer to connect automatically to the acquirers according to the connection information; E, the centre machine acquires the required data according to the gathering rule files configured for the acquirers, and the acquired data are sent back to the centre machine; F, the centre machine compresses and counts the acquired data according to the gathering rule file configured for each acquirer, and stores the results in each storage module; and G, the centre machine starts a graded gathering timer to gather the data in the storage modules regularly in a graded manner. The acquirers and the centre machine can be produced by different manufacturers and can have different data structures.

Description

A distributed mass data gathering method

Technical field

The present invention relates to data warehouse and data mining technology.

Background technology

With the rapid development of informatization, large enterprises have begun to use network services to manage enterprise information, and as business grows, the equipment required by those network services keeps increasing. To ensure the availability of the network services, all of the equipment needs to be monitored automatically; and to ensure the viability of the monitoring data, large enterprises generally deploy multiple kinds of dedicated monitoring equipment to monitor the whole network comprehensively and in real time.

As the monitoring equipment runs over a long period, a huge amount of monitoring data accumulates, and the data needs to be converged to improve the efficiency of viewing it. At the same time, as multiple kinds of monitoring equipment from multiple vendors come into use, all of the monitoring data needs to be analyzed centrally, invalid and overlapping data needs to be rejected, and the monitoring data needs to be stored in a statistical structure. The conventional method is: 1) set up a data center machine, then manually install and create a database; 2) manually copy the data of each monitoring system to the data center machine, keeping the source data structure and the raw data on the data center machine; 3) manually write database scripts to store data of the same data structure in the same storage device, which requires setting unique identifiers manually or semi-automatically; 4) when comprehensive statistical information is needed, use different query commands for the different data structures, obtain the required data one by one, and then compute the statistics. A method is therefore needed to solve or mitigate the above problems.

Summary of the invention

The purpose of the present invention is to address problems in the existing technology, such as the difficulty of retrieving the huge volume of network monitoring data and the difficulty of integrating monitoring data when monitoring systems and data structures are diverse, by proposing a distributed mass data gathering method that improves the efficiency of viewing and classifying monitoring data.

To achieve this purpose, the invention discloses a distributed mass data gathering method comprising the following steps:

A, the user configures the harvester connection information on the center machine;

The connection information of step A comprises the following: 1) the harvester device connection information; 2) the harvester data access mode and call parameters. There are two access modes: the database direct-connection mode and the system interface mode.

The parameters differ according to the selected data access mode. The parameters of the database direct-connection mode are: database network address, database port number, database type, database name, database login user name, and database login password. The parameters of the system interface mode are: system service network address, system service port number, system privileged user name, system privileged password, and the framework of the interface or system.

For the database direct-connection mode, the database type includes the following: Oracle, MySql, SqlServer, Sybase, DB.

For the system interface mode, the framework of the interface or system includes the following: webservice, corba, socket, snmp, TL1.

To ensure that the configured harvester connection information is correct, the center machine performs one test connection; after the test passes, the center machine stores the connection information of this harvester.
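As an illustration only, the two parameter sets of step A could be modeled as records and checked for well-formedness before the test connection is attempted. The class names, field names, and the `validate` helper below are hypothetical and do not appear in the specification; the allowed values are copied from the lists above, and the test connection itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Union

# Values as listed in the text; "DB" is kept exactly as written there.
DATABASE_TYPES = {"Oracle", "MySql", "SqlServer", "Sybase", "DB"}
INTERFACE_FRAMEWORKS = {"webservice", "corba", "socket", "snmp", "TL1"}


@dataclass(frozen=True)
class DatabaseDirectParams:
    """Parameters of the database direct-connection mode (hypothetical names)."""
    host: str
    port: int
    db_type: str
    db_name: str
    username: str
    password: str


@dataclass(frozen=True)
class SystemInterfaceParams:
    """Parameters of the system interface mode (hypothetical names)."""
    host: str
    port: int
    username: str
    password: str
    framework: str


AccessParams = Union[DatabaseDirectParams, SystemInterfaceParams]


def validate(params: AccessParams) -> List[str]:
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = []
    if not params.host:
        problems.append("missing network address")
    if not 0 < params.port < 65536:
        problems.append("port out of range")
    if isinstance(params, DatabaseDirectParams):
        if params.db_type not in DATABASE_TYPES:
            problems.append(f"unknown database type: {params.db_type}")
    elif params.framework not in INTERFACE_FRAMEWORKS:
        problems.append(f"unknown framework: {params.framework}")
    return problems
```

A record that passes this check would still have to pass the center machine's test connection before being stored.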

The center machine described in step A is the leading machine of this method; it can be a single center machine or a center machine cluster. The harvester is the center machine of various network performance collection devices. The center machine and the harvesters can come from different vendors, different operators, or different operation and maintenance (O&M) providers, and can use different data structures. Harvesters from different vendors, operators, or O&M providers can likewise use data structures that differ from one another.

B, the equipment manager configures multiple groups of convergence rule files on the center machine according to the differences in data structure among the harvesters;

A convergence rule file mainly describes: how the data is obtained, how the obtained data is parsed, and how it is stored on the center machine.

In step B, rule files are divided into two templates according to the data access mode. Template one is the rule file for the database direct-connection mode, and mainly defines: the data structure holding the required data, the query command, and the position in the center machine's data structure that corresponds to the fetched data. Template two is the rule file for the system interface mode, and mainly defines: the name of the interface method to call, the parameters to pass, the format and parsing template of the returned data, and the position in the center machine's data structure that corresponds to each specific item in the returned data.

Multiple rule files can be configured as one group of rule files according to the harvester's internal data structure.
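The two templates and the grouping of rule files can be pictured as plain data records. The sketch below is only one possible shape, assuming hypothetical field names, a hypothetical `perf_cpu` source table, and a hypothetical query; the specification does not define a file format. The group name `version1.1` is borrowed from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DatabaseRule:
    """Template one: rule file for the database direct-connection mode."""
    source_structure: str            # structure (e.g. table) holding the required data
    query: str                       # query command to run against the harvester
    target_mapping: Dict[str, str]   # fetched field -> position on the center machine


@dataclass
class InterfaceRule:
    """Template two: rule file for the system interface mode."""
    method_name: str                 # interface method to call
    call_params: Dict[str, str]      # parameters to pass
    response_format: str             # format of the returned data
    parse_template: str              # how the returned data is parsed
    target_mapping: Dict[str, str]   # returned item -> position on the center machine


@dataclass
class RuleGroup:
    """Several rule files configured as one group for one harvester data structure."""
    name: str
    rules: List[Union[DatabaseRule, InterfaceRule]] = field(default_factory=list)


# Hypothetical example content for one group.
group = RuleGroup("version1.1", [
    DatabaseRule(
        source_structure="perf_cpu",
        query="SELECT ts, usage FROM perf_cpu WHERE ts >= :start AND ts < :end",
        target_mapping={"ts": "sample_time", "usage": "value"},
    ),
])
```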

Step B is carried out by the center machine's equipment manager or by operation and maintenance personnel.

C, the user selects the corresponding convergence rule files for each harvester on the center machine;

Step C: the user selects, for each configured harvester, the convergence rule files that correspond to it.

A harvester can select multiple rule files within the same group, and the same group of rule files can be selected by multiple harvesters that share the same data structure. In this way convergence rule files can be reused effectively and modified in one place, which improves work efficiency.
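The selection rule of step C (several files, all from one group, with groups shared between harvesters) can be sketched as follows. The registry layout, the rule file names, the harvester names, and the `select_rules` helper are assumptions made for illustration.

```python
# Hypothetical registry: group name -> names of the rule files in that group.
RULE_GROUPS = {"version1.1": {"cpu.rule", "memory.rule", "traffic.rule"}}


def select_rules(group_name, selected, groups=RULE_GROUPS):
    """Return a harvester's selection after checking it stays inside one group."""
    available = groups[group_name]
    unknown = sorted(set(selected) - available)
    if unknown:
        raise ValueError(f"not in group {group_name}: {unknown}")
    return {"group": group_name, "rule_files": sorted(selected)}


# Two harvesters with the same data structure reuse the same group,
# so editing a rule file in the group affects both of them.
assignments = {
    "harvester-a": select_rules("version1.1", ["cpu.rule", "memory.rule"]),
    "harvester-b": select_rules("version1.1", ["traffic.rule"]),
}
```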

D, the center machine starts the collection timer and automatically connects to the harvesters according to the connection information;

In step D, the center machine automatically invokes the collection timer to connect to the harvesters. The collection timer runs once a day, and the specific execution time is configured by the equipment manager.

E, the center machine obtains the monitoring data according to the convergence rule files configured for the harvester;

In step E, the center machine obtains the previous day's data according to the convergence rule files selected for the harvester, either by directly querying the required data or by calling the harvester's system interface and obtaining the data that the interface returns.
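Steps D and E together describe a run that fires once a day at a configured time and fetches the previous day's data. A minimal sketch of the two time calculations involved is given below; the function names are hypothetical, and treating "the previous day" as the calendar day before the run (midnight to midnight) is an assumption.

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def next_daily_run(now: datetime, run_at: time) -> datetime:
    """Next moment the daily collection timer should fire."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def previous_day_window(run: datetime) -> Tuple[datetime, datetime]:
    """Half-open interval [start, end) covering the calendar day before the run."""
    end = datetime.combine(run.date(), time.min)
    return end - timedelta(days=1), end
```

The window returned by `previous_day_window` could be passed as the start and end bounds of the query or interface call defined in the rule file.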

F, the center machine compresses the obtained data into statistics according to the rule files configured for each harvester and stores the results in the storage modules;

The storage modules in step F: the center machine divides storage into multiple storage modules according to data source, harvester type, and harvester-side data structure.

In step F, the center machine compresses the obtained data into statistics by time and stores the resulting data, in a unified data structure, in the different storage modules. During the statistics, different operations are taken depending on the data access mode selected in the collection connection information:

1. When the data access mode is the database direct-connection mode, the data obtained by the center machine is already the required data, so the center machine only needs to compress the data into statistics by time in units of one hour, obtaining the hourly maximum, mean, minimum, and total, together with the time points at which the maximum and minimum occur. The resulting data is stored, in a unified data structure, in the hour tables of the different storage modules; the center machine also computes one day's statistics from the hour-table data and stores the result in the day table of each storage module.

2. When the data access mode is the system interface mode, the center machine first parses and formats the returned data according to the rule file to obtain the required data. After rejecting invalid and junk data according to the rule file, it compresses the valid data into statistics by time in units of one hour, obtaining the hourly maximum, mean, minimum, and total, together with the time points at which the maximum and minimum occur, and stores the resulting data, in a unified data structure, in the different storage modules.
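The hourly compression described for both access modes can be sketched as a grouping of timestamped samples into hour buckets. The function name and the row layout are hypothetical. The `count` field is an addition not named in the text; it is kept here so that coarser averages can later be computed from totals rather than by averaging averages.

```python
from collections import defaultdict
from datetime import datetime


def compress_by_hour(samples):
    """Compress (timestamp, value) samples into one statistics row per hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append((ts, value))

    result = {}
    for hour_start, rows in sorted(buckets.items()):
        max_ts, max_val = max(rows, key=lambda r: r[1])
        min_ts, min_val = min(rows, key=lambda r: r[1])
        total = sum(value for _, value in rows)
        result[hour_start] = {
            "max": max_val, "max_time": max_ts,   # maximum and when it occurred
            "min": min_val, "min_time": min_ts,   # minimum and when it occurred
            "avg": total / len(rows),
            "total": total,
            "count": len(rows),                   # assumption: not in the text
        }
    return result
```

In the system interface mode, the same function would be applied after parsing and after invalid rows have been dropped.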

The center machine designates several kinds of monitoring data as the same data type, so that their data structure on the center machine is consistent, which facilitates unified storage and unified querying.

Each storage module on the center machine contains multiple hour tables, day tables, week tables, month tables, and year tables. Specifically, based on the preset data types, each storage module creates, for every data type belonging to the module, one group of time tables, namely: hour, day, week, month, and year tables.

G, the center machine starts the graded convergence timers and regularly converges the data in the storage modules level by level. The graded convergence timers automatically converge the data in the storage modules at three levels (week, month, and year), which provides query preprocessing: when a user queries statistics through the center machine, the center machine automatically queries the table of the appropriate level according to the time range of the query.

Step G: the center machine automatically starts the graded convergence timers, namely the weekly convergence timer, the monthly convergence timer, and the yearly convergence timer. Execution schedule: the weekly convergence timer runs once a week, the monthly convergence timer runs once a month, and the yearly convergence timer runs once a year. The specific execution time is configurable and is set by the equipment manager.

The weekly convergence timer computes statistics for the previous week on the basis of the day tables and stores the result in the week table of each storage module;

The monthly convergence timer computes statistics for the previous month on the basis of the day tables and stores the result in the month table of each storage module;

The yearly convergence timer computes statistics for the previous year on the basis of the month tables and stores the result in the year table of each storage module.
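Each graded convergence combines finer-grained statistics rows into one coarser row. A minimal sketch of the weekly case follows. The row layout (`max`, `max_time`, `min`, `min_time`, `total`, `count`) is an assumption, as is the use of ISO weeks as the week boundary; the specification does not state how a week is delimited or how the coarser mean is derived, so a count-weighted mean is used here.

```python
from collections import defaultdict
from datetime import date


def merge_stats(rows):
    """Combine finer-grained statistics rows into one coarser row."""
    rows = list(rows)
    top = max(rows, key=lambda r: r["max"])
    low = min(rows, key=lambda r: r["min"])
    total = sum(r["total"] for r in rows)
    count = sum(r["count"] for r in rows)
    return {
        "max": top["max"], "max_time": top["max_time"],
        "min": low["min"], "min_time": low["min_time"],
        "total": total, "count": count,
        "avg": total / count,  # weighted by sample count, not a mean of means
    }


def weekly_from_daily(day_rows):
    """Roll day-table rows {date: stats} up to {(iso_year, iso_week): stats}."""
    groups = defaultdict(list)
    for day, stats in day_rows.items():
        iso = day.isocalendar()
        groups[(iso[0], iso[1])].append(stats)
    return {week: merge_stats(rows) for week, rows in groups.items()}
```

The monthly and yearly convergences would reuse `merge_stats` with a different grouping key (year and month for day rows, year for month rows).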

The outstanding advantages of the above distributed mass data gathering method are: 1) the center machine integrates the database automatically, so the customer does not need to arrange personnel to install it separately, which reduces staffing and saves cost; 2) the center machine automatically obtains the data of each harvester according to the rules, with no manual copying, which a) avoids mistakes caused by manual operation, b) maximizes working efficiency through fully automatic collection, and c) makes the automatic collection time configurable, so that data collection is more timely and more accurate; 3) the center machine automatically rejects invalid data and classifies the collected data according to the configuration files, with no tedious manual operation, which avoids the possibility of errors when rejection and classification commands are written by hand; 4) the center machine automatically stores data in a unified format in different storage modules according to data type, so the data structure is more layered; 5) the center machine preprocesses the data by means of timers, which maximizes query speed when statistics are queried and makes the resulting statistics more accurate.

Description of drawings

The present invention will be described by way of example and with reference to the accompanying drawings, in which:

Fig. 1 is the overall workflow diagram.

Fig. 2 is a schematic diagram of the data extraction and convergence method.

Fig. 3 is a schematic diagram of the regular statistics method.

Fig. 4 is a flow chart of the data extraction method.

Embodiment

All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.

Any feature disclosed in this specification (including any accompanying claims, abstract, and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.

The present invention is further described below with reference to the accompanying drawings.

As shown in Fig. 1, the basic flow of the method is as follows: the user configures, on the center machine, the harvester's connection information, data access mode, and call parameters, and selects the convergence rule files configured by the equipment manager; the center machine verifies the configuration automatically and, once verification passes, stores the configured information; the center machine automatically invokes the collection timer and collects data to the center machine on schedule; internally, the center machine automatically invokes the convergence timers to converge the data by time level: week, month, and year.

The sources of the center machine's data are shown in Fig. 2. The data of the present invention comes from the harvesters, and the center machine itself performs no device-level data collection. First, the center machine obtains the monitoring data according to the configured harvester connection information and convergence rule files. Second, the center machine validates the data, rejects invalid data, and compresses the data into statistics according to the convergence rule files. Finally, the resulting statistics are stored in the corresponding storage module according to data source, harvester type, and harvester-side data structure.

The specific steps of data compression statistics and storage are as follows:

1) aggregate the data in units of one hour and store the result in the corresponding hour table of the storage module;

2) on the basis of the data aggregated in 1), converge the data again in units of one day and store the result in the corresponding day table of the storage module.

The specific steps by which the center machine obtains the collected data are shown in Fig. 4 and are as follows:

A, the user configures the harvester connection information on the center machine: data access mode: database direct-connection mode; database network address: 172.16.104.2; database type: mysql; database service port number: 3066; database service name: oral; database user name: root; database login password: root; selected convergence rule files: the version1.1 file group.

B, the center machine starts the collection timer, which actively connects to the harvester according to the harvester connection information configured by the user.

C, after the center machine connects to the harvester successfully, the harvester queries and retrieves the required data itself according to the query information configured in the version1.1 file group, and sends the retrieved data back to the center machine.

D, the center machine computes statistics on the obtained data by time: first by hour, calculating the hourly maximum, minimum, mean, and total, together with the specific time points of the maximum and minimum; then by day, calculating the daily maximum, minimum, mean, and total, together with the specific time points of the maximum and minimum.

E, the resulting statistics are stored in partitions according to the different data sources, different systems, and different indicator types, so that the data is layered, partitioned, and graded, which facilitates subsequent data queries and statistics.

F, the center machine automatically invokes the convergence timers to further converge the stored data and compute statistical summaries, as shown in Fig. 3: once a week, the weekly convergence timer is started to summarize the previous week's data from all day tables on the center machine, and the previous week's statistics are stored in the corresponding week tables; once a month, the monthly convergence timer is started to summarize the previous month's data from all day tables on the center machine, and the previous month's statistics are stored in the corresponding month tables; once a year, the yearly convergence timer is started to summarize the previous year's data from all month tables on the center machine, and the previous year's statistics are stored in the corresponding year tables. When a user queries statistics through the center machine, the center machine automatically queries the table of the appropriate level according to the time range of the query.
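The specification states that the table level is chosen from the queried time range but does not give the selection criteria. The sketch below therefore uses purely illustrative thresholds (one day, one week, 31 days, 366 days); both the thresholds and the function name are assumptions.

```python
from datetime import datetime, timedelta


def pick_table_level(start: datetime, end: datetime) -> str:
    """Choose which time table to query for the range [start, end)."""
    span = end - start
    if span <= timedelta(days=1):
        return "hour"    # short ranges read the finest table
    if span <= timedelta(days=7):
        return "day"
    if span <= timedelta(days=31):
        return "week"
    if span <= timedelta(days=366):
        return "month"
    return "year"        # multi-year ranges read the coarsest table
```

Because the coarser tables are filled in advance by the timers, a long-range query reads a few pre-aggregated rows instead of scanning every hourly row.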

The present invention is not limited to the foregoing embodiments. The present invention extends to any new feature, or any new combination of features, disclosed in this specification, and to any new method or process step, or any new combination of steps, disclosed.

Claims

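1. A distributed mass data gathering method, comprising the following steps:

A, the user configures the harvester connection information on the center machine;

B, the equipment manager configures multiple groups of convergence rule files on the center machine according to the differences in data structure among the harvesters;

C, the user selects the corresponding convergence rule files for each harvester on the center machine;

D, the center machine starts the collection timer and automatically connects to the harvesters according to the connection information;

E, the center machine obtains the monitoring data according to the convergence rule files configured for the harvester;

F, the center machine compresses the obtained data into statistics according to the convergence rule files configured for each harvester and stores the results in the storage modules;

G, the center machine starts the graded convergence timers and regularly converges the data in the storage modules level by level.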

2. The distributed mass data gathering method according to claim 1, characterized in that: the center machine is the leading machine of this method and can be a single center machine or a center machine cluster; the harvester is the center machine of various network performance collection devices; the center machine and the harvesters can come from different vendors, different operators, or different operation and maintenance (O&M) providers and can use different data structures; harvesters from different vendors, operators, or O&M providers can likewise use data structures that differ from one another.

3. The method according to claim 1 or 2, characterized in that: in step A, the user configures the harvester connection information on the center machine, and the connection information comprises the following: 1) the harvester device connection information; 2) the harvester data access mode and call parameters; there are two access modes: the database direct-connection mode and the system interface mode.

4. The distributed mass data gathering method according to claim 3, characterized in that: in step B, the equipment manager configures multiple groups of convergence rule files on the center machine according to the differences in data structure among the harvesters; a convergence rule file mainly describes how the data is obtained, how the obtained data is parsed, and how it is stored on the center machine; multiple rule files can be configured as one group of rule files according to the harvester's internal data structure.

5. The distributed mass data gathering method according to claim 4, characterized in that: in step C, the user selects the corresponding convergence rule files for each harvester on the center machine; a harvester can select multiple rule files within the same group, and the same group of rule files can be selected by multiple harvesters that share the same data structure, so that convergence rule files can be reused effectively and modified in one place.

6. The distributed mass data gathering method according to claim 5, characterized in that: in step D, the center machine starts the collection timer and automatically connects to the harvesters according to the connection information; the center machine automatically invokes the collection timer to connect to the harvesters, the collection timer runs once a day, and the specific execution time is configured by the equipment manager.

7. The distributed mass data gathering method according to claim 6, characterized in that: in step E, the center machine obtains the monitoring data according to the convergence rule files configured for the harvester, and in step F, the center machine compresses the obtained data into statistics according to the convergence rule files configured for each harvester and stores the results in the storage modules, wherein: the center machine obtains the previous day's data according to the convergence rule files selected for the harvester, either by directly querying the required data or by calling the harvester's system interface and obtaining the data that the interface returns; the center machine compresses the obtained data into statistics by time and stores the resulting data, in a unified data structure, in the different storage modules.

8. The distributed mass data gathering method according to claim 7, characterized in that the compression into statistics by time comprises: compressing the valid data into statistics by time in units of one hour, obtaining the hourly maximum, mean, minimum, and total, together with the time points at which the maximum and minimum occur.

9. The distributed mass data gathering method according to claim 7, characterized in that: in step G, the center machine starts the graded convergence timers and regularly converges the data in the storage modules level by level; the graded convergence timers automatically converge the data in the storage modules at three levels (week, month, and year), which provides query preprocessing; when a user queries statistics through the center machine, the center machine automatically queries the table of the appropriate level according to the time range of the query.