CN105760459A

CN105760459A - Distributed data processing system and method

Info

Publication number: CN105760459A
Application number: CN201610081200.3A
Authority: CN
Inventors: 姚敏
Original assignee: Sichuan Justbon Asset Management Group Co Ltd
Current assignee: Sichuan Jiabao life Service Group Co.,Ltd.
Priority date: 2016-02-04
Filing date: 2016-02-04
Publication date: 2016-07-13
Anticipated expiration: 2036-02-04
Also published as: CN105760459B

Abstract

The invention provides a distributed data processing system and method. The system comprises a plurality of single serial-port servers and further comprises a data acquisition module for collecting signal data, a data storage module for storing the collected data information, a data generation module for compiling the data information into a set data structure, a data mining module for dispatching the data information to scheduling nodes according to computing task requirements, and a sending module for sending the status of the data information that meets the conditions to a monitoring host. The collected device data are preprocessed and the collected mass data are stored and accessed in a distributed manner, which solves the problem that existing data access methods are relatively independent of one another, with no dependency or precedence relationship between them. During data mining, a plurality of scheduling nodes process data at the same time, so the response is fast and the data can be sent to the monitoring host in the shortest time. When data are inserted and queried at the same time, system performance is not degraded, and the investment cost is reduced.

Description

Distributed data processing system and method

Technical field

The invention relates mainly to the field of data information processing, and in particular to a distributed data processing system and method.

Background art

Conventional solutions use a relational database with a single database and a single table for business storage. This approach is inherently limited in the amount of data it can support: once the data reach a certain volume, namely when a single MySQL table exceeds 5 million records, performance drops sharply, so it cannot support the mass data of the present system. Switching to multiple databases and multiple tables solves only the storage problem; performance declines considerably, program complexity rises steeply, system stability is reduced, and the conditions for going into production are not met. When the database is receiving a large volume of inserts while queries run at the same time, the performance of both is dragged down. In severe cases this directly affects business operation and data accuracy cannot be guaranteed, up to the point of a system crash.

Summary of the invention

The technical problem to be solved by the invention is to provide a distributed data processing system and method that stores and accesses the mass of collected data in a distributed manner. Data are collected and stored through multiple single serial-port servers and a distributed database, which reduces the storage and read pressure on any single database and greatly increases system access speed. During data mining, multiple scheduling nodes process data simultaneously, so response is fast, latency is low, and results can be sent to the monitoring host in the shortest time.

The technical solution of the invention is as follows: a distributed data processing system, comprising a data acquisition module, a data generation module, a data storage module, a data mining module, a sending module and multiple single serial-port servers.

The data acquisition module is configured to collect data information from electrical equipment after a data connection with the electrical equipment has been established through a single serial-port server. The data acquisition module is provided with acquisition channels, and the acquisition channels collect the data information from the electrical equipment.

The data storage module is configured to store the collected data information in a distributed database.

The data generation module is configured to classify each item of data information in the distributed database according to its data attribute, and to compile each classified item according to a set data structure.

The data mining module is configured to dispatch the compiled data information to a designated scheduling node according to computing task requirements, and to call a business processing function in real time to filter out the data information that meets the conditions. It is also configured to assign data information that has no computing task requirement, in order, to a default scheduling node to await processing. Multiple scheduling nodes are provided, and each receives compiled data information according to the computing task requirements.

The sending module is configured to send the status of the data information that meets the conditions to the monitoring host.

The single serial-port server is configured to perform protocol conversion and data transmission between the data acquisition module and the electrical equipment.

The data acquisition module includes acquisition devices. Current acquisition devices support data transmission only over the RTU485 protocol, which cannot carry data communication over the Internet. The role of the single serial-port server is to let the acquisition program communicate with the acquisition devices over TCP/IP: the single serial-port server performs a protocol conversion in between, so that the acquisition program can collect data over the Internet.

With the introduction of distributed storage in mind, and in light of the actual data produced by device data acquisition, each data acquisition channel is deliberately mapped to one data storage block in the single serial-port server. Although this increases the total data volume to some extent, the format supports acquisition from all RS485 devices, with no need for separate adaptation programming and calculation for each. A project has many devices that need monitoring (temperature, voltage, current, power, switching value, water immersion, humidity and so on), and it must be ensured at all times that the project can provide normal residential services. Each device reports at millisecond-level frequency, with the RS485 serial protocol converted to the Ethernet TCP/IP network protocol, and the server side collects the data through a socket listening port.
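The server-side collection described above can be illustrated with a minimal Python sketch. It is not the acquisition program of the invention: the frame layout in `FRAME` (three unsigned 16-bit fields and one 32-bit float) is a hypothetical one chosen for illustration, since the source does not specify the byte format, and `socket.socketpair()` stands in for the TCP link that a single serial-port server would provide.

```python
import socket
import struct

# Hypothetical fixed-size frame (an assumption, not from the source):
# device id, signal id, channel number as unsigned 16-bit integers,
# followed by a 32-bit float signal value, all big-endian.
FRAME = struct.Struct(">HHHf")


def recv_exact(conn, size):
    """Read exactly `size` bytes, or return None if the peer closed first."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            return None
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frames(conn):
    """Yield (device_id, signal_id, channel, value) until the link closes."""
    while True:
        raw = recv_exact(conn, FRAME.size)
        if raw is None:
            return
        yield FRAME.unpack(raw)


def demo():
    # socketpair() stands in for the TCP connection between the serial-port
    # server and the acquisition program, so no hardware or port is needed.
    device_side, server_side = socket.socketpair()
    with device_side, server_side:
        device_side.sendall(FRAME.pack(7, 1, 3, 21.5))
        device_side.sendall(FRAME.pack(7, 2, 3, 230.0))
        device_side.shutdown(socket.SHUT_WR)  # signal end of transmission
        return list(read_frames(server_side))
```

In a deployment the acquisition program would presumably accept connections on a listening socket instead; the framing loop in `read_frames` would stay the same.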

The beneficial effects of the invention are as follows. The collected device data are preprocessed, that is, the collected mass data are stored and accessed in a distributed manner, which solves the problem that current data access methods are relatively independent of one another, with no dependency or precedence relationship between them. During data mining, multiple scheduling nodes process data simultaneously, so response is fast, latency is low, and results can be sent to the monitoring host (which may also be a management system or a web monitoring view layer) in the shortest time. Data with a computing task requirement can be processed quickly, while data without one are assigned to the default scheduling node to await processing, which speeds up data mining. Tuples that have already been allocated will not be preempted by other computing tasks, and the computing performance of a task is improved by reducing network latency. Messages that need manual intervention and decision-making also receive a response at the earliest moment. When data insertion and data queries run at the same time, system performance is not degraded, and the investment cost is reduced.

On the basis of the above technical solution, the invention can also be improved as follows.

Further, the system also includes an alarm module, which monitors the quantity of data information in the data storage module and, when the quantity rises above a preset maximum threshold or falls below a preset minimum threshold, generates device alarm data and sends them to the monitoring host.

The benefit of this further scheme is that when the data volume is too low or grows too large, an early warning can be raised to the monitoring host, maintaining the stability of the system data.
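The threshold check performed by the alarm module can be sketched as follows. This is a minimal illustration under assumptions: the function name `check_volume` and the fields of the returned alarm record are hypothetical, and sending the record to the monitoring host is left out.

```python
def check_volume(count, min_threshold, max_threshold):
    """Return an alarm record when `count` leaves [min, max], else None."""
    if count > max_threshold:
        return {"type": "device_alarm", "reason": "above_max", "count": count}
    if count < min_threshold:
        return {"type": "device_alarm", "reason": "below_min", "count": count}
    return None  # quantity is within the preset range, no alarm
```

A caller would poll the quantity of stored data information periodically and forward any non-`None` result to the monitoring host.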

Further, the system also includes a data packaging module, which packages the compiled data information into tuples according to its data attributes, combines tuples with the same data attribute into a stream (a tuple data stream), and sends the tuples or streams to the data mining module.

The benefit of this further scheme is that packaging the data into tuples or tuple data streams keeps system processing latency low, so that results can be sent to the monitoring host in the shortest time.
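The packaging step can be pictured with a short Python sketch. It uses plain Python containers, not Storm's tuple and stream classes; the `SignalTuple` type and the record fields `attribute`, `key` and `value` are assumptions introduced for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical tuple shape; the source only says tuples are built
# "according to data attribute".
SignalTuple = namedtuple("SignalTuple", ["attribute", "key", "value"])


def package(records):
    """Wrap each compiled record in a tuple and group tuples by attribute.

    Returns a dict mapping each data attribute to its stream, where a
    stream is the ordered list of tuples sharing that attribute.
    """
    streams = defaultdict(list)
    for record in records:
        tup = SignalTuple(record["attribute"], record["key"], record["value"])
        streams[tup.attribute].append(tup)
    return dict(streams)
```

Grouping by attribute before dispatch means a scheduling node that handles, say, temperature data receives one homogeneous stream instead of having to filter a mixed one.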

Further, the sending module notifies the Nimbus master daemon of the filtered data information, and sends the status of the data information held in the Nimbus master daemon to the monitoring host.

The benefit of this further scheme is that a response is obtained at the earliest moment and the information is sent to the monitoring host quickly.

Further, the data mining module is built on the Storm big data stream processing framework.

The Storm big data stream processing framework is a distributed, real-time stream analysis tool. As data are produced continuously, the Storm framework performs real-time computation and analysis on the data stream in memory.

Further, the signal data include a device id, signal id, channel number, signal value and timestamp. The data structure includes a Key and a Value; the Key includes the device id, signal id, channel number and timestamp, and the Value includes the signal value and timestamp. The signal data collected from every device are device status data reported with that device's current timestamp, and the data structure is:

Key: device id + signal id + channel number + timestamp

Value: signal value

Timestamp

The benefit of this further scheme is that the data structure resembles a key-value (kv) structure. Compared with the complex data structures of conventional applications, this data structure is relatively simple and can speed up system processing.
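A key-value record of this shape could be built as in the sketch below. The `+` separator follows the notation used in the text, but the zero-padded field widths and the function names are assumptions; the padding is added only so that keys sort lexically in the same order as their numeric fields, which is convenient in a store that orders rows by key.

```python
def make_key(device_id, signal_id, channel, timestamp_ms):
    """Compose the Key: device id + signal id + channel number + timestamp.

    Field widths are hypothetical; fixed-width zero padding keeps the
    lexical order of keys consistent with numeric order.
    """
    return f"{device_id:06d}+{signal_id:04d}+{channel:03d}+{timestamp_ms:013d}"


def make_record(device_id, signal_id, channel, signal_value, timestamp_ms):
    """Return (key, value) where the Value holds the signal value and timestamp."""
    key = make_key(device_id, signal_id, channel, timestamp_ms)
    value = {"signal_value": signal_value, "timestamp": timestamp_ms}
    return key, value
```

For example, `make_key(7, 1, 3, 1454544000000)` gives `"000007+0001+003+1454544000000"`.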

Another technical solution of the invention to the above technical problem is as follows: a distributed data processing method, comprising the following steps:

Step S1: after a data connection with electrical equipment has been established through a single serial-port server, collect data information from the electrical equipment;

Step S2: store the collected data information in a distributed database;

Step S3: classify each item of data information in the distributed database according to its data attribute, and compile each classified item according to a set data structure;

Step S4: dispatch the compiled data information to a designated scheduling node according to computing task requirements, and call a business processing function in real time to filter out the data information that meets the conditions; also assign data information that has no computing task requirement, in order, to the default scheduling node to await processing;

Step S5: send the status of the data information that meets the conditions to the monitoring host.
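Steps S2 to S5 can be traced end to end with a small in-memory sketch. Everything here is a stand-in chosen for illustration: a dict plays the distributed database, node names are plain strings, `is_qualified` is a hypothetical business processing function, and the states are returned instead of being sent to a monitoring host.

```python
def run_pipeline(samples, task_nodes, is_qualified):
    """Run steps S2 to S5 over samples that were already collected (S1).

    samples:      iterable of dicts with "attribute", "key" and "value"
    task_nodes:   dict mapping a data attribute to its designated node name
    is_qualified: hypothetical business function taking (key, value)
    """
    # S2: a dict stands in for the distributed database.
    store = {s["key"]: s for s in samples}

    # S3: classify by data attribute and compile into (key, value) pairs.
    classified = {}
    for record in store.values():
        pair = (record["key"], record["value"])
        classified.setdefault(record["attribute"], []).append(pair)

    # S4: attributes with a task requirement go to their designated node;
    # everything else queues, in order, on the default node.
    queues = {"default": []}
    for attribute, pairs in classified.items():
        node = task_nodes.get(attribute, "default")
        queues.setdefault(node, []).extend(pairs)
    qualified = [
        pair
        for node, pairs in queues.items()
        if node != "default"
        for pair in pairs
        if is_qualified(*pair)
    ]

    # S5: report the state of each qualifying item (returned here instead
    # of being sent over the network).
    states = [{"key": key, "state": "qualified"} for key, _ in qualified]
    return states, queues["default"]
```

The second return value is the backlog waiting on the default node, which makes the split between designated and default handling visible to the caller.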

Further, the method also includes the step of monitoring the quantity of data information and, when the quantity rises above a preset maximum threshold or falls below a preset minimum threshold, generating device alarm data and sending them to the monitoring host.

Further, the method also includes the step of packaging the compiled data information into tuples according to its data attributes, combining tuples with the same data attribute into a stream (a tuple data stream), and sending the tuples or streams to the designated scheduling node.

Further, step S5 is implemented by notifying the Nimbus master daemon of the filtered data information and sending the status of the data information held in the Nimbus master daemon to the monitoring host.

Further, the signal data include a device id, signal id, channel number, signal value and timestamp. The data structure includes a Key and a Value; the Key includes the device id, signal id, channel number and timestamp, and the Value includes the signal value and timestamp.

Brief description of the drawings

Fig. 1 is a module block diagram of the processing system of the invention;

Fig. 2 is a flow chart of the processing method of the invention.

Detailed description of the embodiments

The principles and features of the invention are described below with reference to the accompanying drawings. The examples serve only to explain the invention and are not intended to limit its scope.

As shown in Fig. 1, a distributed data processing system comprises a data acquisition module, a data generation module, a data storage module, a data mining module, a sending module and multiple single serial-port servers.

The distributed database is specifically the Hadoop-based HBase distributed database. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system, and with HBase a large-scale structured storage cluster can be built on inexpensive PC servers. This column-based database is well suited to the data persistence needs of the present system.

The data mining module is configured to dispatch the compiled data information to a designated scheduling node according to computing task requirements, and to call a business processing function in real time to filter out the data information that meets the conditions. It is also configured to assign data information that has no computing task requirement, in order, to the default scheduling node to await processing. Multiple scheduling nodes each receive compiled data information according to the computing task requirements.

Multiple single serial-port servers are provided. With the introduction of distributed storage in mind, and in light of the actual data produced by device data acquisition, each data acquisition channel is deliberately mapped to one data storage block on the acquisition device. Although this increases the total data volume to some extent, the format supports acquisition from all RS485 devices, with no need for separate adaptation programming and calculation for each. A project has many devices that need monitoring (temperature, voltage, current, power, switching value, water immersion, humidity and so on), and it must be ensured at all times that the project can provide normal residential services. Each device reports at millisecond-level frequency, with the RS485 serial protocol converted to the Ethernet TCP/IP network protocol, and the server side collects the data through a socket listening port.

The data mining module is configured to dispatch the data information in the distributed database to a designated scheduling node according to computing task requirements, and to call a business processing function in real time through the execute method to filter out the data information that meets the conditions. It is also configured to assign data information that has no computing task requirement, in order, to the default scheduling node to await processing. The data mining module is built on the Storm big data stream processing framework. Specifically, on top of the default node scheduler provided by the Storm framework, a node scheduler named smallScheduler is implemented; it changes the default scheduler's strategy of allocating computing resources sequentially and partitions the computing resources at the logical level. A computing task with an actual requirement is assigned to the designated physical computing node, while a computing task without special requirements uses Storm's default scheduling strategy. Computing resources, with or without special requirements, are allocated in order onto the physical computing nodes of the Storm computing cluster, but computing resources that have been allocated by smallScheduler will not be preempted by other computing tasks, and the computing performance of a task is improved by reducing network latency.
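The allocation policy described for smallScheduler can be simulated in a few lines of Python. This is a sketch of the policy only, not Storm's scheduler interface and not the patent's implementation: the class name, the one-node-per-task simplification and the error handling are all assumptions made for illustration.

```python
class SmallSchedulerSketch:
    """Toy model: pinned tasks reserve their node first, the rest fill in order."""

    def __init__(self, nodes):
        self.free = list(nodes)   # nodes not yet allocated, in cluster order
        self.assignment = {}      # task name -> node name

    def schedule(self, tasks):
        """tasks: list of (task_name, required_node or None)."""
        # Pass 1: tasks with an actual requirement get their designated node.
        # Removing the node from the free pool is what prevents preemption.
        for name, required in tasks:
            if required is not None:
                if required not in self.free:
                    raise ValueError(f"node {required!r} is not available")
                self.free.remove(required)
                self.assignment[name] = required
        # Pass 2: tasks without a requirement take the remaining nodes in
        # order, mimicking a default sequential allocation.
        for name, required in tasks:
            if required is None:
                if not self.free:
                    raise RuntimeError("no free node left for " + name)
                self.assignment[name] = self.free.pop(0)
        return dict(self.assignment)
```

With nodes `n1`, `n2`, `n3` and tasks `a` (no requirement), `b` (requires `n1`) and `c` (no requirement), the sketch places `b` on `n1` first, then `a` on `n2` and `c` on `n3`, so the unpinned tasks never take the node reserved for `b`.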

The sending module is configured to send the status of the data information that meets the conditions to the monitoring host. Specifically, it notifies the Nimbus master daemon of the filtered data information and sends the status of the data information held in the Nimbus master daemon to the monitoring host.

Preferably, the system also includes an alarm module, which monitors the quantity of data information in the data storage module and, when the quantity rises above a preset maximum threshold or falls below a preset minimum threshold, generates device alarm data and sends them to the monitoring host.

Preferably, the system also includes a data packaging module, which packages the compiled data information into tuples according to its data attributes, combines tuples with the same data attribute into a stream (a tuple data stream), and sends the tuples or streams to the data mining module.

The signal data include a device id, signal id, channel number, signal value and timestamp. The data structure includes a Key and a Value; the Key includes the device id, signal id, channel number and timestamp, and the Value includes the signal value and timestamp. Specifically, the signal data collected from every device are device status data reported with that device's current timestamp, and the data structure is:

Key: device id + signal id + channel number + timestamp

Value: signal value

Timestamp

As shown in Fig. 2, a distributed data processing method comprises the following steps:

Step S5 is implemented by notifying the Nimbus master daemon of the filtered data information and sending the status of the data information held in the Nimbus master daemon to the monitoring host.

The method also includes the step of monitoring the quantity of data information and, when the quantity rises above a preset maximum threshold or falls below a preset minimum threshold, generating device alarm data and sending them to the monitoring host.

The method also includes the step of packaging the compiled data information into tuples according to its data attributes, combining tuples with the same data attribute into a stream (a tuple data stream), and sending the tuples or streams to the designated scheduling node.

The specific steps from packaging through mining are as follows:

Step S001: package the compiled data information into tuples according to its data attributes, and combine tuples with the same data attribute into a stream (a tuple data stream);

Step S002: dispatch the tuples or streams to the designated scheduling node Bolt according to computing task requirements; the designated scheduling node Bolt calls a business processing function in real time through its execute method to filter out the tuples that meet the conditions;

Step S003: notify the Nimbus master daemon of the filtered tuples, send the status information of the tuples held in the Nimbus master daemon to the monitoring host, and then store the filtered tuples in the distributed database;

Step S004: assign the tuples or streams that have no computing task requirement, in order, to the default scheduling node Bolt to await processing.
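The filtering performed in steps S002 and S003 can be sketched as a plain Python class with an `execute` method. It mirrors the shape of a Storm bolt but does not use the Storm API; `business_fn`, `notify` and `store` are hypothetical stand-ins for the business processing function, the notification to the Nimbus master daemon, and the distributed database.

```python
class FilterBoltSketch:
    """Plain-Python stand-in for a scheduling node Bolt that filters tuples."""

    def __init__(self, business_fn, notify, store):
        self.business_fn = business_fn  # decides whether a tuple qualifies
        self.notify = notify            # stand-in for notifying Nimbus
        self.store = store              # stand-in for the distributed database

    def execute(self, tup):
        """Process one tuple; return True if it met the conditions."""
        if not self.business_fn(tup):
            return False
        self.notify(tup)                # S003: report the qualifying tuple
        self.store[tup["key"]] = tup    # S003: then persist it
        return True


def over_temperature(tup):
    # Hypothetical business rule: flag readings above 80 units.
    return tup["value"] > 80.0
```

A usage example under the same assumptions: create `notified = []` and `db = {}`, build `FilterBoltSketch(over_temperature, notified.append, db)`, and call `execute` once per incoming tuple; qualifying tuples end up in both `notified` and `db`, in arrival order.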

The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A distributed data processing system, characterized in that it comprises a data acquisition module, a data generation module, a data storage module, a data mining module, a sending module and multiple single serial-port servers,

the data acquisition module being configured to collect data information from electrical equipment after a data connection with the electrical equipment has been established through a single serial-port server;

the data mining module being configured to dispatch the compiled data information to a designated scheduling node according to computing task requirements, and to call a business processing function in real time to filter out the data information that meets the conditions; and also being configured to assign data information that has no computing task requirement, in order, to the default scheduling node to await processing;

2. The distributed data processing system according to claim 1, characterized in that it further comprises an alarm module, which monitors the quantity of data information in the data storage module and, when the quantity rises above a preset maximum threshold or falls below a preset minimum threshold, generates device alarm data and sends them to the monitoring host.

3. The distributed data processing system according to claim 1, characterized in that it further comprises a data packaging module, which packages the compiled data information into tuples according to its data attributes, combines tuples with the same data attribute into a stream (a tuple data stream), and sends the tuples or streams to the data mining module.

4. The distributed data processing system according to claim 1, characterized in that the sending module notifies the Nimbus master daemon of the filtered data information and sends the status of the data information held in the Nimbus master daemon to the monitoring host.

5. The distributed data processing system according to claim 1, characterized in that the data mining module is built on the Storm big data stream processing framework.

6. The distributed data processing system according to claim 1, characterized in that the signal data include a device id, signal id, channel number, signal value and timestamp; the data structure includes a Key and a Value, the Key including the device id, signal id, channel number and timestamp, and the Value including the signal value and timestamp.

7. A distributed data processing method, characterized in that it comprises the following steps:

8. The distributed data processing method according to claim 7, characterized in that it further comprises the step of monitoring the quantity of data information and, when the quantity rises above a preset maximum threshold or falls below a preset minimum threshold, generating device alarm data and sending them to the monitoring host.

9. The distributed data processing method according to claim 7, characterized in that it further comprises the step of packaging the compiled data information into tuples according to its data attributes, combining tuples with the same data attribute into a stream (a tuple data stream), and sending the tuples or streams to the designated scheduling node.

10. The distributed data processing method according to claim 7, characterized in that step S5 is implemented by notifying the Nimbus master daemon of the filtered data information and sending the status of the data information held in the Nimbus master daemon to the monitoring host.