CN105760459A - Distributed data processing system and method - Google Patents

Distributed data processing system and method

Info

Publication number
CN105760459A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
data
information
module
tuple
distributed
Prior art date
Application number
CN 201610081200
Other languages
Chinese (zh)
Inventor
姚敏
Original Assignee
四川嘉宝资产管理集团股份有限公司
Priority date
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30: Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286: Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30557: Details of integrating or interfacing systems involving at least one database management system
    • G06F17/30575: Replication, distribution or synchronisation of data between databases or within a distributed database; Distributed database system architectures therefor

Abstract

The invention provides a distributed data processing system and method. The system comprises a plurality of single-serial-port servers, a data collection module for collecting signal data, a data storage module for storing the collected data information, a data generation module for compiling the data information into a set data structure, a data mining module for scheduling the data information according to computing task requirements, and a sending module for sending the status of the data information that meets the conditions to a monitoring host. The collected device data is preprocessed and the massive collected data is stored and accessed in a distributed manner, which solves the problem that existing data access methods are independent of one another, with no dependency or ordering relationship between them. During data mining, multiple data scheduling nodes process data simultaneously, so the response is fast and the data can be sent to the monitoring host in the shortest time. When data is inserted and queried at the same time, system performance does not degrade, and the investment cost is reduced.

Description

A distributed data processing system and method

Technical Field

[0001] The present invention relates to the field of data and information processing, and in particular to a distributed data processing system and method.

Background

[0002] Conventional solutions store business data in a relational database using a single database and a single table. This approach inherently offers limited support for large data volumes: once the data reaches a certain size, for example when a single MySQL table exceeds 5 million records, performance drops sharply, so the approach cannot support the massive data volumes of the present system. Switching to multiple databases and multiple tables solves only the storage problem; performance still drops considerably, program complexity rises steeply, and system stability decreases, so the conditions for production deployment are not met. When a database is queried while large volumes of data are being inserted, both insert and query performance are dragged down; in severe cases this directly affects business operation, data accuracy can no longer be guaranteed, and the system may eventually crash.

Summary of the Invention

[0003] The technical problem to be solved by the invention is to provide a distributed data processing system and method that store and access massive collected data in a distributed manner. Data is collected and stored through multiple single-serial-port servers and a distributed database, which reduces the read and write load on any single database and greatly increases system access speed. During data mining, multiple data scheduling nodes process data simultaneously, giving fast response and low latency, so that data can be sent to the monitoring host in the shortest time.

[0004] The technical solution of the invention to the above technical problem is as follows: a distributed data processing system comprising a data collection module, a data generation module, a data storage module, a data mining module, a sending module and a plurality of single-serial-port servers.

[0005] The data collection module is configured to collect data information from electrical devices after establishing a data connection with them through the single-serial-port servers; the data collection module is provided with collection channels, which collect the data information from the electrical devices.

[0006] The data storage module is configured to store the collected data information in a distributed database.

[0007] The data generation module is configured to classify the data information in the distributed database according to data attributes, and to compile each classified item of data information according to a set data structure.

[0008] The data mining module is configured to distribute the compiled data information to designated scheduling nodes according to computing task requirements, and to invoke business processing functions in real time to filter out the data information that meets the conditions; it is further configured to assign data information that has no computing task requirement, in sequence, to default scheduling nodes to await processing. There are multiple scheduling nodes, each of which receives compiled data information according to the computing task requirements.

[0009] The sending module is configured to send the status of the data information that meets the conditions to the monitoring host.

[0010] The single-serial-port servers are configured to perform protocol conversion and data transmission between the data collection module and the electrical devices.

[0011] The data collection module includes collection devices. At present the collection devices support only the RTU485 protocol for data transfer, and this protocol does not support data communication over the Internet. The role of the single-serial-port server is to let the collection program communicate with the collection devices over TCP/IP: the server performs protocol conversion between the two, so that the collection program can collect data over the Internet.

[0012] With the introduction of distributed storage in mind, and in light of the actual data produced by device data collection, each data collection channel is deliberately mapped to one data storage block on a single-serial-port server. Although this increases the total data volume to some extent, the format suits all RS485-based collection and requires no further adaptation programming or separate computation. A project contains many devices that need monitoring (temperature, voltage, current, power, switch state, water immersion, humidity, and so on) to ensure at all times that the project can provide residential services normally. Each device reports at millisecond-level frequency, the RS485 serial protocol is converted to the Ethernet TCP/IP network protocol, and the server collects the data by listening on a socket port.
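
The collection path in [0011] and [0012] (serial data converted to TCP/IP and read from a socket on the server side) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the fixed 18-byte frame layout, the `Reading` type and the helper names are all assumptions, since the source does not specify a wire format. A `socket.socketpair()` stands in for the serial server and the collector.

```python
import socket
import struct
from typing import Iterator, List, NamedTuple

# Hypothetical wire format (an assumption, not from the source), big-endian:
# device id, signal id, channel number (uint16 each), signal value (float32),
# timestamp in milliseconds (uint64).
FRAME = struct.Struct(">HHHfQ")


class Reading(NamedTuple):
    device_id: int
    signal_id: int
    channel: int
    value: float
    timestamp_ms: int


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, or return b"" if the peer closed first."""
    buf = bytearray()
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            return b""
        buf.extend(chunk)
    return bytes(buf)


def read_frames(conn: socket.socket) -> Iterator[Reading]:
    """Yield readings from one serial-server connection until it closes."""
    while True:
        raw = recv_exact(conn, FRAME.size)
        if not raw:
            return
        yield Reading(*FRAME.unpack(raw))


def demo() -> List[Reading]:
    # The left socket plays the serial server, the right one the collector.
    left, right = socket.socketpair()
    with left, right:
        left.sendall(FRAME.pack(7, 3, 1, 23.5, 1454000000000))
        left.sendall(FRAME.pack(7, 4, 2, 220.0, 1454000000005))
        left.shutdown(socket.SHUT_WR)
        return list(read_frames(right))
```

In a real deployment the collector would accept connections on a listening socket instead of using a socket pair, and the frame parsing would follow whatever protocol the serial servers actually forward.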

[0013] The beneficial effects of the invention are as follows. The collected device data is preprocessed, that is, the massive collected data is stored and accessed in a distributed manner, which solves the problem that current data access methods are relatively independent of one another, with no dependency or ordering relationship between them. During data mining, multiple data scheduling nodes process data simultaneously, giving fast response and low latency, so that data can be sent to the monitoring host (which may also be a management system or a web monitoring view layer) in the shortest time. Data that has a computing task requirement is processed promptly, while data without one is assigned to default scheduling nodes to await processing, which speeds up data mining. Tuples that have already been assigned are not preempted by other computing tasks, and the computing performance of tasks is improved by reducing network latency. Messages that require manual intervention and decision-making are also responded to at the first opportunity. When data insertion and data queries take place at the same time, system performance does not degrade, and the investment cost is reduced.

[0014] On the basis of the above technical solution, the invention may be further improved as follows.

[0015] Further, the system includes an alarm module configured to monitor the quantity of data information in the data storage module; when the quantity rises above the system-set maximum threshold or falls below the system-set minimum threshold, the alarm module generates device alarm data and sends it to the monitoring host.
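
The threshold check performed by the alarm module can be illustrated with a short sketch. The `Alarm` type, the function name and the strict comparisons are assumptions made for illustration; the source only states that an alarm is generated when the quantity is above the maximum or below the minimum threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    kind: str        # "above_max" or "below_min"
    count: int       # observed quantity of data information
    threshold: int   # the threshold that was crossed


def check_volume(count: int, min_threshold: int, max_threshold: int) -> Optional[Alarm]:
    """Return an Alarm if count is outside [min_threshold, max_threshold].

    Values equal to a threshold are treated as normal (an assumption).
    """
    if count > max_threshold:
        return Alarm("above_max", count, max_threshold)
    if count < min_threshold:
        return Alarm("below_min", count, min_threshold)
    return None
```

A caller would run this check periodically against the storage module's record count and forward any returned alarm to the monitoring host.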

[0016] The beneficial effect of this further solution is that when the data volume is too low or too high, an early warning can be issued to the monitoring host, which maintains the stability of the system data.

[0017] Further, the system includes a data packaging module configured to package the compiled data information into tuples according to its data attributes, to combine tuples with the same data attributes into a stream (a tuple data stream), and to send the tuples or streams to the data mining module.

[0018] The beneficial effect of this further solution is that packaging the data into tuples or tuple data streams keeps the system's processing latency low, so that data can be sent to the monitoring host in the shortest time.
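
As a rough illustration of the packaging step in [0017], the sketch below groups records that share a data attribute into one stream. The record shape `(attribute, key, value)` and the function name are hypothetical; the source does not define how a tuple is laid out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (data attribute, key, signal value).
Record = Tuple[str, str, float]


def package_streams(records: Iterable[Record]) -> Dict[str, List[Tuple[str, float]]]:
    """Group tuples with the same data attribute into one stream each.

    Arrival order is preserved inside every stream.
    """
    streams: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for attribute, key, value in records:
        streams[attribute].append((key, value))
    return dict(streams)
```

Grouping by attribute up front means a downstream node that only handles, say, temperature data can subscribe to a single stream instead of filtering every tuple.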

[0019] Further, the sending module notifies the Nimbus master daemon of the filtered data information, and sends the status of the data information held in the Nimbus master daemon to the monitoring host.

[0020] The beneficial effect of this further solution is that a response is obtained at the first opportunity and the information is sent quickly to the monitoring host.

[0021] Further, the data mining module is built on the Storm streaming big data processing framework.

[0022] Storm is a distributed, real-time data stream analysis tool: as data is continuously produced, Storm computes and analyzes the data stream in memory in real time.

[0023] Further, the signal data includes a device id, a signal id, a channel number, a signal value and a timestamp; the data structure includes a Key and a Value, where the Key comprises the device id, signal id, channel number and timestamp, and the Value comprises the signal value and the timestamp. All signal data collected from a device reports that device's state at its current timestamp, with the following data structure:

[0024] Key: device id + signal id + channel number + timestamp

[0025] Value: signal value

[0026] timestamp.
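
A minimal sketch of the Key/Value structure described in [0023] to [0026] follows. The field widths, the zero padding and the dictionary layout of the Value are assumptions; the source only fixes which fields go into the Key and which into the Value.

```python
from typing import Dict, Tuple, Union


def make_key(device_id: int, signal_id: int, channel: int, timestamp_ms: int) -> str:
    """Concatenate device id + signal id + channel number + timestamp.

    Fixed-width, zero-padded fields (hypothetical widths) keep keys for the
    same device, signal and channel in chronological order when sorted as
    strings.
    """
    return f"{device_id:05d}{signal_id:05d}{channel:03d}{timestamp_ms:013d}"


def make_record(
    device_id: int, signal_id: int, channel: int, value: float, timestamp_ms: int
) -> Tuple[str, Dict[str, Union[int, float]]]:
    """Build one (Key, Value) pair; the Value holds signal value and timestamp."""
    key = make_key(device_id, signal_id, channel, timestamp_ms)
    return key, {"value": value, "timestamp": timestamp_ms}
```

With a sorted key-value store, this kind of key lets a reader fetch one signal's history for a time window with a single range scan over a key prefix.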

[0027] The beneficial effect of this further solution is that the data structure resembles a key-value (kv) structure; compared with the complex data structures of traditional applications, the format is simple and speeds up system processing.

[0028] Another technical solution of the invention to the above technical problem is as follows: a distributed data processing method comprising the following steps:

[0029] Step S1: after a data connection with the electrical devices is established through the single-serial-port servers, collect data information from the electrical devices;

[0030] Step S2: store the collected data information in a distributed database;

[0031] Step S3: classify the data information in the distributed database according to data attributes, and compile each classified item of data information according to a set data structure;

[0032] Step S4: distribute the compiled data information to designated scheduling nodes according to computing task requirements, and invoke business processing functions in real time to filter out the data information that meets the conditions; in addition, assign data information that has no computing task requirement, in sequence, to default scheduling nodes to await processing;

[0033] Step S5: send the status of the data information that meets the conditions to the monitoring host.

[0034] On the basis of the above technical solution, the invention may be further improved as follows.

[0035] Further, the method includes the step of monitoring the quantity of data information and, when the quantity rises above the system-set maximum threshold or falls below the system-set minimum threshold, generating device alarm data and sending it to the monitoring host.

[0036] Further, the method includes the step of packaging the compiled data information into tuples according to its data attributes, combining tuples with the same data attributes into a stream (a tuple data stream), and sending the tuples or streams to the designated scheduling nodes.

[0037] Further, step S5 is implemented by notifying the Nimbus master daemon of the filtered data information and sending the status of the data information held in the Nimbus master daemon to the monitoring host.

[0038] Further, the signal data includes a device id, a signal id, a channel number, a signal value and a timestamp; the data structure includes a Key and a Value, where the Key comprises the device id, signal id, channel number and timestamp, and the Value comprises the signal value and the timestamp.

Brief Description of the Drawings

[0039] FIG. 1 is a block diagram of the modules of the processing system of the invention;

[0040] FIG. 2 is a flowchart of the processing method of the invention.

Detailed Description

[0041] The principles and features of the invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

[0042] As shown in FIG. 1, a distributed data processing system comprises a data collection module, a data generation module, a data storage module, a data mining module, a sending module and a plurality of single-serial-port servers.

[0043] The data collection module is configured to collect data information from electrical devices after establishing a data connection with them through the single-serial-port servers; the data collection module is provided with collection channels, which collect the data information from the electrical devices.

[0044] The data storage module is configured to store the collected data information in a distributed database.

[0045] The distributed database is specifically the Hadoop-based HBase distributed database. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. This column-oriented database implementation meets the data persistence needs of the present system well.

[0046] The data generation module is configured to classify the data information in the distributed database according to data attributes, and to compile each classified item of data information according to a set data structure.

[0047] The data mining module is configured to distribute the compiled data information to designated scheduling nodes according to computing task requirements, and to invoke business processing functions in real time to filter out the data information that meets the conditions; it is further configured to assign data information that has no computing task requirement, in sequence, to default scheduling nodes to await processing. Multiple scheduling nodes each receive compiled data information according to the computing task requirements.

[0048] The sending module is configured to send the status of the data information that meets the conditions to the monitoring host.

[0049] The single-serial-port servers are configured to perform protocol conversion and data transmission between the data collection module and the electrical devices.

[0050] The data collection module includes collection devices. At present the collection devices support only the RTU485 protocol for data transfer, and this protocol does not support data communication over the Internet. The role of the single-serial-port server is to let the collection program communicate with the collection devices over TCP/IP: the server performs protocol conversion between the two, so that the collection program can collect data over the Internet.

[0051] A plurality of single-serial-port servers are provided. With the introduction of distributed storage in mind, and in light of the actual data produced by device data collection, each data collection channel is deliberately mapped to one data storage block on the collection device. Although this increases the total data volume to some extent, the format suits all RS485-based collection and requires no further adaptation programming or separate computation. A project contains many devices that need monitoring (temperature, voltage, current, power, switch state, water immersion, humidity, and so on) to ensure at all times that the project can provide residential services normally. Each device reports at millisecond-level frequency, the RS485 serial protocol is converted to the Ethernet TCP/IP network protocol, and the server collects the data by listening on a socket port.

[0052] The data mining module is configured to distribute the data information in the distributed database to designated scheduling nodes according to computing task requirements, and to invoke business processing functions in real time through the execute method to filter out the data information that meets the conditions; it is further configured to assign data information that has no computing task requirement, in sequence, to default scheduling nodes to await processing. The data mining module is built on the Storm streaming big data processing framework. Specifically, on top of the default node scheduler provided by Storm, a node scheduler named smallScheduler is implemented. It changes the default scheduler's policy of allocating computing resources sequentially and partitions the computing resources at the logical level. According to the actual requirements of a computing task, the task is assigned to designated physical compute nodes. Computing tasks without special requirements use Storm's default scheduling policy, under which all computing resources without special requirements are assigned in sequence to the physical compute nodes of the Storm cluster. Computing resources already allocated by smallScheduler, however, are not preempted by other computing tasks, and the computing performance of tasks is improved by reducing network latency.
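
The allocation policy described for smallScheduler can be sketched in plain Python as a two-pass assignment. This is not Storm's scheduler API and not the patent's code: the function name, the task and slot representations, and the error handling are all assumptions chosen to show the policy (pin tasks with a stated requirement first, then spread the rest in sequence over whatever slots remain, never taking a pinned slot).

```python
from typing import Dict, List, Optional, Tuple


def schedule(
    tasks: List[Tuple[str, Optional[str]]], slots: Dict[str, int]
) -> Dict[str, str]:
    """Assign tasks to nodes.

    tasks: (task name, required node or None) pairs.
    slots: node name -> number of free slots.
    Returns a mapping of task name -> node name.
    """
    free = dict(slots)
    placement: Dict[str, str] = {}

    # Pass 1: tasks with an explicit requirement go to their designated node.
    for name, node in tasks:
        if node is not None:
            if free.get(node, 0) <= 0:
                raise ValueError(f"no free slot on {node} for {name}")
            free[node] -= 1
            placement[name] = node

    # Pass 2: remaining tasks are placed in sequence (round-robin) on the
    # slots that pass 1 left over, so pinned slots are never preempted.
    nodes = sorted(free)
    cursor = 0
    for name, node in tasks:
        if node is None:
            for _ in range(len(nodes)):
                candidate = nodes[cursor % len(nodes)]
                cursor += 1
                if free[candidate] > 0:
                    free[candidate] -= 1
                    placement[name] = candidate
                    break
            else:
                raise ValueError(f"no free slot for {name}")
    return placement
```

Running the pinned pass first is what gives the "already allocated resources are not preempted" property: by the time unpinned tasks are placed, the reserved slots are simply no longer counted as free.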

[0053] The sending module is configured to send the status of the data information that meets the conditions to the monitoring host. Specifically, it notifies the Nimbus master daemon of the filtered data information and sends the status of the data information held in the Nimbus master daemon to the monitoring host.

[0054] Preferably, the system further includes an alarm module configured to monitor the quantity of data information in the data storage module; when the quantity rises above the system-set maximum threshold or falls below the system-set minimum threshold, the alarm module generates device alarm data and sends it to the monitoring host.

[0055] Preferably, the system further includes a data packaging module configured to package the compiled data information into tuples according to its data attributes, to combine tuples with the same data attributes into a stream (a tuple data stream), and to send the tuples or streams to the data mining module.

[0056] The signal data includes a device id, a signal id, a channel number, a signal value and a timestamp; the data structure includes a Key and a Value, where the Key comprises the device id, signal id, channel number and timestamp, and the Value comprises the signal value and the timestamp. Specifically, all signal data collected from a device reports that device's state at its current timestamp, with the following data structure:

[0057] Key: device id + signal id + channel number + timestamp

[0058] Value: signal value

[0059] timestamp.

[0060] As shown in FIG. 2, a distributed data processing method comprises the following steps:

[0061] Step S1: after a data connection with the electrical devices is established through the single-serial-port servers, collect data information from the electrical devices;

[0062] Step S2: store the collected data information in a distributed database;

[0063] Step S3: classify the data information in the distributed database according to data attributes, and compile each classified item of data information according to a set data structure;

[0064] Step S4: distribute the compiled data information to designated scheduling nodes according to computing task requirements, and invoke business processing functions in real time to filter out the data information that meets the conditions; in addition, assign data information that has no computing task requirement, in sequence, to default scheduling nodes to await processing;

[0065] Step S5: send the status of the data information that meets the conditions to the monitoring host.

[0066] Step S5 is implemented by notifying the Nimbus master daemon of the filtered data information and sending the status of the data information held in the Nimbus master daemon to the monitoring host.

[0067] The method further includes the step of monitoring the quantity of data information and, when the quantity rises above the system-set maximum threshold or falls below the system-set minimum threshold, generating device alarm data and sending it to the monitoring host.

[0068] The method further includes the step of packaging the compiled data information into tuples according to its data attributes, combining tuples with the same data attributes into a stream (a tuple data stream), and sending the tuples or streams to the designated scheduling nodes.

[0069] The specific steps from packaging to mining are as follows:

[0070] Step S001: package the compiled data information into tuples according to its data attributes, and combine tuples with the same data attributes into a stream (a tuple data stream);

[0071] Step S002: distribute the tuples or streams to the designated scheduling-node Bolt according to computing task requirements; the designated Bolt invokes business processing functions in real time through its execute method to filter out the tuples that meet the conditions;

[0072] Step S003: notify the Nimbus master daemon of the filtered tuples, send the status information of the tuples held in the Nimbus master daemon to the monitoring host, and then store the filtered tuples in the distributed database;

[0073] Step S004: assign tuples or streams that have no computing task requirement, in sequence, to the default scheduling-node Bolts to await processing.
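
Steps S002 and S004 can be illustrated with a small dispatch sketch. The names borrow Storm's vocabulary (a bolt with an `execute` method), but this is plain Python rather than the Storm API, and the tuple shape, the routing by attribute and the predicates are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Tuple_ = Dict[str, object]  # hypothetical tuple shape: {"attribute": ..., "value": ...}


class Bolt:
    """Stand-in for a scheduling-node Bolt: filters tuples in execute()."""

    def __init__(self, name: str, predicate: Callable[[Tuple_], bool]) -> None:
        self.name = name
        self.predicate = predicate
        self.emitted: List[Tuple_] = []  # tuples that met the condition

    def execute(self, tup: Tuple_) -> None:
        # Invoke the business processing function on every incoming tuple.
        if self.predicate(tup):
            self.emitted.append(tup)


def dispatch(
    tuples: List[Tuple_], designated: Dict[str, Bolt], default_bolts: List[Bolt]
) -> None:
    """Route each tuple to its designated bolt, or to a default bolt in turn."""
    next_default = 0
    for tup in tuples:
        bolt = designated.get(str(tup["attribute"]))
        if bolt is None:
            # No computing task requirement: assign to default bolts in sequence.
            bolt = default_bolts[next_default % len(default_bolts)]
            next_default += 1
        bolt.execute(tup)
```

After `dispatch` returns, each bolt's `emitted` list holds the tuples that passed its filter; in the described system those would be reported to the Nimbus master daemon and then stored (step S003).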

[0074] The above are merely preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

  1. 1.一种分布式数据处理系统,其特征在于,包括数据采集模块、数据生成模块、数据存储模块、数据挖掘模块、发送模块和多个单串口服务器, 所述数据采集模块,用于通过单串口服务器与电气设备建立数据连接后,从电气设备上采集数据信息; 所述数据存储模块,其用于利用分布式数据库将采集的数据信息进行存储; 所述数据生成模块,用于根据数据属性将分布式数据库中各个数据信息进行分类处理,并将分类后的各个数据信息根据设定的数据结构进行编译; 所述数据挖掘模块,用于根据计算任务需求将编译后的数据信息分发给指定的节点调度,并实时调用业务处理函数筛选得到符合条件的数据信息;还用于将无计算任务需求的数据信息按顺序依次分配到默认的节点调度上等待处理; 所述发送模块,用于将符合条件的数据信息的状态发送到监 A distributed data processing system comprising a data acquisition module, a data generation module, data storage module, a data mining module, a sending module and a plurality of serial server, the data acquisition module, for a single after the server establishes a data connection with serial electrical device, the electrical device from the collected data; said data storage module, data information for use of distributed databases will store collected; the data generating module for data attributes the distributed database information for each data classification, and each of the classified data information to compile the data set of the structure; the data mining module for computing tasks demand information compiled data to the designated the node scheduling and real-time call screening service processing function information data obtained qualified; no data information is also used to calculate the needs of the mission according to the order assigned to the scheduled pending default node; said transmitting module, configured to qualified data state information to the monitoring 主机; 所述单串口服务器,用于在数据采集模块与电气设备之间进行协议转换和数据传输。 Host; the single-port server, the protocol used between the data acquisition module and converts the electrical data transmission devices.
  2. 2.根据权利要求1所述一种分布式数据处理系统,其特征在于,还包括告警模块,其用于监测数据存储模块内数据信息的数量,当数据信息的数量高于或低于系统设定的最大阈值或最小阈值时,生成设备告警数据并发送到监控主机。 2. The A distributed data processing system according to claim, characterized in that, further includes an alarm module that monitors the number of data within the data storage module for, when the number of data information system disposed above or below when a predetermined maximum threshold or the minimum threshold, the alarm generating device and send data to the host monitor.
  3. 3.根据权利要求1所述一种分布式数据处理系统,其特征在于,还包括数据封装模块,其用于将编译后的数据信息根据其数据属性封装成tuple元组,将数据属性相同的tuple元组组成stream元组数据流,并将tuple元组或stream元组数据流发送至所述数据挖掘模块。 3. The A distributed data processing system of claim wherein the package further comprising a data module, the data for the tuple into tuple compiled packaged according to their data attributes, the same attribute data tuple of tuples stream tuple data stream, and the stream tuple tuple or tuple data stream to the data mining module.
  4. 4.根据权利要求1所述一种分布式数据处理系统,其特征在于,所述发送模块中将筛选出的数据信息通知至Nimbus主守护进程中,并将Nimbus主守护进程中的数据信息的状态发送到监控主机。 A according to the distributed data processing system as claimed in claim, wherein the transmitting module will filter out the data notification to the master daemon Nimbus and Nimbus master daemon process data of the state sent to the monitoring host.
  5. The distributed data processing system according to claim 1, characterized in that the data mining module is built on the Storm streaming big-data processing framework.
  6. The distributed data processing system according to claim 1, characterized in that the signal data comprises a device id, a signal id, a channel number, a signal value, and a timestamp; the data structure comprises a Key and a Value, the Key comprising the device id, signal id, channel number, and timestamp, and the Value comprising the signal value and timestamp.
  7. A distributed data processing method, characterized in that it comprises the following steps: Step S1: after establishing a data connection with electrical equipment through a single-serial-port server, collecting data information from the electrical equipment; Step S2: storing the collected data information in a distributed database; Step S3: classifying the data information in the distributed database according to data attributes, and compiling each classified item of data information according to a set data structure; Step S4: distributing the compiled data information to designated nodes for scheduling according to computing-task requirements, and calling a business processing function in real time to filter out the data information that meets the conditions; and further assigning data information that has no computing-task requirement, in order, to default nodes for scheduling to await processing; Step S5: sending the status of the data information that meets the conditions to a monitoring host.
  8. The distributed data processing method according to claim 7, characterized in that it further comprises a step of monitoring the quantity of data information and, when the quantity rises above the maximum threshold or falls below the minimum threshold set by the system, generating device alarm data and sending it to the monitoring host.
  9. The distributed data processing method according to claim 7, characterized in that it further comprises a step of encapsulating the compiled data information into tuples according to its data attributes, combining tuples having the same data attribute into a stream (a tuple data stream), and sending the tuples or streams to the designated nodes for scheduling.
  10. The distributed data processing method according to claim 7, characterized in that step S5 is implemented by notifying the Nimbus master daemon of the filtered data information and sending the status of the data information in the Nimbus master daemon to the monitoring host.
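
The data flow recited in the claims can be illustrated with a minimal sketch in plain Python. It is not the applicant's implementation and does not use the Storm API. It shows the Key/Value structure of claim 6, the grouping of same-attribute tuples into streams (claims 3 and 9), the split between designated and default nodes (claims 1 and 7), and the quantity-threshold alarm (claims 2 and 8). All names (Reading, compile_record, group_into_streams, dispatch, volume_alarm, the node labels) are hypothetical. Using the signal id as the grouping attribute and round-robin for the "in order" default assignment are assumptions; the claims do not specify either.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One collected sample; the fields follow claim 6."""
    device_id: str
    signal_id: str
    channel: int
    value: float
    timestamp: int


def compile_record(reading):
    """Compile a reading into the Key/Value structure of claim 6."""
    key = (reading.device_id, reading.signal_id, reading.channel, reading.timestamp)
    value = (reading.value, reading.timestamp)
    return key, value


def group_into_streams(records):
    """Group records that share a data attribute into streams.

    Assumption: the grouping attribute is the signal id (key[1]).
    """
    streams = defaultdict(list)
    for key, value in records:
        streams[key[1]].append((key, value))
    return dict(streams)


def dispatch(streams, task_nodes, default_nodes):
    """Route each stream to a node.

    Streams with a computing task go to their designated node; the rest
    are assigned to default nodes in order (round-robin is an assumption).
    """
    plan = defaultdict(list)
    next_default = 0
    for attribute, stream in streams.items():
        node = task_nodes.get(attribute)
        if node is None:
            node = default_nodes[next_default % len(default_nodes)]
            next_default += 1
        plan[node].extend(stream)
    return dict(plan)


def volume_alarm(count, minimum, maximum):
    """Return an alarm message if count is outside [minimum, maximum], else None."""
    if count < minimum:
        return f"record count {count} is below minimum {minimum}"
    if count > maximum:
        return f"record count {count} is above maximum {maximum}"
    return None


readings = [
    Reading("dev-1", "voltage", 0, 229.5, 1000),
    Reading("dev-1", "current", 1, 4.2, 1000),
    Reading("dev-2", "voltage", 0, 251.0, 1001),
]
records = [compile_record(r) for r in readings]
streams = group_into_streams(records)
plan = dispatch(
    streams,
    task_nodes={"voltage": "node-a"},        # hypothetical task assignment
    default_nodes=["node-d0", "node-d1"],    # hypothetical default nodes
)
# Hypothetical business processing function: keep voltage samples above 240.
matches = [(k, v) for k, v in plan["node-a"] if v[0] > 240.0]
```

In this sketch the voltage stream carries a computing task and lands on its designated node, where the filter selects the single over-voltage sample, while the current stream falls through to the first default node. In a Storm deployment the grouping and routing would presumably be expressed as spouts, bolts, and stream groupings rather than as plain functions.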
CN 201610081200 2016-02-04 2016-02-04 Distributed data processing system and method CN105760459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201610081200 CN105760459A (en) 2016-02-04 2016-02-04 Distributed data processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201610081200 CN105760459A (en) 2016-02-04 2016-02-04 Distributed data processing system and method

Publications (1)

Publication Number Publication Date
CN105760459A (en) 2016-07-13

Family

ID=56329928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201610081200 CN105760459A (en) 2016-02-04 2016-02-04 Distributed data processing system and method

Country Status (1)

Country Link
CN (1) CN105760459A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0459774B1 (en) * 1990-05-30 2001-09-12 Fujitsu Limited File access system in distributed data processing system
CN102916844A (en) * 2012-11-22 2013-02-06 南京恩瑞特实业有限公司 Mass data fusion and real-time monitoring system
CN103685442A (en) * 2012-08-09 2014-03-26 洛克威尔自动控制技术股份有限公司 Remote industrial monitoring using a cloud infrastructure
CN105159240A (en) * 2015-07-23 2015-12-16 上海极熵数据科技有限公司 Job scheduling system of automatic industrial apparatuses

Similar Documents

Publication Publication Date Title
Chung et al. NS by Example
US20030009443A1 (en) Generic data aggregation
US20030120764A1 (en) Real-time monitoring of services through aggregation view
US20120110042A1 (en) Database insertions in a stream database environment
US20130290969A1 (en) Operator graph changes in response to dynamic connections in stream computing applications
US20110085461A1 (en) Flexible network measurement
Hu et al. Optimized scheduling for data aggregation in wireless sensor networks
Rabkin et al. Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area.
US8732300B2 (en) Application monitoring in a stream database environment
US20060294221A1 (en) System for programmatically controlling measurements in monitoring sources
CN102521044A (en) Distributed task scheduling method and system based on messaging middleware
Al-Shaer et al. Hifi: A new monitoring architecture for distributed systems management
CN101051962A (en) Expandable dynamic network monitor system and its monitor method
CN103024060A (en) Open type cloud computing monitoring system for large scale cluster and method thereof
CN103139251A (en) Method of city-level data sharing exchange platform technology
CN102647452A (en) Self-adaptation resource monitoring system and method based on large-scale cloud computing platform
US20140095506A1 (en) Compile-time grouping of tuples in a streaming application
US20140122559A1 (en) Runtime grouping of tuples in a streaming application
CN101262367A (en) Collection method and device for performance data
Yau et al. Toward development of adaptive service-based software systems
CN103401934A (en) Method and system for acquiring log data
US20140278337A1 (en) Selecting an operator graph configuration for a stream-based computing application
Lohrmann et al. Elastic stream processing with latency guarantees
CN103019853A (en) Method and device for dispatching job task
US20090141638A1 (en) Method for partitioning network flows based on their time information

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination