CN108241661A - A kind of distributed traffic analysis method - Google Patents
- Publication number: CN108241661A
- Application number: CN201611213281.4A
- Authority: CN (China)
- Prior art keywords: sub, data file, real, time, stream
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides a distributed data stream analysis method. The data stream is partitioned by time in two passes into sub-real-time data streams, sub-recent data files, and sub-historical data files. These are then assigned to compute nodes according to each node's real-time processing capability, which achieves fine-grained management of the data stream.
Description
【Technical field】
The invention belongs to the field of data stream processing, and in particular relates to a distributed data stream processing method.
【Background art】
In the prior art, analyzing a data stream in real time makes it possible to track the current state of a system and respond to it in real time. At present, distributed real-time stream processing of massive log data streams works as follows: a batch of data streams is first received in real time and dispatched to multiple processing units in a distributed manner; each processing unit receives one or more of the dispatched data streams and analyzes them in real time; the processed data streams are then integrated and output.
However, when dividing tasks, existing distributed parallel data stream processing does not take the real-time processing capability of each compute node into fine-grained account. As a result, compute nodes whose current real-time computing capability is low, and which are therefore unsuitable for real-time processing of log data streams, may still be given such work, and fine-grained concurrency management cannot be achieved.
In view of this problem, a new distributed data stream analysis method is urgently needed: one that partitions the data stream by time in two passes into sub-real-time data streams, sub-recent data files, and sub-historical data files, and assigns them to compute nodes according to each node's real-time processing capability, thereby achieving fine-grained management of the data stream.
【Summary of the invention】
To solve the above problem of the prior art, the present invention proposes a distributed data stream analysis method.
The technical solution adopted by the present invention is as follows:
A distributed data stream analysis method, characterized in that the method comprises the following steps:
(1) Receive a data stream and perform temporal grouping on it; after the temporal grouping, the log data stream is divided into a real-time data stream, a recent data file, and a historical data file;
(2) Perform secondary temporal grouping on the real-time data stream, the recent data file, and the historical data file; after the secondary temporal grouping, the log data stream is divided into n sub-real-time data streams, m sub-recent data files, and k sub-historical data files;
(3) From multiple compute nodes, select the n nodes that rank highest in real-time computing capability as sub-real-time data stream compute nodes, and select m recent data file compute nodes and k historical data file compute nodes for processing the recent data file and the historical data file;
(4) Assign the sub-real-time data stream, sub-recent data file, and sub-historical data file tasks to the sub-real-time data stream compute nodes, sub-recent data file compute nodes, and sub-historical data file compute nodes, respectively;
(5) The sub-real-time data stream compute nodes, sub-recent data file compute nodes, and sub-historical data file compute nodes each process their corresponding analysis tasks;
(6) Integrate the above processing results and output them.
The beneficial effects of the present invention include the following: the data stream is partitioned by time in two passes into sub-real-time data streams, sub-recent data files, and sub-historical data files, and these are assigned to compute nodes according to each node's real-time processing capability, which achieves fine-grained management of the data stream.
【Brief description of the drawings】
The drawings described here provide a further understanding of the present invention and form part of this application, but do not unduly limit the present invention. In the drawings:
Fig. 1 is a block diagram of the multi-node data processing system of the present invention;
Fig. 2 is a flowchart of the distributed data stream analysis method of the present invention.
【Detailed description】
The present invention is described in detail below with reference to the drawings and specific embodiments. The illustrative examples and explanations are intended only to explain the present invention and do not limit it.
Referring to Fig. 1, the distributed data stream analysis method is used in a multi-node data stream processing system. The system includes one master node and multiple compute nodes. The master node assigns tasks to the compute nodes according to the received data stream, and the compute nodes process their assigned tasks in parallel. In one embodiment, the nodes are in a distributed system or in a cloud system.
Embodiment 1: referring to Fig. 2, a distributed data stream analysis method comprises the following steps:
(1) Receive a data stream and perform temporal grouping on it; after the temporal grouping, the log data stream is divided into a real-time data stream, a recent data file, and a historical data file;
In one embodiment, the time range that counts as "real-time" is defined according to actual needs, and the log data stream within the current "real-time" range is classified as the real-time data stream. In one embodiment, the log data stream from within the current 3 hours is treated as the real-time data stream; the same day's data from outside the current 3 hours, which has already been stored as data files, is treated as the recent data file; and the remaining data, which has already been stored as historical data, is treated as the historical data file.
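The first-level grouping described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `classify_record`, the label strings, and the idea of classifying each log record by its own timestamp are hypothetical and not taken from the patent; the 3-hour window and the "same day" rule follow the embodiment.

```python
from datetime import datetime, timedelta


def classify_record(ts: datetime, now: datetime,
                    realtime_window: timedelta = timedelta(hours=3)) -> str:
    """Assign a log record to one of the three first-level groups.

    Hypothetical helper: the patent only describes the time ranges,
    not how individual records are labeled.
    """
    # Records inside the configurable "real-time" window (3 hours here).
    if now - ts <= realtime_window:
        return "realtime"
    # Records from the same calendar day, but outside the window.
    if ts.date() == now.date():
        return "recent"
    # Everything older is treated as stored historical data.
    return "historical"
```

The window is a parameter because the embodiment states that the "real-time" range is chosen according to actual needs.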
(2) Perform secondary temporal grouping on the real-time data stream, the recent data file, and the historical data file; after the secondary temporal grouping, the log data stream is divided into n sub-real-time data streams, m sub-recent data files, and k sub-historical data files;
In one embodiment, the real-time data stream is divided into 3 sub-real-time data streams at 1-hour intervals. Assuming the recent data file covers the 12 hours of the current day outside the current 3 hours, it is divided into 4 sub-recent data files at 3-hour intervals. The historical data file is divided at two-day intervals into a sub-historical data file for the most recent 2 days, one for days 2-4, one for days 4-6, and so on.
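The secondary grouping amounts to cutting each time range into fixed-width sub-intervals. A minimal sketch, assuming half-open intervals and a hypothetical `history_days` retention parameter (the patent leaves the extent of the historical data open); the helper names are illustrative only:

```python
from datetime import datetime, timedelta


def split_interval(start: datetime, end: datetime, step: timedelta):
    """Split [start, end) into consecutive sub-intervals of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    parts = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)
        parts.append((cursor, nxt))
        cursor = nxt
    return parts


def second_level_split(now: datetime, history_days: int = 6):
    """Apply the interval widths from the embodiment: 1 h, 3 h and 2 days."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    rt_start = now - timedelta(hours=3)  # start of the real-time window
    return {
        "realtime": split_interval(rt_start, now, timedelta(hours=1)),
        "recent": split_interval(midnight, rt_start, timedelta(hours=3)),
        # history_days is an assumed retention period, not from the patent.
        "historical": split_interval(midnight - timedelta(days=history_days),
                                     midnight, timedelta(days=2)),
    }
```

With `now` set to 15:00, this yields n = 3 sub-real-time intervals and m = 4 sub-recent intervals, matching the embodiment; k depends on how much history is retained.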
(3) From multiple compute nodes, select the n nodes that rank highest in real-time computing capability as sub-real-time data stream compute nodes, and select m recent data file compute nodes and k historical data file compute nodes for processing the recent data file and the historical data file.
Because the historical data file has already been stored, it can be processed offline and places little demand on a compute node's real-time computing capability. The sub-real-time data streams, by contrast, must be processed in real time and therefore require compute nodes with higher real-time processing capability.
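One way to picture the node selection is a simple ranking by a capability score. The sketch below is an assumption-laden illustration: the `ComputeNode` type and its `realtime_score` field are hypothetical, and the patent only fixes that the top n nodes serve the sub-real-time streams. Taking the next m nodes for recent files and the following k for historical files is one plausible policy, not something the source specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    name: str
    realtime_score: float  # hypothetical measure of real-time computing capability


def select_nodes(nodes, n, m, k):
    """Pick n real-time, m recent-file and k historical-file compute nodes."""
    if n + m + k > len(nodes):
        raise ValueError("not enough compute nodes for the requested split")
    # Highest real-time capability first.
    ranked = sorted(nodes, key=lambda node: node.realtime_score, reverse=True)
    realtime_nodes = ranked[:n]
    # Assumption: remaining nodes are handed out in rank order.
    recent_nodes = ranked[n:n + m]
    historical_nodes = ranked[n + m:n + m + k]
    return realtime_nodes, recent_nodes, historical_nodes
```

Giving the weakest nodes the historical files is consistent with the observation that stored data can be processed offline.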
(4) Assign the sub-real-time data stream, sub-recent data file, and sub-historical data file tasks to the sub-real-time data stream compute nodes, sub-recent data file compute nodes, and sub-historical data file compute nodes, respectively;
(5) The sub-real-time data stream compute nodes, sub-recent data file compute nodes, and sub-historical data file compute nodes each process their corresponding analysis tasks;
(6) Integrate the above processing results and output them.
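Steps (4) to (6) follow a familiar scatter-and-gather pattern. The following sketch simulates it on a single machine, with worker threads standing in for compute nodes. The analysis task itself (counting log lines per severity level), the function names, and the node names used as dictionary keys are all hypothetical, since the patent does not say what analysis is performed or how results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def analyze(lines):
    """Hypothetical analysis task: count log lines by their first token."""
    return Counter(line.split(maxsplit=1)[0] for line in lines if line.strip())


def run_pipeline(assignments):
    """Process each node's sub-task in parallel, then merge the results.

    `assignments` maps a compute-node name to the log lines of the
    sub-stream or sub-file assigned to that node in step (4).
    """
    # Step (5): each worker thread plays the role of one compute node.
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        partial_results = list(pool.map(analyze, assignments.values()))
    # Step (6): integrate the partial results into one output.
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total
```

In a real deployment the master node would send each task over the network rather than to a local thread pool; the merge step stays the same as long as partial results can be combined associatively.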
The present invention partitions the data stream by time in two passes into sub-real-time data streams, sub-recent data files, and sub-historical data files, and assigns them to compute nodes according to each node's real-time processing capability, which achieves fine-grained management of the data stream.
The above is only a preferred embodiment of the present invention. All equivalent changes or modifications made according to the structures, features, and principles described within the scope of this patent application are therefore included within the scope of this patent application.
Claims (3)
1. A distributed data stream analysis method, characterized in that the method comprises the following steps:
(1) Receive a data stream and perform temporal grouping on it; after the temporal grouping, the log data stream is divided into a real-time data stream, a recent data file, and a historical data file;
(2) Perform secondary temporal grouping on the real-time data stream, the recent data file, and the historical data file; after the secondary temporal grouping, the log data stream is divided into n sub-real-time data streams, m sub-recent data files, and k sub-historical data files;
(3) From multiple compute nodes, select the n nodes that rank highest in real-time computing capability as sub-real-time data stream compute nodes, and select m recent data file compute nodes and k historical data file compute nodes for processing the recent data file and the historical data file;
(4) Assign the sub-real-time data stream, sub-recent data file, and sub-historical data file tasks to the sub-real-time data stream compute nodes, sub-recent data file compute nodes, and sub-historical data file compute nodes, respectively;
(5) The sub-real-time data stream compute nodes, sub-recent data file compute nodes, and sub-historical data file compute nodes each process their corresponding analysis tasks;
(6) Integrate the above processing results and output them.
2. The distributed data stream analysis method according to claim 1, characterized in that the multiple nodes are in a distributed system.
3. The distributed data stream analysis method according to claim 1, characterized in that the multiple nodes are in a cloud system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611213281.4A CN108241661A (en) | 2016-12-23 | 2016-12-23 | A kind of distributed traffic analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611213281.4A CN108241661A (en) | 2016-12-23 | 2016-12-23 | A kind of distributed traffic analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108241661A (en) | 2018-07-03 |
Family
ID=62703631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611213281.4A Pending CN108241661A (en) | 2016-12-23 | 2016-12-23 | A kind of distributed traffic analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108241661A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136217A (en) * | 2011-11-24 | 2013-06-05 | 阿里巴巴集团控股有限公司 | Distributed data flow processing method and system thereof |
CN103178982A (en) * | 2011-12-23 | 2013-06-26 | 阿里巴巴集团控股有限公司 | Method and device for analyzing log |
CN103595651A (en) * | 2013-10-15 | 2014-02-19 | 北京航空航天大学 | Distributed data stream processing method and system |
CN104778188A (en) * | 2014-02-24 | 2015-07-15 | 贵州电网公司信息通信分公司 | Distributed device log collection method |
CN105897718A (en) * | 2016-04-25 | 2016-08-24 | 上海携程商务有限公司 | System and method for preventing local area network (LAN) from being scanned |
CN106126643A (en) * | 2016-06-23 | 2016-11-16 | 北京百度网讯科技有限公司 | The distributed approach of stream data and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105631026A (en) | Security data analysis system | |
CN105071994B (en) | A kind of mass data monitoring system | |
CN112148455B (en) | Task processing method, device and medium | |
CN102456031A (en) | MapReduce system and method for processing data streams | |
CN107562541B (en) | Load balancing distributed crawler method and crawler system | |
CN106506266B (en) | Network flow analysis method based on GPU, Hadoop/Spark mixing Computational frame | |
CN102314336B (en) | A kind of data processing method and system | |
CN103812949A (en) | Task scheduling and resource allocation method and system for real-time cloud platform | |
CN106780149A (en) | A kind of equipment real-time monitoring system based on timed task scheduling | |
CN107193652A (en) | The flexible resource dispatching method and system of flow data processing system in container cloud environment | |
CN107086929A (en) | A kind of batch streaming computing system performance guarantee method based on modeling of queuing up | |
CN105868222A (en) | Task scheduling method and device | |
CN111367951A (en) | Method and device for processing stream data | |
CN108345450A (en) | The method for generating the software architecture for managing data | |
CN107656973A (en) | A kind of log audit subsystem applied to cloud auditing system | |
CN104320382A (en) | Distributive real-time stream processing device, method and unit | |
CN106909624B (en) | Real-time sequencing optimization method for mass data | |
CN108241644A (en) | A kind of data Mining stream method | |
CN108241661A (en) | A kind of distributed traffic analysis method | |
CN105468676A (en) | Big data processing method | |
CN104407811A (en) | Cloud computing-based merging IO (input/output) device | |
CN106791932A (en) | Distributed trans-coding system, method and its device | |
CN108241525A (en) | A kind of multinode task dynamic control method | |
CN115664992A (en) | Network operation data processing method and device, electronic equipment and medium | |
CN106022374B (en) | The method and device that a kind of pair of history flow data is classified |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 101399 No. 2 East Airport Road, Shunyi Airport Economic Core Area, Beijing (1st, 5th and 7th floors of Industrial Park 1A-4); Applicant after: Zhongke Star Map Co., Ltd.; Address before: 101399 Building 1A-4, National Geographic Information Technology Industrial Park, Guomen Business District, Shunyi District, Beijing; Applicant before: Space Star Technology (Beijing) Co., Ltd. ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180703 ||