CN108241661A - A kind of distributed traffic analysis method - Google Patents

Info

Publication number
CN108241661A
Authority
CN
China
Prior art keywords
sub
data file
real
time
stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611213281.4A
Other languages
Chinese (zh)
Inventor
李振钊
安西民
徐凤桐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Space Star Technology (Beijing) Co Ltd
Original Assignee
Space Star Technology (Beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-23
Filing date
2016-12-23
Publication date
2018-07-03
Application filed by Space Star Technology (Beijing) Co Ltd
Priority to CN201611213281.4A
Publication of CN108241661A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a distributed data stream analysis method. The data stream is divided by two rounds of time-based grouping into sub real-time data streams, sub recent data files, and sub historical data files. These are then assigned to compute nodes according to each node's real-time processing capability, which achieves fine-grained management of the data stream.

Description

A distributed data stream analysis method
【Technical Field】
The invention belongs to the field of data stream processing, and in particular relates to a distributed data stream processing method.
【Background】
In the prior art, analyzing a data stream in real time makes it possible to grasp the current state of a system and respond in real time. At present, distributed real-time stream processing of massive log data streams works as follows: a batch of the data stream is first received in real time and distributed to multiple processing units; each processing unit receives one or more of the distributed data streams and analyzes them in real time; and the processed data streams are then integrated and output.
However, when dividing tasks, the existing distributed parallel processing approach does not take the real-time processing capability of each compute node into fine-grained account. As a result, real-time processing of log data streams may be assigned to compute nodes whose current real-time computing capability is low and which are therefore unsuited to it, and fine-grained concurrency management cannot be achieved.
In view of this problem, a new distributed data stream analysis method is urgently needed, in which the data stream is divided by two rounds of time-based grouping into sub real-time data streams, sub recent data files, and sub historical data files; these are assigned to compute nodes according to the nodes' real-time processing capability; and fine-grained management of the data stream is thereby achieved.
【Summary of the Invention】
To solve the above problem of the prior art, the present invention proposes a distributed data stream analysis method.
The technical solution adopted by the present invention is as follows:
A distributed data stream analysis method, characterized in that the method comprises the following steps:
(1) receiving a data stream and grouping it by time, whereby the log data stream is divided into a real-time data stream, recent data files, and historical data files;
(2) performing a secondary time-based grouping on the real-time data stream, recent data files, and historical data files, whereby the log data stream is divided into n sub real-time data streams, m sub recent data files, and k sub historical data files;
(3) selecting, from multiple compute nodes, the top n nodes ranked by real-time computing capability as sub real-time data stream compute nodes, and selecting m sub recent data file compute nodes and k sub historical data file compute nodes for processing the recent data files and historical data files;
(4) distributing the sub real-time data stream, sub recent data file, and sub historical data file tasks to the sub real-time data stream compute nodes, sub recent data file compute nodes, and sub historical data file compute nodes, respectively;
(5) the sub real-time data stream compute nodes, sub recent data file compute nodes, and sub historical data file compute nodes each processing their corresponding analysis tasks;
(6) integrating and outputting the above processing results.
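The node selection in step (3) can be illustrated with a minimal sketch. This is not the patent's implementation: `ComputeNode`, `realtime_score`, and `select_nodes` are hypothetical names, and the source does not say how real-time capability is measured or how the recent and historical nodes are chosen. The sketch assumes a single numeric score and simply takes the next m and k nodes in rank order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    name: str
    realtime_score: float  # hypothetical capability metric; higher is better


def select_nodes(nodes, n, m, k):
    """Pick n real-time nodes, m recent-file nodes and k history-file nodes."""
    if n + m + k > len(nodes):
        raise ValueError("not enough compute nodes for n + m + k tasks")
    # Rank all nodes by real-time capability, best first.
    ranked = sorted(nodes, key=lambda node: node.realtime_score, reverse=True)
    realtime = ranked[:n]
    # Assumption: the remaining roles are filled in rank order.
    recent = ranked[n:n + m]
    history = ranked[n + m:n + m + k]
    return realtime, recent, history


nodes = [
    ComputeNode("a", 0.9),
    ComputeNode("b", 0.4),
    ComputeNode("c", 0.7),
    ComputeNode("d", 0.2),
]
realtime, recent, history = select_nodes(nodes, n=2, m=1, k=1)
```

Here the two highest-scoring nodes take the sub real-time data streams, while lower-ranked nodes take the stored files that can be processed offline.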
The beneficial effects of the present invention include the following: the data stream is divided by two rounds of time-based grouping into sub real-time data streams, sub recent data files, and sub historical data files, and these are assigned to compute nodes according to each node's real-time processing capability, which achieves fine-grained management of the data stream.
【Brief Description of the Drawings】
The drawings described here provide a further understanding of the present invention and form part of this application, but do not unduly limit the invention. In the drawings:
Fig. 1 is a block diagram of the multi-node data processing system of the present invention;
Fig. 2 is a flowchart of the distributed data stream analysis method of the present invention.
【Detailed Description】
The present invention is described in detail below with reference to the drawings and specific embodiments. The illustrative embodiments and descriptions herein serve only to explain the invention and do not limit it.
Referring to Fig. 1, the distributed data stream analysis method is used in a multi-node data stream processing system. The system comprises one master node and multiple compute nodes: the master node assigns tasks to the compute nodes according to the received data stream, and each compute node processes its assigned tasks in parallel. In one embodiment the nodes are in a distributed system; in another, the nodes are in a cloud system.
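The master/compute-node arrangement can be sketched in a single process, with a thread pool standing in for the compute nodes. This is only an illustration under stated assumptions: `analyze` and `master_dispatch` are hypothetical names, the "analysis" is a simple record count, and a real deployment would dispatch over a network rather than to threads.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(task):
    """Stand-in analysis job run by one compute node: count log records."""
    node, records = task
    return node, len(records)


def master_dispatch(assignments):
    """Run each (node, records) assignment in parallel and integrate results."""
    # One worker per assignment; at least one so the pool is always valid.
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        results = list(pool.map(analyze, assignments))
    # Integration step: merge the per-node results into one mapping.
    return dict(results)


assignments = [("node-1", ["e1", "e2"]), ("node-2", ["e3"]), ("node-3", [])]
summary = master_dispatch(assignments)
```

`summary` maps each node name to the number of records it processed, which plays the role of the integrated output.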
Embodiment 1: referring to Fig. 2, a distributed data stream analysis method comprises the following steps:
(1) receiving a data stream and grouping it by time, whereby the log data stream is divided into a real-time data stream, recent data files, and historical data files;
In one embodiment, the time range regarded as "real-time" is defined according to actual needs, and the log data stream within the current "real-time" range is classified as the real-time data stream. In one embodiment, the log data stream from within the current 3 hours is treated as the real-time data stream; data from the current day but outside the current 3 hours, which has already been stored as data files, is treated as recent data files; and the remaining data, already stored as historical data, is treated as historical data files.
(2) performing a secondary time-based grouping on the real-time data stream, recent data files, and historical data files, whereby the log data stream is divided into n sub real-time data streams, m sub recent data files, and k sub historical data files;
In one embodiment, the real-time data stream is divided at 1-hour intervals into 3 sub real-time data streams. Assuming the recent data files cover the 12 hours of the current day outside the current 3 hours, they are divided at 3-hour intervals into 4 sub recent data files. The historical data files are divided at two-day intervals into a sub historical data file for the last 2 days, one for days 2 to 4, one for days 4 to 6, and so on.
(3) selecting, from multiple compute nodes, the top n nodes ranked by real-time computing capability as sub real-time data stream compute nodes, and selecting m sub recent data file compute nodes and k sub historical data file compute nodes for processing the recent data files and historical data files;
Because the historical data files have already been stored, they can be processed offline, so the demand they place on a compute node's real-time computing capability is low. The sub real-time data streams, by contrast, must be processed in real time, so the compute nodes handling them need higher real-time processing capability.
(4) distributing the sub real-time data stream, sub recent data file, and sub historical data file tasks to the sub real-time data stream compute nodes, sub recent data file compute nodes, and sub historical data file compute nodes, respectively;
(5) the sub real-time data stream compute nodes, sub recent data file compute nodes, and sub historical data file compute nodes each processing their corresponding analysis tasks;
(6) integrating and outputting the above processing results.
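The first time-based grouping in step (1) can be sketched using the embodiment's example thresholds. The function name `classify` and the category labels are hypothetical; the 3-hour window and the "same day" rule come from the embodiment, and treating timestamps at exactly 3 hours as real-time is an assumption.

```python
from datetime import datetime, timedelta


def classify(record_time, now, realtime_window=timedelta(hours=3)):
    """Assign a log record to the real-time, recent, or history group."""
    # Records from within the current window belong to the real-time stream.
    if now - record_time <= realtime_window:
        return "realtime"
    # Older records from the same calendar day are recent data files.
    if record_time.date() == now.date():
        return "recent"
    # Everything else has been stored as historical data.
    return "history"


now = datetime(2016, 12, 23, 15, 0)
labels = [
    classify(datetime(2016, 12, 23, 13, 30), now),
    classify(datetime(2016, 12, 23, 8, 0), now),
    classify(datetime(2016, 12, 21, 8, 0), now),
]
```

The window is a parameter because the embodiment notes that the "real-time" range is defined according to actual needs.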
The present invention divides the data stream by two rounds of time-based grouping into sub real-time data streams, sub recent data files, and sub historical data files, and assigns these to compute nodes according to each node's real-time processing capability, which achieves fine-grained management of the data stream.
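The secondary time-based grouping can be sketched as splitting each time range into fixed-width windows. `split_windows` is a hypothetical helper; the 1-hour, 3-hour, and 2-day intervals follow the embodiment, while the 6-day history span and the example timestamp are assumptions chosen only to make the example finite.

```python
from datetime import datetime, timedelta


def split_windows(start, end, step):
    """Split [start, end) into consecutive windows of width `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        # The last window is clipped so it never extends past `end`.
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


now = datetime(2016, 12, 23, 15, 0)
realtime_start = now - timedelta(hours=3)   # start of the current 3 hours
day_start = now - timedelta(hours=15)       # midnight of the current day

sub_realtime = split_windows(realtime_start, now, timedelta(hours=1))
sub_recent = split_windows(day_start, realtime_start, timedelta(hours=3))
# Assumption: only the last 6 days of history are split here.
sub_history = split_windows(day_start - timedelta(days=6), day_start, timedelta(days=2))
```

With these inputs the split yields n = 3 sub real-time windows and m = 4 sub recent windows, matching the embodiment's figures, plus k = 3 history windows for the assumed 6-day span.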
The above is only a preferred embodiment of the present invention. All equivalent changes or modifications made according to the structures, features, and principles described within the scope of this patent application are therefore included within the scope of this patent application.

Claims (3)

1. A distributed data stream analysis method, characterized in that the method comprises the following steps:
(1) receiving a data stream and grouping it by time, whereby the log data stream is divided into a real-time data stream, recent data files, and historical data files;
(2) performing a secondary time-based grouping on the real-time data stream, recent data files, and historical data files, whereby the log data stream is divided into n sub real-time data streams, m sub recent data files, and k sub historical data files;
(3) selecting, from multiple compute nodes, the top n nodes ranked by real-time computing capability as sub real-time data stream compute nodes, and selecting m sub recent data file compute nodes and k sub historical data file compute nodes for processing the recent data files and historical data files;
(4) distributing the sub real-time data stream, sub recent data file, and sub historical data file tasks to the sub real-time data stream compute nodes, sub recent data file compute nodes, and sub historical data file compute nodes, respectively;
(5) the sub real-time data stream compute nodes, sub recent data file compute nodes, and sub historical data file compute nodes each processing their corresponding analysis tasks;
(6) integrating and outputting the above processing results.
2. The distributed data stream analysis method according to claim 1, characterized in that the multiple nodes are in a distributed system.
3. The distributed data stream analysis method according to claim 1, characterized in that the multiple nodes are in a cloud system.
CN201611213281.4A 2016-12-23 2016-12-23 A kind of distributed traffic analysis method Pending CN108241661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611213281.4A CN108241661A (en) 2016-12-23 2016-12-23 A kind of distributed traffic analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611213281.4A CN108241661A (en) 2016-12-23 2016-12-23 A kind of distributed traffic analysis method

Publications (1)

Publication Number Publication Date
CN108241661A true CN108241661A (en) 2018-07-03

Family

ID=62703631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611213281.4A Pending CN108241661A (en) 2016-12-23 2016-12-23 A kind of distributed traffic analysis method

Country Status (1)

Country Link
CN (1) CN108241661A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136217A (en) * 2011-11-24 2013-06-05 阿里巴巴集团控股有限公司 Distributed data flow processing method and system thereof
CN103178982A (en) * 2011-12-23 2013-06-26 阿里巴巴集团控股有限公司 Method and device for analyzing log
CN103595651A (en) * 2013-10-15 2014-02-19 北京航空航天大学 Distributed data stream processing method and system
CN104778188A (en) * 2014-02-24 2015-07-15 贵州电网公司信息通信分公司 Distributed device log collection method
CN105897718A (en) * 2016-04-25 2016-08-24 上海携程商务有限公司 System and method for preventing local area network (LAN) from being scanned
CN106126643A (en) * 2016-06-23 2016-11-16 北京百度网讯科技有限公司 The distributed approach of stream data and device

Similar Documents

Publication Publication Date Title
CN105631026A (en) Security data analysis system
CN105071994B (en) A kind of mass data monitoring system
CN112148455B (en) Task processing method, device and medium
CN102456031A (en) MapReduce system and method for processing data streams
CN107562541B (en) Load balancing distributed crawler method and crawler system
CN106506266B (en) Network flow analysis method based on GPU, Hadoop/Spark mixing Computational frame
CN102314336B (en) A kind of data processing method and system
CN103812949A (en) Task scheduling and resource allocation method and system for real-time cloud platform
CN106780149A (en) A kind of equipment real-time monitoring system based on timed task scheduling
CN107193652A (en) The flexible resource dispatching method and system of flow data processing system in container cloud environment
CN107086929A (en) A kind of batch streaming computing system performance guarantee method based on modeling of queuing up
CN105868222A (en) Task scheduling method and device
CN111367951A (en) Method and device for processing stream data
CN108345450A (en) The method for generating the software architecture for managing data
CN107656973A (en) A kind of log audit subsystem applied to cloud auditing system
CN104320382A (en) Distributive real-time stream processing device, method and unit
CN106909624B (en) Real-time sequencing optimization method for mass data
CN108241644A (en) A kind of data Mining stream method
CN108241661A (en) A kind of distributed traffic analysis method
CN105468676A (en) Big data processing method
CN104407811A (en) Cloud computing-based merging IO (input/output) device
CN106791932A (en) Distributed trans-coding system, method and its device
CN108241525A (en) A kind of multinode task dynamic control method
CN115664992A (en) Network operation data processing method and device, electronic equipment and medium
CN106022374B (en) The method and device that a kind of pair of history flow data is classified

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101399 No. 2 East Airport Road, Shunyi Airport Economic Core Area, Beijing (1st, 5th and 7th floors of Industrial Park 1A-4)

Applicant after: Zhongke Star Map Co., Ltd.

Address before: 101399 Building 1A-4, National Geographic Information Technology Industrial Park, Guomen Business District, Shunyi District, Beijing

Applicant before: Space Star Technology (Beijing) Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180703