CN110569360A - Method for labeling and automatically associating network session data - Google Patents

Publication number
CN110569360A
Authority
CN
China
Prior art keywords
session data
matching
source
time
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910840735.8A
Other languages
Chinese (zh)
Inventor
刘洋
邓金祥
代先勇
谷峰
曾海刚
王文武
佘朝裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU PONDER TECHNOLOGY Co Ltd
Original Assignee
CHENGDU PONDER TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHENGDU PONDER TECHNOLOGY Co Ltd
Priority to CN201910840735.8A
Publication of CN110569360A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for labeling and automatically associating network session data, comprising the following steps: Step 1, establishing multi-dimensional classification labels for a network session data source set in a system; Step 2, importing the network session data source set into the system, labeling the session data with the multi-dimensional classification labels, and generating a label ID for each label; and Step 3, performing three rounds of traversal matching on all session data according to the different label classifications, in which the first round matches the exact items, the second round matches the range items within the result of the first round, and the third round matches the fuzzy items within the result of the second round, and then storing the session data matched in the third round in association with the label ID. The invention enables multi-class statistics over network session data and applies an analysis result to the entire body of session data, extending the reach of each analysis result. Compared with existing systems, which rely on one-by-one manual analysis and on manual records that cannot be associated automatically, it greatly improves the efficiency of data analysis and comparison.

Description

Method for labeling and automatically associating network session data
Technical Field
The invention relates to the technical field of data statistics, in particular to a method for labeling and automatically associating network session data.
Background
With the development of computer technology and the Internet, rising broadband speeds and falling costs, the arrival of 5G and the spread of the Internet of Things, people's lives and work are ever more closely tied to the network. The number of network sessions is growing geometrically and can easily reach hundreds of millions. When an analysis expert, after lengthy layer-by-layer screening of this vast sea of data, identifies session data as suspicious, safe or threatening, technicians need to record that session data and apply it to the analysis and comparison of other data. Therefore, a method for labeling and automatically associating network session data is needed in the art.
Disclosure of Invention
The object of the present invention is to provide a method for labeling and automatically associating network session data that solves the above problem.
To achieve this object, the present disclosure provides a method for labeling and automatically associating network session data, comprising the following steps:
Step 1, establishing, in a system, multi-dimensional classification labels for the source/target IP, source/target port, source/target MAC, session protocol, exception type, sent/received/total load, sent/received/total packet count, duration, domain name, URL and content details of a network session data source set;
Step 2, importing the network session data source set into the system, labeling the session data with the multi-dimensional classification labels, and generating a label ID for each label;
Step 3, performing three rounds of traversal matching on all session data according to the different label classifications: the first round matches the exact items, the second round matches the range items within the result of the first round, and the third round matches the fuzzy items within the result of the second round; the session data matched in the third round is then stored in association with the label ID. The exact items comprise the source/target IP, source/target MAC, source/target port, session protocol and exception type; the range items comprise the sent/received/total load, sent/received/total packet count and duration; and the fuzzy items comprise the domain name, URL and content details.
The beneficial effects of the invention are:
1. The invention establishes multi-dimensional classification labels and associates the matched data with the label ID, enabling multi-class statistics over network session data. It combines the experience of the analysis expert with the high-concurrency, multi-task, high-efficiency computing strengths of modern computers, so that the expert's analysis result can be applied to the entire body of session data through a simple operation, extending the reach of each analysis result. Compared with existing systems, which rely on one-by-one manual analysis and on manual records that cannot be associated automatically, this greatly improves the efficiency of data analysis and comparison;
2. The traversal matching of the invention matches simple conditions first and complex conditions afterwards, which narrows the data range efficiently and improves matching efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is an operation page of a system applying the method for labeling and automatically associating network session data according to the present invention;
FIG. 2 is a label configuration page of a system applying the method for labeling and automatically associating network session data according to the present invention;
FIG. 3 shows the three-round, multi-threaded concurrent session data matching process of the method for labeling and automatically associating network session data according to the present invention;
FIG. 4 is a detailed diagram of the operation of the method for labeling and automatically associating network session data according to the present invention;
FIG. 5 is a detailed diagram of the operation of the method for labeling and automatically associating network session data according to the present invention.
Detailed Description
The following describes in detail specific embodiments of the present disclosure. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
The invention relates to a method for labeling and automatically associating network session data, comprising the following steps:
Step 1, establishing, in a system, multi-dimensional classification labels for the source/target IP, source/target port, source/target MAC, session protocol, exception type, sent/received/total load, sent/received/total packet count, duration, domain name, URL and content details of a network session data source set;
Step 2, importing the network session data source set into the system, labeling the session data with the multi-dimensional classification labels, storing each label as a single record, and generating a label ID;
Step 3, performing three rounds of traversal matching on all session data according to the different label classifications: the first round matches the exact items, the second round matches the range items within the result of the first round, and the third round matches the fuzzy items within the result of the second round; the session data matched in the third round is then stored in association with the label ID. The exact items comprise the source/target IP, source/target MAC, source/target port, session protocol and exception type; the range items comprise the sent/received/total load, sent/received/total packet count and duration; and the fuzzy items comprise the domain name, URL and content details.
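The three steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Label` class, the field names (`src_ip`, `duration`, `domain`), the sequential label ID generator and the use of a case-insensitive substring test for the fuzzy round are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_label_ids = count(1)  # hypothetical label ID generator


@dataclass
class Label:
    """One multi-dimensional classification label (all names are illustrative)."""
    name: str
    exact: dict = field(default_factory=dict)   # e.g. {"src_ip": "10.0.0.5"}
    ranges: dict = field(default_factory=dict)  # e.g. {"duration": (1.0, 60.0)}
    fuzzy: dict = field(default_factory=dict)   # e.g. {"domain": "example"}
    label_id: int = field(default_factory=lambda: next(_label_ids))


def three_pass_match(sessions, label):
    """Return the sessions that survive the exact, range and fuzzy rounds."""
    # Round 1: exact items, evaluated over all session data.
    hits = [s for s in sessions
            if all(s.get(k) == v for k, v in label.exact.items())]
    # Round 2: range items, evaluated only over the round-1 result.
    hits = [s for s in hits
            if all(k in s and lo <= s[k] <= hi
                   for k, (lo, hi) in label.ranges.items())]
    # Round 3: fuzzy items (assumed here to be substring matches),
    # evaluated only over the round-2 result.
    hits = [s for s in hits
            if all(v.lower() in str(s.get(k, "")).lower()
                   for k, v in label.fuzzy.items())]
    return hits


sessions = [
    {"id": 1, "src_ip": "10.0.0.5", "protocol": "HTTP", "duration": 12.0, "domain": "cdn.example.com"},
    {"id": 2, "src_ip": "10.0.0.5", "protocol": "HTTP", "duration": 300.0, "domain": "cdn.example.com"},
    {"id": 3, "src_ip": "10.0.0.5", "protocol": "HTTP", "duration": 5.0, "domain": "other.org"},
    {"id": 4, "src_ip": "10.0.0.9", "protocol": "HTTP", "duration": 12.0, "domain": "cdn.example.com"},
]
label = Label(name="suspicious-cdn",
              exact={"src_ip": "10.0.0.5", "protocol": "HTTP"},
              ranges={"duration": (1.0, 60.0)},
              fuzzy={"domain": "example"})
matched = three_pass_match(sessions, label)
# Store the matched sessions in association with the label ID.
associations = {label.label_id: [s["id"] for s in matched]}
```

In this sample, round 1 keeps sessions 1 to 3, round 2 drops session 2 because its duration is out of range, and round 3 drops session 3 because its domain does not contain the keyword, so only session 1 is associated with the label.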
Furthermore, in step 3 the session data is traversed and matched using multi-threaded segmentation. The number of threads is freely configurable. The number of segments is the total number of session records divided by the number of threads, and each segment holds the total number of session records divided by the number of segments. If there is a remainder, the leftover records are distributed evenly starting from the first segment, so that the data is divided approximately equally. Finally, the matching results of all segments are spliced together to form the matching result set.
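A possible reading of this segmentation scheme is sketched below in Python. It assumes one segment per thread, with the remainder handed out one record at a time starting from the first segment; the function names `split_segments` and `parallel_match` are hypothetical, and the text does not prescribe a particular threading library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_segments(items, n_threads):
    """Split items into n_threads nearly equal, contiguous segments.

    The base size is len(items) // n_threads; any remainder is handed out
    one record at a time, starting from the first segment.
    """
    base, extra = divmod(len(items), n_threads)
    segments, start = [], 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)
        segments.append(items[start:start + size])
        start += size
    return segments


def parallel_match(items, predicate, n_threads=4):
    """Match each segment in its own thread and splice the results in order."""
    segments = split_segments(items, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(lambda seg: [x for x in seg if predicate(x)], segments)
        return [x for part in partials for x in part]
```

For example, `split_segments(list(range(10)), 3)` yields segments of sizes 4, 3 and 3. Note that in CPython, threads do not speed up CPU-bound pure-Python predicates; the sketch is meant to show the partitioning and splicing logic rather than a performance recipe.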
The first round of traversal matching runs over all session data, the second round over the result set of the first round, and the third round over the result set of the second round. Matching simple conditions first and complex conditions afterwards narrows the data range effectively and improves matching efficiency.
The method supports labeling any session data row, and the labeling can use any of the following condition types: source/target IP, source/target MAC, source/target port, session protocol, exception type, sent/received/total load, sent/received/total packet count, duration, domain name, URL and content details. The invention establishes multi-dimensional classification labels and associates the matched data with the label ID, enabling multi-class statistics over network session data. It combines the experience of the analysis expert with the high-concurrency, multi-task, high-efficiency computing strengths of modern computers, so that the expert's analysis result can be applied to the entire body of session data through a simple operation, extending the reach of each analysis result.
The system to which the invention is applied is implemented as follows:
Step one: obtain from an external source the list of session data source sets that have already been collected and organized, and display it on the right side of the system. The displayed columns include source IP, source port, source MAC, target IP, target port, target MAC, session protocol, exception type, send load, send packet count, receive load, receive packet count, session load, session packet count, duration, start time, end time, domain name, URL and content details, as shown in the right-hand portion of FIG. 1.
Step two: after step one is completed, the analysis expert can work through the session data source list step by step. When a session appears to be abnormal, the expert right-clicks the row containing it and selects the pencil-icon button in the context menu that pops up, which opens the session data labeling configuration page, as shown in FIG. 2.
Step three: when the session data labeling page opens, it automatically displays the conditions of the session that can be used for labeling: source/target IP, source/target MAC, source/target port, session protocol, exception type, sent/received/total load, sent/received/total packet count, duration, domain name, URL and content details. The source/target IP, source/target MAC, source/target port, session protocol and exception type support exact matching only; the sent/received/total load, sent/received/total packet count and duration support exact and range matching; and the domain name, URL and content details support exact and fuzzy matching, as shown in FIG. 2.
Step four: after the labeling information has been configured in steps two and three, the system automatically starts a background task that launches multiple threads and traverses and compares all session data group by group. The comparison runs in three rounds: the first round matches only the exact items, the second only the range items, and the third only the fuzzy items. This narrows the matching range efficiently and yields the final matches. Finally, the label information is added to all matched items and recorded in the label bookmark, as shown in FIG. 3.
Step five: all session data automatically associated in step four is displayed in the label bookmark. The label bookmark is displayed as a two-layer tree. The top-level node of the tree supports classification by IP, port, MAC, protocol and exception, and the user can switch between these. The second-layer nodes are aggregated by label name and show how many sessions are associated, according to the top-level node type and the labeling conditions. Each second-layer node is followed by a delete button and an edit button, so the label can be modified or deleted, and hovering the mouse over a node displays the detailed labeling configuration of that node. The label bookmark also supports fuzzy queries, which perform fuzzy matching on the second-layer node names, as shown on the left side of FIG. 1 and in FIG. 4.
Step six: the second-layer nodes of the label bookmark support double-clicking to view details. After the user double-clicks a node, all session detail data that the system has automatically associated with it is displayed on the right. If a session carries label information, this is also visible from the label indicator shown on the session.
Step seven: the label indicators in the session list are of two kinds. One is a vertical bar at the far left of the session row, which is displayed whenever the session is labeled. The other is a triangular marker in the upper right corner of a cell in the session row; it appears only in the IP, MAC, port, protocol and exception cells, and only when the session's labeling conditions involve the IP, MAC, port, protocol or exception, as shown on the right side of FIG. 1 and in FIG. 5.
Step eight: when the mouse hovers over a label indicator, the label name and a summary of its condition configuration are displayed in a floating window, as shown in FIG. 5.
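The per-condition matching modes described in step three could be enforced with a small lookup table, sketched below in Python. The field names, the mode constants and the `validate_condition` helper are hypothetical; only the grouping of conditions into exact-only, exact-or-range and exact-or-fuzzy comes from the description.

```python
EXACT, RANGE, FUZZY = "exact", "range", "fuzzy"

# Match modes allowed per condition, following step three.
# Field names are illustrative, not taken from the patent.
ALLOWED_MODES = {}
for name in ("src_ip", "dst_ip", "src_mac", "dst_mac",
             "src_port", "dst_port", "protocol", "exception_type"):
    ALLOWED_MODES[name] = {EXACT}
for name in ("send_load", "recv_load", "total_load",
             "send_packets", "recv_packets", "total_packets", "duration"):
    ALLOWED_MODES[name] = {EXACT, RANGE}
for name in ("domain", "url", "content"):
    ALLOWED_MODES[name] = {EXACT, FUZZY}


def validate_condition(field_name, mode):
    """Raise ValueError if the field does not support the requested match mode."""
    allowed = ALLOWED_MODES.get(field_name)
    if allowed is None:
        raise ValueError(f"unknown condition field: {field_name}")
    if mode not in allowed:
        raise ValueError(f"{field_name} does not support {mode} matching")
    return True
```

A labeling page built on such a table would accept `validate_condition("duration", RANGE)` but reject a fuzzy condition on `src_ip`.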
The classified viewing function in steps five to seven works as follows:
First, all labeled items are displayed by category in a two-layer tree, in which the first-layer node is the classification type and the second-layer nodes are the label statistics;
Second, the classification type of the first-layer node supports dynamic single-selection switching; the supported types are IP, port, MAC, protocol and exception;
Third, the name of each second-layer node is the label name, followed by the number of session records associated with that label. Hovering the mouse over the node displays the detailed conditions of the label (including all configurable and enabled conditions), and the node is followed by shortcut buttons for editing and deleting the label configuration;
Fourth, the names of the second-layer nodes support fuzzy search: after a keyword is entered, only the nodes containing the keyword are shown in the tree, and the other nodes are hidden;
Fifth, double-clicking a second-layer node refreshes the session data detail list on the right so that it shows only the result set meeting that node's labeling conditions;
Sixth, special label indicators are displayed on the session detail rows that meet any labeling condition. The indicators are of two kinds. One is a vertical bar at the far left of the session row, which is displayed whenever the session is labeled. The other is a triangular marker in the upper right corner of a cell in the session row; it appears only in the IP, MAC, port, protocol and exception cells, and only when the session's labeling conditions involve the IP, MAC, port, protocol or exception. When the mouse hovers over a label indicator, the label name and its condition configuration are displayed in a floating window.
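The two-layer tree with per-label session counts and keyword filtering might be modeled as in the Python sketch below. It assumes that a label appears under a top-level type when its conditions involve one of that type's fields; this rule, the `CATEGORY_FIELDS` mapping and the record layout are illustrative guesses rather than details given in the description.

```python
# Top-level classification types and the condition fields each one covers.
# The field names are hypothetical.
CATEGORY_FIELDS = {
    "IP": ("src_ip", "dst_ip"),
    "Port": ("src_port", "dst_port"),
    "MAC": ("src_mac", "dst_mac"),
    "Protocol": ("protocol",),
    "Exception": ("exception_type",),
}


def second_layer_nodes(labels, associations, category, keyword=""):
    """List (label name, session count) nodes under the selected top-level type.

    labels: dicts with "id", "name" and "conditions" (field -> value).
    associations: label id -> list of associated session ids.
    keyword: optional case-insensitive filter on the node name.
    """
    fields = CATEGORY_FIELDS[category]
    nodes = []
    for label in labels:
        uses_category = any(f in label["conditions"] for f in fields)
        name_matches = keyword.lower() in label["name"].lower()
        if uses_category and name_matches:
            nodes.append((label["name"], len(associations.get(label["id"], []))))
    return nodes


labels = [
    {"id": 1, "name": "scanner", "conditions": {"src_ip": "10.0.0.5"}},
    {"id": 2, "name": "odd-dns", "conditions": {"protocol": "DNS", "domain": "example"}},
]
associations = {1: [11, 12, 13], 2: [21]}
ip_nodes = second_layer_nodes(labels, associations, "IP")
```

Switching the `category` argument corresponds to the single-selection switch on the first-layer node, and a non-empty `keyword` hides every node whose name does not contain it.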
The recording, sharing and extension of analysis results referred to in step eight work as follows:
Recording means labeling the analysis result data; the label is stored on the server and persists permanently unless it is actively deleted. Sharing means that all users can see the labeled items and the labeling results. Extension means that a label's conditions can be configured as range or fuzzy matches, so that a finding on a single session carries over to the whole data set.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (2)

1. A method for labeling and automatically associating network session data, characterized by comprising the following steps:
Step 1, establishing, in a system, multi-dimensional classification labels for the source/target IP, source/target port, source/target MAC, session protocol, exception type, sent/received/total load, sent/received/total packet count, duration, domain name, URL and content details of a network session data source set;
Step 2, importing the network session data source set into the system, labeling the session data with the multi-dimensional classification labels, and generating a label ID for each label;
Step 3, performing three rounds of traversal matching on all session data according to the different label classifications, the first round matching the exact items, the second round matching the range items within the result of the first round, and the third round matching the fuzzy items within the result of the second round, and storing the session data matched in the third round in association with the label ID, wherein the exact items comprise the source/target IP, source/target MAC, source/target port, session protocol and exception type, the range items comprise the sent/received/total load, sent/received/total packet count and duration, and the fuzzy items comprise the domain name, URL and content details.
2. The method according to claim 1, wherein in step 3 the session data is traversed and matched using multi-threaded segmentation, the number of threads is freely configurable, the number of segments is the total number of session records divided by the number of threads, each segment holds the total number of session records divided by the number of segments, any remainder is distributed evenly starting from the first segment, and the matching results of all segments are spliced together to form a matching result set.
CN201910840735.8A 2019-09-06 2019-09-06 Method for labeling and automatically associating network session data Pending CN110569360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910840735.8A CN110569360A (en) 2019-09-06 2019-09-06 Method for labeling and automatically associating network session data

Publications (1)

Publication Number Publication Date
CN110569360A true CN110569360A (en) 2019-12-13

Family

ID=68778117

Country Status (1)

Country Link
CN (1) CN110569360A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813642A (en) * 2020-07-06 2020-10-23 成都深思科技有限公司 Multithreading-based network communication session data statistical operation method

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101119321A (en) * 2007-09-29 2008-02-06 杭州华三通信技术有限公司 Network flux classification processing method and apparatus
CN102325347A (en) * 2011-09-14 2012-01-18 中兴通讯股份有限公司 Transport stream template coupling method in LTE system and apparatus thereof
JP2012238926A (en) * 2011-05-09 2012-12-06 Canon Inc Data control device, data control method in the same, and program
CN104579941A (en) * 2015-01-05 2015-04-29 北京邮电大学 Message classification method in OpenFlow switch
CN106250480A (en) * 2016-08-01 2016-12-21 浪潮软件集团有限公司 Metadata-based visual statistical analysis method
CN106452948A (en) * 2016-09-22 2017-02-22 恒安嘉新(北京)科技有限公司 Automatic classification method and system of network flow
WO2018121153A1 (en) * 2016-12-29 2018-07-05 北京国双科技有限公司 Written judgment retrieval method and device
CN108449226A (en) * 2018-02-28 2018-08-24 华青融天(北京)技术股份有限公司 The method and system of information Fast Classification
CN108923954A (en) * 2018-06-07 2018-11-30 成都深思科技有限公司 A kind of network data visual analyzing and display systems
CN110069575A (en) * 2019-04-25 2019-07-30 中电科嘉兴新型智慧城市科技发展有限公司 A kind of dynamic data statistical method and system based on multidimensional data mark
CN110100415A (en) * 2016-12-30 2019-08-06 比特梵德荷兰私人有限责任公司 System for network flow to be ready for quickly analyzing

Similar Documents

Publication Publication Date Title
CN104462113B (en) Searching method, device and electronic equipment
US20140067842A1 (en) Information processing method and apparatus
US20080306899A1 (en) Methods, apparatus, and computer-readable media for analyzing conversational-type data
US7539934B2 (en) Computer-implemented method, system, and program product for developing a content annotation lexicon
CN112104734B (en) Method, device, equipment and storage medium for pushing information
CN107861927A (en) Document annotation, device, readable storage medium storing program for executing and computer equipment
CN106528894A (en) Method and device for setting label information
US20080155430A1 (en) Integrating private metadata into a collaborative environment
CN106033438A (en) Public sentiment data storage method and server
US8402043B2 (en) Analytics of historical conversations in relation to present communication
CN104571804B (en) A kind of method and system to being associated across the document interface of application program
CN110569360A (en) Method for labeling and automatically associating network session data
US9355402B2 (en) System, method and computer program product for improving messages content using user'S tagging feedback
CN109558381A (en) A kind of data processing method and device
CN103220555B (en) The sorting technique of a kind of digital cable customers, Apparatus and system
US20120005202A1 (en) Method for Acceleration of Legacy to Service Oriented (L2SOA) Architecture Renovations
US11275803B2 (en) Contextually related sharing of commentary for different portions of an information base
CN107632972A (en) Sheet disposal method and apparatus
CN107767156A (en) A kind of information input method, apparatus and system
JP7206632B2 (en) System, method and program for visual exploration of subnetwork patterns in bimodal networks
US20160335573A1 (en) Describing a paradigmatic member of a task directed community in a complex heterogeneous environment based on non-linear attributes
Saravanan Segment based indexing technique for video data file
CN114116811B (en) Log processing method, device, equipment and storage medium
CN103902280B (en) transaction processing method and device
JP7119550B2 (en) System and method, program, and computer device for visual search of search results in bimodal networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination