WO2022100707A1 - Method, apparatus and system for determining data flow information - Google Patents

Method, apparatus and system for determining data flow information

Info

Publication number
WO2022100707A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
data
flow
access mode
server
Prior art date
Application number
PCT/CN2021/130427
Other languages
English (en)
French (fr)
Inventor
薛莉
徐威旺
张亮
程剑
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110131909.0A external-priority patent/CN114567455A/zh
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP21891227.7A priority Critical patent/EP4236200A4/en
Publication of WO2022100707A1 publication Critical patent/WO2022100707A1/zh
Priority to US18/316,591 priority patent/US20230283624A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H04L63/0263 Rule management
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

Definitions

  • The present application relates to the field of communication technologies, and in particular to a method, apparatus and system for determining data flow information.
  • As services in the communication field become more diverse and complex, the number of different types of terminal devices keeps growing, which blurs the trusted boundary of the network. Because these terminal devices are widely distributed and their access points are scattered, they are difficult to manage centrally. Attackers may use them as a springboard to attack the network for illegal purposes and cause serious economic losses.
  • The terminal device interacts with the server through data packets to request services.
  • The server sends data packets to the terminal device to provide a service or to return a response.
  • A group of data packets exchanged between a terminal and a server is collectively referred to as a data flow.
  • The present application provides a method, apparatus and system for determining data flow information, which are used to mine the access rules reflected by the data flows actually transmitted in the network.
  • The present application provides a method for determining data flow information.
  • The method can be applied to a first device, which can be a forwarding device, a device attached to the forwarding device (hereinafter referred to as the bypass device), or a management device. The method is implemented by the first device, and specifically may be implemented by a component of the first device, such as a processing device, a circuit or a chip in the first device.
  • The method includes: the first device acquires flow parameters of multiple data flows within a period of time (referred to as the first time period), where the flow parameters include but are not limited to protocol type, terminal port number, server IP address and server port number; at least one data flow group is obtained based on the flow parameters of the multiple data flows and the flow parameter rules of at least one preset access mode, where each preset access mode corresponds to a set of preset flow parameter rules and the relationship between the data flows in each data flow group satisfies one of the preset flow parameter rules; for each determined data flow group, the group parameters are determined based on the flow parameters of the data flows in the group, where the group parameters include but are not limited to server IP address, server port number range, terminal port number range and protocol type. Specifically, the lower limit of the server port number range is the minimum server port number among the data flows in the group, and the upper limit is the maximum server port number among those data flows.
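The derivation of group parameters from the flows in one group can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Flow` and `GroupParams` types and the `group_params` function are hypothetical names, the terminal port range is assumed to be computed the same way as the server port range, and only the variant in which all flows share one server IP address is handled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Flow:
    """Flow parameters of one data flow (hypothetical structure)."""
    protocol: str
    terminal_port: int
    server_ip: str
    server_port: int


@dataclass(frozen=True)
class GroupParams:
    """Group parameters of one data flow group (hypothetical structure)."""
    protocol: str
    server_ip: str
    server_port_range: Tuple[int, int]
    terminal_port_range: Tuple[int, int]


def group_params(flows: Iterable[Flow]) -> GroupParams:
    """Derive group parameters from flows already assigned to one group."""
    members = list(flows)
    if not members:
        raise ValueError("a data flow group needs at least one data flow")
    server_ports = [f.server_port for f in members]
    terminal_ports = [f.terminal_port for f in members]
    return GroupParams(
        # Assumption: every flow in the group shares protocol and server IP.
        protocol=members[0].protocol,
        server_ip=members[0].server_ip,
        # Lower limit is the minimum port, upper limit is the maximum port.
        server_port_range=(min(server_ports), max(server_ports)),
        # Assumption: the terminal range is built like the server range.
        terminal_port_range=(min(terminal_ports), max(terminal_ports)),
    )
```

For two TCP flows to 192.0.2.10:443 from terminal ports 50001 and 50007, the sketch yields a server port range of (443, 443) and a terminal port range of (50001, 50007).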
  • In this way, data flows that follow the same access rule are treated as one data flow group, and the group parameters determined for each group can be used in security or monitoring scenarios such as formulating security rules or anomaly detection.
  • This avoids security work that depends entirely on experience and makes better use of the information in the data flows actually transmitted, which improves the reliability of network security.
  • The group parameters of the data flow group may be used to identify abnormal data flows or to determine a security rule, where the security rule is used to control how the forwarding device forwards data flows.
  • The group parameters may also include, but are not limited to, some or all of the following: terminal IP address set, number of data flows, time mode information, access mode identifier, flow support, and device access support. The terminal IP address set includes the different terminal IP addresses corresponding to the data flows in the data flow group.
  • 1) The number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to different preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data.
  • The sample data refers to all the data flows (flow parameters) on which this round of data flow group mining is based.
  • The access behavior of data flows can thus be mined along multiple dimensions, which improves the accuracy of data flow group mining and broadens applicability.
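The text states which quantities the two support values depend on but not the exact formula. The sketch below assumes the simplest reading, a plain ratio; the function names are hypothetical and the ratio itself is an assumption, not something the text confirms.

```python
from typing import Iterable


def flow_support(group_flow_count: int, total_flow_count: int) -> float:
    """Assumed formula: flows in the group / all flows in the first time period."""
    if total_flow_count <= 0:
        raise ValueError("total_flow_count must be positive")
    return group_flow_count / total_flow_count


def device_access_support(group_terminal_ips: Iterable[str],
                          sample_terminal_ips: Iterable[str]) -> float:
    """Assumed formula: distinct terminals in the group / distinct terminals in the sample."""
    sample = set(sample_terminal_ips)
    if not sample:
        raise ValueError("the sample data contains no terminals")
    # Terminals are identified by IP address; duplicates count once.
    return len(set(group_terminal_ips) & sample) / len(sample)
```

With 25 flows in a group out of 100 flows in the window, `flow_support(25, 100)` gives 0.25 under this assumption.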
  • The at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode.
  • The relationship between the data flows in a data flow group belonging to the first access mode satisfies the first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group.
  • The relationship between the data flows in a data flow group belonging to the second access mode satisfies the second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group.
  • The relationship between the data flows in a data flow group belonging to the third access mode satisfies the third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.
  • By considering the server side, the terminal device side, or both, the access behavior between the terminal device and the server can be mined more comprehensively and along more dimensions, which makes subsequent abnormal data flow detection or security rule formulation more convenient and broadens applicability.
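The three flow parameter rules can be read as a check on which port numbers vary within a candidate group. The following sketch is illustrative only: `Flow` and `access_mode` are hypothetical names, "not all the same" is interpreted as "at least two distinct values", and only the same-server-IP variant of each rule is covered (the preset IP address group variant is left out).

```python
from typing import Iterable, NamedTuple, Optional


class Flow(NamedTuple):
    """Flow parameters of one data flow (hypothetical structure)."""
    protocol: str
    terminal_port: int
    server_ip: str
    server_port: int


def access_mode(flows: Iterable[Flow]) -> Optional[int]:
    """Return 1, 2 or 3 if the flows satisfy that rule, otherwise None."""
    members = list(flows)
    if len(members) < 2:
        return None  # a relationship between flows needs at least two flows
    if len({f.protocol for f in members}) != 1:
        return None  # every rule requires the same protocol type
    if len({f.server_ip for f in members}) != 1:
        return None  # same-server-IP variant only (assumption of this sketch)
    server_ports_vary = len({f.server_port for f in members}) > 1
    terminal_ports_vary = len({f.terminal_port for f in members}) > 1
    if terminal_ports_vary and not server_ports_vary:
        return 1  # first rule: fixed server port, varying terminal ports
    if server_ports_vary and not terminal_ports_vary:
        return 2  # second rule: fixed terminal port, varying server ports
    if server_ports_vary and terminal_ports_vary:
        return 3  # third rule: both port numbers vary
    return None
```

Many terminals reaching one service port (for example several clients of a web server on port 443) would fall under the first mode in this reading, while one fixed terminal port reaching several server ports would fall under the second.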
  • When the at least one preset access mode includes a first access mode and a second access mode, obtaining at least one data flow group according to the flow parameter rules of the at least one preset access mode and the flow parameters of the multiple data flows includes: determining the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and then determining the data flow groups belonging to the second access mode based on the flow parameters of the remaining data flows.
  • The first device is a management device, and the at least one preset access mode further includes a third access mode; the method further includes: the management device determines the data flow groups belonging to the third access mode based on the data flows that remain after excluding the data flows in the groups of the first access mode and the second access mode.
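The staged order described above (first access mode, then second on the remainder, then third on what is still left) can be sketched as repeated bucketing. The names `Flow`, `_split` and `mine_groups` are hypothetical, a group is assumed to need at least two distinct values in each varying field, and only the same-server-IP variant of the rules is used.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple


class Flow(NamedTuple):
    """Flow parameters of one data flow (hypothetical structure)."""
    protocol: str
    terminal_port: int
    server_ip: str
    server_port: int


def _split(flows: Sequence[Flow], key: Callable[[Flow], tuple],
           varying: Tuple[str, ...]) -> Tuple[List[List[Flow]], List[Flow]]:
    """Bucket flows by key; buckets whose `varying` fields all differ become groups."""
    buckets: Dict[tuple, List[Flow]] = defaultdict(list)
    for flow in flows:
        buckets[key(flow)].append(flow)
    groups, rest = [], []
    for members in buckets.values():
        if all(len({getattr(f, name) for f in members}) > 1 for name in varying):
            groups.append(members)
        else:
            rest.extend(members)  # handed on to the next stage
    return groups, rest


def mine_groups(flows: Sequence[Flow]) -> Tuple[Dict[int, List[List[Flow]]], List[Flow]]:
    """Return groups per access mode plus the scattered flows left over."""
    mode1, rest = _split(flows, lambda f: (f.protocol, f.server_ip, f.server_port),
                         ("terminal_port",))
    mode2, rest = _split(rest, lambda f: (f.protocol, f.server_ip, f.terminal_port),
                         ("server_port",))
    mode3, rest = _split(rest, lambda f: (f.protocol, f.server_ip),
                         ("server_port", "terminal_port"))
    return {1: mode1, 2: mode2, 3: mode3}, rest
```

Because each stage only sees the flows the previous stage left behind, a flow ends up in at most one group, which matches the "remaining data flows" wording in the text.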
  • The first device is a forwarding device or a bypass device of the forwarding device; the method further includes: the first device acquires the group parameters of multiple data flow groups determined within a reporting period, where the reporting period is longer than the first time period; the first device merges at least two of the multiple data flow groups and determines the group parameters of the merged data flow group according to the group parameters of the at least two data flow groups, where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
  • The method further includes: acquiring scattered data flows within the reporting period, where a scattered data flow is a data flow within the reporting period that does not belong to any data flow group within the reporting period; determining whether the relationship between each scattered data flow and the data flows in a currently existing data flow group satisfies the first flow parameter rule or the second flow parameter rule; and if so, merging the scattered data flow into that data flow group (referred to as the target data flow group of the scattered data flow) and updating the group parameters of the merged data flow group according to the flow parameters of the scattered data flow and the group parameters of the target data flow group.
  • This reporting method effectively reduces repeated reporting of redundant information and saves resource overhead.
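The text says merged group parameters are determined from the parameters of the groups being merged, without giving the exact computation. The sketch below assumes the merged port ranges are the smallest ranges covering both inputs and that flow counts add up; it covers only the first flow parameter rule with a single server IP address, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]


@dataclass(frozen=True)
class GroupParams:
    """Group parameters of one data flow group (hypothetical structure)."""
    protocol: str
    server_ip: str
    server_ports: Range
    terminal_ports: Range
    flow_count: int


def _span(a: Range, b: Range) -> Range:
    """Smallest range covering both input ranges."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def can_merge_mode1(a: GroupParams, b: GroupParams) -> bool:
    """First rule: same protocol, same server IP, one shared server port."""
    return (a.protocol == b.protocol
            and a.server_ip == b.server_ip
            and a.server_ports == b.server_ports
            and a.server_ports[0] == a.server_ports[1])


def merge(a: GroupParams, b: GroupParams) -> GroupParams:
    """Assumed merge: covering ranges, summed flow counts."""
    return GroupParams(a.protocol, a.server_ip,
                       _span(a.server_ports, b.server_ports),
                       _span(a.terminal_ports, b.terminal_ports),
                       a.flow_count + b.flow_count)


def absorb(group: GroupParams, protocol: str, terminal_port: int,
           server_ip: str, server_port: int) -> Optional[GroupParams]:
    """Merge one scattered flow into its target group, or return None if it does not fit."""
    single = GroupParams(protocol, server_ip, (server_port, server_port),
                         (terminal_port, terminal_port), 1)
    return merge(group, single) if can_merge_mode1(group, single) else None
```

A check for the second flow parameter rule would mirror `can_merge_mode1` with the roles of server port and terminal port swapped.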
  • The first device is a forwarding device or a bypass device of the forwarding device; the method further includes: the first device sends the group parameters of the data flow groups it has determined to the management device.
  • The first device is a management device; the flow parameters of the multiple data flows in the first time period come from multiple second devices, and the multiple second devices include forwarding devices and/or bypass devices of the forwarding devices.
  • The first device is a management device that stores the group parameters of historical data flow groups; the method further includes: receiving a query request, where the query request indicates query conditions and the query conditions include one or more group parameters to be queried; determining the query results that meet the query conditions; and sending the query results.
  • The group parameters to be queried include flow support and/or device access support; the query conditions further include a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support and the second query threshold corresponds to the device access support.
  • The query results include some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support meets the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support meets the second query threshold.
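A threshold query over stored group parameters might look like the sketch below. Several points are assumptions rather than statements from the text: "meets the threshold" is read as "greater than or equal to", both thresholds must hold when both are supplied, stored groups are plain dictionaries, and `query_groups` and the key names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def query_groups(history: Iterable[Dict[str, object]],
                 fields: Sequence[str],
                 min_flow_support: Optional[float] = None,
                 min_device_support: Optional[float] = None) -> List[Dict[str, object]]:
    """Return the requested group parameters of groups that meet the thresholds."""
    results = []
    for group in history:
        # First query threshold applies to flow support (assumed: >=).
        if min_flow_support is not None and group["flow_support"] < min_flow_support:
            continue
        # Second query threshold applies to device access support (assumed: >=).
        if (min_device_support is not None
                and group["device_access_support"] < min_device_support):
            continue
        # Only the group parameters named in the query are returned.
        results.append({name: group[name] for name in fields})
    return results
```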
  • In this way, security rules can also be generated based on the access behavior of the data flows transmitted on the network, which avoids relying solely on manual experience to configure security rules and improves the reliability of data access in the network.
  • The forwarding device is a switch, a router, a virtual private network (VPN) device, or a firewall virtual device.
  • The present application provides a method for determining data flow information. The method can be applied to a third device and is implemented by the third device, and specifically can be implemented by a component of the third device, such as a processing device, a circuit or a chip in the third device.
  • The method includes: when formulating security rules, acquiring the group parameters of a target data flow group, where the group parameters include server IP address, server port number range, terminal port number range, and protocol type; and determining security rules according to the group parameters, where the security rules include a blacklist and/or a whitelist, the blacklist indicates data flows that need to be intercepted, and the whitelist indicates data flows that need to be forwarded.
  • In this way, security rules can be generated based on the access behavior of the data flows transmitted on the network, which avoids relying solely on manual experience to configure security rules and improves the reliability of data access in the network.
  • If the flow support of the target data flow group is higher than a first threshold or its device access support is higher than a second threshold, the group parameters are used to determine the whitelist; or, if the flow support of the target data flow group is lower than a third threshold or its device access support is lower than a fourth threshold, the group parameters are used to determine the blacklist.
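The four-threshold decision can be sketched as a small function. The function name, parameter names and threshold values are hypothetical, and checking the whitelist condition before the blacklist condition is an assumption; the text does not say which takes priority when both could apply.

```python
from typing import Optional


def rule_list_for_group(flow_support: float, device_support: float, *,
                        first_threshold: float, second_threshold: float,
                        third_threshold: float, fourth_threshold: float) -> Optional[str]:
    """Decide which list, if any, a group's parameters contribute to."""
    # Whitelist: flow support above the first threshold, or
    # device access support above the second threshold.
    if flow_support > first_threshold or device_support > second_threshold:
        return "whitelist"
    # Blacklist: flow support below the third threshold, or
    # device access support below the fourth threshold.
    if flow_support < third_threshold or device_support < fourth_threshold:
        return "blacklist"
    return None  # neither condition holds; no rule is derived in this sketch
```

A group seen in 40% of all flows would be whitelisted with an illustrative first threshold of 0.3, while a group seen in 0.5% of flows would be blacklisted with an illustrative third threshold of 0.01.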
  • The present application provides a system for determining data flow information. The system includes at least one first device and at least one management device, where the first device may be a forwarding device or a bypass device of the forwarding device.
  • The first device acquires flow parameters of multiple data flows in the first time period, where the flow parameters include protocol type, terminal port number, server IP address and server port number; obtains at least one data flow group based on the flow parameters of the multiple data flows and the flow parameter rules of at least one preset access mode, where each preset access mode corresponds to a set of preset flow parameter rules; determines the group parameters of each data flow group, where the group parameters include server IP address, server port number range, terminal port number range and protocol type; and sends the statistical result of the first time period to the management device, where the statistical result includes the group parameters of the determined at least one data flow group.
  • The management device receives multiple statistical results from the one or more first devices.
  • The group parameters of the data flow group may be used to identify abnormal data flows or to determine a security rule, where the security rule is used to control how the forwarding device forwards data flows.
  • The group parameters may also include, but are not limited to, some or all of the following: terminal IP address set, number of data flows, time mode information, access mode identifier, flow support, and device access support. The terminal IP address set includes the different terminal IP addresses corresponding to the data flows in the data flow group.
  • 1) The number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to different preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data.
  • The sample data refers to all the data flows (flow parameters) on which this round of data flow group mining is based.
  • The at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode.
  • The relationship between the data flows in a data flow group belonging to the first access mode satisfies the first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group.
  • The relationship between the data flows in a data flow group belonging to the second access mode satisfies the second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group.
  • The relationship between the data flows in a data flow group belonging to the third access mode satisfies the third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.
  • When the at least one preset access mode includes a first access mode and a second access mode, the first device obtaining at least one data flow group according to the flow parameter rules of the at least one preset access mode and the flow parameters of the multiple data flows includes: determining the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and then determining the data flow groups belonging to the second access mode based on the flow parameters of the remaining data flows.
  • Determining the statistical result by the first device includes: the first device acquires the group parameters of multiple data flow groups determined within the reporting period, where the reporting period is longer than the first time period; the first device merges at least two of the multiple data flow groups and determines the group parameters of the merged data flow group according to the group parameters of the at least two data flow groups, where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
  • Determining the statistical result by the first device further includes: acquiring scattered data flows within the reporting period, where the scattered data flows are the data flows within the reporting period that do not belong to any data flow group within the reporting period; determining whether the relationship between each scattered data flow and the data flows in a currently existing data flow group satisfies the first flow parameter rule or the second flow parameter rule; and if so, merging the scattered data flow into that data flow group (referred to as the target data flow group of the scattered data flow) and updating the group parameters of the merged data flow group according to the flow parameters of the scattered data flow and the group parameters of the target data flow group.
  • The statistical result includes the group parameters of each unmerged data flow group among the multiple data flow groups determined in the reporting period, the group parameters of the merged data flow groups, and the remaining data flow groups.
  • The management device receives multiple statistical results in a second time period, merges at least two data flow groups in the multiple statistical results of the second time period, and determines the group parameters of the merged data flow group according to the group parameters of the at least two data flow groups, where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
  • The at least one preset access mode further includes the third access mode, and the statistical result further includes scattered data flows that are not assigned to any data flow group; the management device merges one or more scattered data flows in the multiple statistical results with a data flow group where the relationship between them satisfies the first flow parameter rule or the second flow parameter rule; the management device then determines the data flow groups belonging to the third access mode based on the remaining scattered data flows.
  • The management device stores the group parameters of historical data flow groups; the method further includes: the management device receives a query request, where the query request indicates query conditions and the query conditions include one or more group parameters to be queried; the management device determines the query results that meet the query conditions and sends the query results.
  • The group parameters to be queried include flow support and/or device access support; the query conditions further include a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support and the second query threshold corresponds to the device access support.
  • The query results include some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support meets the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support meets the second query threshold.
  • The present application provides a system for determining data flow information. The system includes at least one first device and at least one management device, where the first device may be a forwarding device or a bypass device of the forwarding device.
  • The first device sends the flow parameters of multiple data flows in the first time period to the management device, where the flow parameters include protocol type, terminal port number, server IP address, and server port number.
  • The management device receives the flow parameters of the multiple data flows within the first time period; obtains at least one data flow group based on the flow parameters of the multiple data flows and the flow parameter rules of at least one preset access mode, where each preset access mode corresponds to a set of preset flow parameter rules; and determines the group parameters of each data flow group, where the group parameters include server IP address, server port number range, terminal port number range, and protocol type.
  • The group parameters of the data flow group may be used to identify abnormal data flows or to determine a security rule, where the security rule is used to control how the forwarding device forwards data flows.
  • The group parameters may also include, but are not limited to, some or all of the following: terminal IP address set, number of data flows, time mode information, access mode identifier, flow support, and device access support. The terminal IP address set includes the different terminal IP addresses corresponding to the data flows in the data flow group.
  • 1) The number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to different preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data.
  • The sample data refers to all the data flows (flow parameters) on which this round of data flow group mining is based.
  • The at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode.
  • The relationship between the data flows in a data flow group belonging to the first access mode satisfies the first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group.
  • The relationship between the data flows in a data flow group belonging to the second access mode satisfies the second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group.
  • The relationship between the data flows in a data flow group belonging to the third access mode satisfies the third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.
  • When the at least one preset access mode includes a first access mode and a second access mode, the management device determines the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and then determines the data flow groups belonging to the second access mode based on the remaining data flows that do not belong to the first access mode.
  • When the at least one preset access mode further includes a third access mode, the management device determines the data flow groups belonging to the third access mode based on the data flows that remain after excluding the data flows in the groups of the first access mode and the second access mode.
  • The management device stores the group parameters of historical data flow groups; the method further includes: receiving a query request, where the query request indicates query conditions and the query conditions include one or more group parameters to be queried; determining the query results that meet the query conditions; and sending the query results.
  • The group parameters to be queried include flow support and/or device access support; the query conditions further include a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support and the second query threshold corresponds to the device access support.
  • The query results include some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support meets the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support meets the second query threshold.
  • The present application provides a system for determining data flow information. The system includes at least one first device and at least one management device, where the first device may be a forwarding device or a bypass device of the forwarding device.
  • The first device sends the data flows it receives to the management device. The management device receives multiple data flows from one or more first devices; determines the flow parameters of each of the multiple data flows, where the flow parameters include protocol type, terminal port number, server IP address, and server port number; obtains at least one data flow group based on the flow parameter rules of at least one preset access mode and the flow parameters of the multiple data flows, where each preset access mode corresponds to a set of preset flow parameter rules; and determines the group parameters of each data flow group, where the group parameters include server IP address, server port number range, terminal port number range, and protocol type.
  • The group parameters of the data flow group may be used to identify abnormal data flows or to determine a security rule, where the security rule is used to control how the forwarding device forwards data flows.
  • the group parameters may also include, but are not limited to, some or all of the following: terminal IP address set, stream number of data streams, time mode information, access mode identifier, stream support, device access support degree; wherein, the terminal IP address set includes different terminal IP addresses corresponding to the data streams in the data stream group;
  • 1) the number of streams of the data stream refers to the number of data streams contained in the data stream group; 2) the time mode information is used to indicate the preset time mode to which the data stream group belongs, wherein different preset time modes correspond one-to-one to different preset time ranges; 3) the access mode identifier is used to identify the preset access mode to which the data stream group belongs; 4) the stream support degree is determined according to the number of data streams in the data stream group and the total number of data streams in the first time period; 5) the device access support degree is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data.
  • the sample data refers to all the data streams (stream parameters) on which this data stream group mining is based.
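  • The two support degrees can be illustrated with a minimal Python sketch. The text only states which quantities each degree is determined from; treating each degree as a simple ratio is an assumption made here, and the function names are hypothetical.

```python
def flow_support(group_flow_count: int, total_flow_count: int) -> float:
    """Share of the sampled data flows that fall into this group (assumed to be a ratio)."""
    if total_flow_count <= 0:
        raise ValueError("total_flow_count must be positive")
    return group_flow_count / total_flow_count


def device_access_support(group_terminal_ips: set, all_terminal_ips: set) -> float:
    """Share of the sampled terminals that appear in this group (assumed to be a ratio)."""
    if not all_terminal_ips:
        raise ValueError("sample data contains no terminals")
    return len(group_terminal_ips & all_terminal_ips) / len(all_terminal_ips)
```

  • Under this reading, a group that contains 3 of 12 sampled flows has a stream support degree of 0.25, and a group reached by 2 of 4 sampled terminals has a device access support degree of 0.5.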
  • the at least one preset access mode includes one or more of the following modes: a first access mode, a second access mode, and a third access mode;
  • the relationship between the data flows in a data flow group belonging to the first access mode satisfies the first flow parameter rule, and the first flow parameter rule includes: the data flows in the data flow group have the same protocol type, different terminal port numbers, the same server port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, terminal port numbers that are not exactly the same, the same server port number, and server IP addresses that belong to the same preset IP address group;
  • the relationship between the data flows in a data flow group belonging to the second access mode satisfies the second flow parameter rule, and the second flow parameter rule includes: the data flows in the data flow group have the same protocol type, server port numbers that are not exactly the same, the same terminal port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not exactly the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group;
  • the relationship between the data flows in a data flow group belonging to the third access mode satisfies the third flow parameter rule, and the third flow parameter rule includes: the data flows in the data flow group have the same protocol type, server port numbers that are not the same, terminal port numbers that are not the same, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not the same, terminal port numbers that are not the same, and server IP addresses that belong to the same preset IP address group.
  • when the at least one preset access mode includes a first access mode and a second access mode, the management device determines the data flow groups belonging to the first access mode based on the stream parameters of the multiple data streams in the first time period, and determines the data flow groups belonging to the second access mode based on the remaining data streams, that is, those not belonging to the first access mode;
  • when the at least one preset access mode further includes a third access mode, the management device determines the data flow groups belonging to the third access mode based on the remaining data streams, that is, those belonging to neither the first access mode nor the second access mode.
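  • The ordered mining of the three access modes can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Flow` tuple, the helper `_split`, and the function `mine` are hypothetical names, only the "same server IP address" variant of each rule is covered, and a bucket is assumed to form a group only when every "not the same" field actually takes more than one value.

```python
from collections import defaultdict, namedtuple

# Hypothetical flow record: the four flow parameters named in the text.
Flow = namedtuple("Flow", "protocol terminal_port server_ip server_port")


def _split(flows, key, varies):
    """Bucket flows by key; a bucket is a group if each field in `varies` has >1 value."""
    buckets = defaultdict(list)
    for f in flows:
        buckets[key(f)].append(f)
    groups, rest = [], []
    for members in buckets.values():
        if all(len({getattr(m, name) for m in members}) > 1 for name in varies):
            groups.append(members)
        else:
            rest.extend(members)
    return groups, rest


def mine(flows):
    """Apply the three access modes in order, each on the leftovers of the previous one."""
    first, rest = _split(flows, lambda f: (f.protocol, f.server_ip, f.server_port),
                         ["terminal_port"])
    second, rest = _split(rest, lambda f: (f.protocol, f.server_ip, f.terminal_port),
                          ["server_port"])
    third, rest = _split(rest, lambda f: (f.protocol, f.server_ip),
                         ["server_port", "terminal_port"])
    return {"first": first, "second": second, "third": third, "ungrouped": rest}
```

  • Because each mode only sees the flows left over by the previous one, a flow is assigned to at most one data flow group.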
  • the management device stores group parameters of historical data flow groups; the method further includes: receiving a query request, where the query request indicates query conditions, and the query conditions include one or more group parameters to be queried; determining the query results that meet the query conditions; and sending the query results.
  • the group parameter to be queried includes stream support and/or device access support;
  • the query condition further includes a first query threshold and/or a second query threshold, and the first query threshold corresponds to the stream support degree, the second query threshold corresponds to the device access support degree;
  • the query result includes some or all of the group parameters of the data flow groups, among the historical data flow groups, whose stream support degree satisfies the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support degree satisfies the second query threshold.
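  • A threshold query over stored group parameters might look like the following Python sketch. The dictionary keys, the function name `query_groups`, and its parameters are hypothetical; "satisfies the threshold" is assumed to mean "greater than or equal to", and when both thresholds are supplied a group is assumed to need to satisfy both.

```python
def query_groups(history, first_threshold=None, second_threshold=None, fields=None):
    """Return (some or all) group parameters of the historical groups that pass the thresholds.

    history: list of dicts, one per historical data flow group.
    first_threshold: lower bound on "flow_support" (assumed semantics: >=).
    second_threshold: lower bound on "device_access_support" (assumed semantics: >=).
    fields: optional list of group parameter names to return; None returns all.
    """
    results = []
    for group in history:
        if first_threshold is not None and group["flow_support"] < first_threshold:
            continue
        if second_threshold is not None and group["device_access_support"] < second_threshold:
            continue
        if fields is None:
            results.append(dict(group))
        else:
            results.append({name: group[name] for name in fields})
    return results
```

  • For instance, a request carrying only a first query threshold of 0.2 would return every stored group whose stream support degree is at least 0.2, regardless of its device access support degree.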
  • the present application further provides an apparatus for determining data flow information; the apparatus includes a plurality of functional units, and these functional units can perform the functions of the steps in the method of the first aspect or the functions of the steps in the method of the second aspect.
  • These functional units can be implemented by hardware or by software.
  • the apparatus includes an acquisition unit and a processing unit.
  • the device includes an acquisition unit and a determination unit.
  • the present application further provides an apparatus for determining data flow information, the apparatus includes a processor, a memory and a transceiver, wherein program instructions are stored in the memory, and the processor executes the program instructions in the memory , communicate with other devices through the transceiver to implement the method provided in the first aspect or implement the method provided in the second aspect.
  • the present application further provides a device for determining data flow information, the device includes at least one processor and an interface circuit, where the processor is configured to communicate with other devices through the interface circuit, so as to implement the method provided in the first aspect or the method provided in the second aspect.
  • the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer executes the method provided in the first aspect or the method provided in the second aspect.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart corresponding to a method for determining data flow information provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a determination process of a data stream group provided by an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a determination process of data flow information provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the relationship between a reporting period and a statistical period provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart corresponding to another method for determining data flow information provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of another method for determining data flow information provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for determining data flow information according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another method for determining data flow information provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a query scenario provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of another query scenario provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an apparatus for determining data flow information provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of another apparatus for determining data flow information provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another apparatus for determining data flow information provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a network architecture to which an embodiment of the present application is applied.
  • The network architecture includes one or more servers (the server 100 is shown as an example in FIG. 1, but this application does not limit this), one or more forwarding devices (the forwarding devices 200 and 201 are shown as examples in FIG. 1, but this application does not limit this), terminal devices (the terminal devices 10, 11, and 12 are shown as examples in FIG. 1, but this application does not limit this), and one or more management devices (the management device 300 is shown as an example in FIG. 1, but this application does not limit this).
  • a terminal device which can be a device with wired or wireless transceiver functions.
  • Terminal equipment, which can be referred to as a terminal, can be deployed on land, including indoors, outdoors, and/or hand-held or vehicle-mounted; it can also be deployed on water (such as on ships); it can also be deployed in the air (such as on aircraft, balloons, and satellites).
  • the terminal device may be a user equipment (UE), and the UE includes a handheld device, a vehicle-mounted device, a wearable device, or a computing device with a wired communication function or a wireless communication function.
  • the UE may be a mobile phone (mobile phone), a tablet computer, or a computer with a wired transceiving function or a wireless transceiving function.
  • the terminal device may also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in unmanned driving, a wireless terminal in telemedicine, a wireless terminal in a smart grid, a wireless terminal in a smart city, and/or a wireless terminal in a smart home, and so on.
  • the terminal device may also be an Internet of Things device that communicates over the Internet Protocol (IP), such as a camera, a printer, an IP phone, an automated teller machine (ATM), a smart counter, a numbering machine, a return order counter, etc.
  • Forwarding devices, such as switches, routers, virtual private network (VPN) devices, firewalls, virtual devices, etc., are mainly used to forward data streams. Specifically, a data stream can be forwarded or blocked according to the configured security rules.
  • the security rules configured on different forwarding devices may be different, and the security rules will be introduced below.
  • A server is a device for providing one or more services (or functions).
  • the network architecture shown in Figure 1 can be applied to various scenarios, such as financial networks, campus networks, medical networks, and so on.
  • For example, the terminal device may be a surveillance camera, and the server may be a server of a monitoring platform; for another example, the terminal device may be an ATM, and the server may be a specific server of a financial institution.
  • For example, the server is a service server in a financial network, which can be used to provide specific service functions, such as transfer, deposit, transaction authentication, query service and other functions.
  • Management equipment which is used to configure security rules for forwarding equipment and support functions such as user access. In terms of specific form, it generally refers to the control device (interacting with the device and responsible for managing the device).
  • the management device can be a regional network management device used to manage network devices (such as forwarding devices) in a designated area, or a cloud platform.
  • the cloud platform can manage multiple regional network management devices, of course, it can also directly manage some or all network devices in a designated area.
  • the security analysis functional component can be integrated in a network management device or a cloud platform to implement the method for determining data flow information provided by this application.
  • the management device can also be used to implement security rule issuance.
  • the method for the management device to issue security rules may include: the cloud platform sends the (received) security rules to the network management device, and the network management device then delivers the security rules to the forwarding device.
  • A data stream refers to a group of data packets exchanged between two nodes.
  • A data stream consists of multiple data packets.
  • The data stream includes uplink packets and downlink packets.
  • A data packet sent by the terminal device to the server is called an uplink packet.
  • A data packet sent by the server to the terminal device is called a downlink packet.
  • The format of a data packet includes a packet header and a data part, wherein the data part is used to carry the information to be transmitted, and the packet header is used to carry the quintuple information. The quintuple information is introduced below and is not described here.
  • the network architecture shown in FIG. 1 is introduced as follows.
  • When the server 100 is deployed in the enterprise network, the terminal device may be directly deployed inside the enterprise network, such as in the enterprise production network.
  • the terminal device may also be deployed on an external network of the enterprise, and this type of terminal device may access the enterprise network through a VPN or the like, and communicate with the server 100 .
  • the terminal device may be accessed in a wireless manner, or may be accessed in a wired manner, which is not limited in this embodiment of the present application.
  • the terminal device interacts with the server through data packets to request services.
  • the server sends a data packet to the terminal device to provide a service or send a feedback response.
  • a group of data packets exchanged between a terminal and a server is called a data stream.
  • The transmission path of the data stream may also include one or more forwarding devices. A forwarding device may be used to receive the data stream, obtain quintuple information by parsing the data packets in the data stream, and then, according to the destination IP address in the parsed quintuple information, send the data stream to the device (such as a server) corresponding to the destination IP address.
  • Security rules, such as blacklists and/or whitelists, may be configured on the forwarding device.
  • The whitelist records the information of the data streams that are allowed to be forwarded; the blacklist records the information of the data streams that are not allowed to be forwarded or need to be intercepted.
  • The data streams that need to be intercepted may come from devices that attack servers or terminal devices in the network architecture shown in FIG. 1. Therefore, after receiving a data stream, the forwarding device is also used to extract the information of the data stream, such as quintuple information, and judge whether the data stream can be forwarded or needs to be intercepted according to the extracted information and the security rules. For example, if the forwarding device detects that a data packet belongs to a data flow that the security rules allow to pass, it forwards the packet. Otherwise, the forwarding device intercepts the packet and does not forward it, which prevents the server or terminal device from being illegally attacked.
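  • The forward-or-intercept decision can be sketched in Python as follows. This is an illustration only: the `FlowKey` fields used for matching, the function name `decide`, and the default of intercepting anything that is not explicitly whitelisted are assumptions, not details given in the text.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical subset of flow information that a security rule matches on."""
    protocol: str
    server_ip: str
    server_port: int


def decide(flow: FlowKey, whitelist: set, blacklist: set) -> str:
    """Return "forward" or "intercept" for one data flow."""
    if flow in blacklist:
        return "intercept"
    if flow in whitelist:
        return "forward"
    # Assumed default: a flow that no rule explicitly allows is intercepted.
    return "intercept"
```

  • Checking the blacklist first means that a flow listed in both sets is still intercepted, which is the more conservative choice for this sketch.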
  • However, security rules depend on the experience of security administrators, that is, security rules are configured by security administrators based on known viruses or hacking techniques. An incorrectly configured security rule may cause the forwarding device to wrongly let a data stream through, and unknown threats may go undiscovered, leading to major security incidents.
  • the access behavior when the terminal device interacts with the server is relatively fixed.
  • the data stream collected by the surveillance camera is usually sent to the server of a specific surveillance platform.
  • Data flow information has great application value and reference significance, for example, it can be used to formulate security rules, or to identify abnormal data flow and other scenarios, which will greatly improve network security.
  • In view of this, an embodiment of the present application provides a method for determining data flow information.
  • In this method, the flow parameters of multiple data flows in a first time period are acquired, and data flow groups are mined from the flow parameters of these data flows; the group parameters of each data flow group are determined based on the flow parameters of the data flows in that data flow group.
  • In this way, the access rules of the large number of data flows actually transmitted in the network can be mined, the data flows with the same access rule are regarded as one data flow group, and the group parameters of each data flow group can be determined.
  • When the group parameters are applied to security scenarios such as formulating security rules or detecting abnormal data flows, security work no longer has to rely entirely on experience, and the information of the real data flows is put to better use, which improves the reliability of network security.
  • the method for determining data flow information provided by the embodiment of the present application will be described in detail below.
  • the method can be applied to the network architecture shown in FIG. 1 .
  • The network architecture shown in FIG. 1 is only an example, and the embodiment of the present application does not limit the applicable network architecture.
  • In practice, more or fewer devices can be deployed than in FIG. 1.
  • For example, a firewall can also be deployed for the server, that is, the data forwarded by the forwarding device to the server must first be authenticated by the firewall and is then forwarded to the server.
  • FIG. 2 is a flowchart of a method for determining data flow information provided by an embodiment of the present application. The method may be executed by the forwarding device (e.g., a switch, a router, or a VPN device) in FIG. 1, by a bypass device of the forwarding device, or by a management device.
  • As shown in FIG. 2, the method may include the following steps:
  • Step 201 The forwarding device acquires the flow parameters of each data flow received within N statistical periods, where N is a positive integer.
  • the statistical period here may be configured by other devices such as the management device for the forwarding device, or may be agreed between the management device and the forwarding device through a protocol, or determined in other ways, which are not limited in this embodiment of the present application.
  • the statistical period can be used as the granularity of the statistical data flow. For example, if the statistical period is configured to be 30 minutes, the forwarding device can execute the solution of the present application for determining data flow information based on the data flow detected within every 30 minutes.
  • The statistical period should be chosen so that the number of data streams counted in one period is neither too large nor too small: this avoids the computational burden and delay caused by an excessive amount of sample data, while still providing enough sample data for the data flow information to be analyzed.
  • The forwarding device can also perform the method for determining data flow information once based on the data flows collected in multiple statistical periods. For example, if the statistical period is 30 minutes, the forwarding device can obtain the flow parameters of the data flows in two statistical periods, that is, within 60 minutes, and use them to mine data flow information; this can also be understood as directly configuring the statistical period to 60 minutes.
  • the forwarding device may perform step 201 by default, or may trigger the execution of step 201 after receiving the startup instruction.
  • For example, the management device or another network device sends a startup instruction to the forwarding device, and the startup instruction is used to instruct the forwarding device to start the data mining function so as to perform step 201.
  • the startup instruction may further include configuration information about the aforementioned statistical period, where the configuration information is used to configure the statistical period for the forwarding device. In this way, the statistical period configured for the forwarding device can be dynamically adjusted, and the adjustment method is more flexible and does not cause much signaling overhead. If the start instruction does not include the configuration information of the statistical period, the forwarding device may perform step 201 based on the last configured statistical period or the statistical period agreed in the protocol or determined in other manners.
  • The startup instruction may further include a number of valid times or a valid time, which indicates for how many statistical periods, or for how long, the statistical period remains in effect.
  • The forwarding device may execute the solution for determining data flow information of the present application within the effective statistical periods, and when the number of valid times or the valid time is reached, the forwarding device turns off the data mining function, so that the forwarding device is in a more energy-saving state.
  • For example, if the number of valid times is three, there are three effective statistical periods, and these may be the three statistical periods after the startup instruction is received.
  • Another way of turning off the data mining function is that other devices such as the management device can send an end instruction to instruct the forwarding device to turn off the data mining function.
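  • The handling of startup and end instructions can be sketched in Python as below. All names (`MiningState`, `on_start`, `on_period_end`, `on_end`) and the 30-minute fallback period are hypothetical; the sketch only models the number-of-valid-times variant, not the valid-time variant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiningState:
    period_minutes: int = 30            # assumed fallback: last configured or agreed period
    remaining_periods: Optional[int] = None  # None means no limit on valid times
    enabled: bool = False


def on_start(state: MiningState, period_minutes=None, valid_times=None) -> None:
    """Startup instruction: optionally reconfigure the period, then enable mining."""
    if period_minutes is not None:
        state.period_minutes = period_minutes
    state.remaining_periods = valid_times
    state.enabled = True


def on_period_end(state: MiningState) -> None:
    """Called when a statistical period finishes; disables mining once the count is used up."""
    if state.enabled and state.remaining_periods is not None:
        state.remaining_periods -= 1
        if state.remaining_periods == 0:
            state.enabled = False


def on_end(state: MiningState) -> None:
    """End instruction: turn the data mining function off immediately."""
    state.enabled = False
```

  • A startup instruction without period configuration leaves `period_minutes` untouched, which mirrors falling back to the last configured or agreed statistical period.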
  • a forwarding device is used as an example to introduce the statistical period.
  • the statistical period configured on different forwarding devices in a network architecture may be different.
  • For example, the statistical period configured on the forwarding device 200 may be 20 minutes, and the statistical period configured on the forwarding device 201 may be 30 minutes.
  • The above values of the statistical period are only examples, and this application does not limit them.
  • The implementation process of step 201 is specifically described below by taking one statistical period as an example.
  • the one statistical period is referred to as a first statistical period.
  • the forwarding device receives multiple data streams in the first statistical period, and determines the stream parameters of each received data stream respectively.
  • the stream parameters of the data stream here may include quintuple information and first time information.
  • the first time information may be the time when the forwarding device receives the data stream or may also be a period identifier of the statistical period when the data stream is received.
  • The period identifier can be represented by any moment in the statistical period. For example, if the first statistical period is 2020.10.01 15:00-2020.10.01 15:30, its period identifier can be 2020.10.01 15:00. For another example, the period identifier may also be the number of the statistical period.
  • The numbering starts from 1, that is, the number of the first statistical period is 1, and the number of each subsequent statistical period is incremented by 1 in time order, so the statistical periods recorded by the forwarding device are numbered 1, 2, ..., n, where n is a positive integer.
  • For example, if the number of the statistical period 2020.10.01 15:00-2020.10.01 15:30 is 1 and the length of the statistical period is 30 minutes, then 2020.10.01 15:30-2020.10.01 16:00 is numbered 2, 2020.10.01 16:00-2020.10.01 16:30 is numbered 3, and so on.
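  • The numbering scheme above can be computed directly from a timestamp, as in the following Python sketch. The function names are hypothetical, and the sketch assumes that periods are contiguous, equally long, and half-open (a period contains its start time but not its end time).

```python
from datetime import datetime, timedelta


def period_number(received_at: datetime, first_period_start: datetime,
                  period_length: timedelta) -> int:
    """Number (starting from 1) of the statistical period that contains received_at."""
    if received_at < first_period_start:
        raise ValueError("timestamp precedes the first statistical period")
    return (received_at - first_period_start) // period_length + 1


def period_start(number: int, first_period_start: datetime,
                 period_length: timedelta) -> datetime:
    """Start time of period `number`, usable as the time-based period identifier."""
    return first_period_start + (number - 1) * period_length
```

  • With a first period starting at 2020.10.01 15:00 and a 30-minute length, a data stream received at 16:10 falls into period 3, whose time-based identifier is 2020.10.01 16:00.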
  • the first time information can also be determined by parsing the packet.
  • For example, when the first time information is the time when the terminal device or the server sends the data stream, the first time information can be carried in the packets of the data flow.
  • The first time information may also be determined by the forwarding device itself, for example, the first time information is the time when the forwarding device receives the data stream, or the period identifier of the statistical period, or the like.
  • The above first time information is only an example, and the first time information may also be determined in other ways. For example, the first time information may be the time at which, as determined by the forwarding device, the terminal device or the server sent the data stream. This embodiment of the present application does not limit this.
  • the quintuple information is introduced below.
  • a data stream is uniquely identified by a set of quintuple information.
  • The quintuple information includes (sip, sport, dip, dport, protocol), where sip (source IP) identifies the source IP address, sport (source port) identifies the source port number, dip (destination IP) identifies the destination IP address, dport (destination port) identifies the destination port number, and protocol identifies the protocol type.
  • A data stream contains uplink packets and/or downlink packets, and the uplink packets and downlink packets of the same data stream contain the same data items, but in a different order.
  • For example, a data packet from the terminal to the server is an uplink packet, and the quintuple information corresponding to the uplink packet is (clientIP, clientPort, serverIP, serverPort, TCP), where the value of sip is clientIP, the value of sport is clientPort, the value of dip is serverIP, the value of dport is serverPort, and the value of protocol is TCP.
  • A data packet from the server to the terminal is a downlink packet, and the quintuple information corresponding to the downlink packet is (serverIP, serverPort, clientIP, clientPort, TCP), where the value of sip is serverIP, the value of sport is serverPort, the value of dip is clientIP, the value of dport is clientPort, and the value of protocol is TCP.
  • the above protocol type is only an example, and may also be a User Datagram Protocol (User Datagram Protocol, UDP), which is not limited in this embodiment of the present application.
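  • The relationship between the quintuple of an uplink packet and that of a downlink packet of the same data stream can be shown with a short Python sketch. The type `FiveTuple` and the function `reverse` are hypothetical names used only for illustration, and the addresses are sample values.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    sip: str       # source IP address
    sport: int     # source port number
    dip: str       # destination IP address
    dport: int     # destination port number
    protocol: str  # protocol type, e.g. "TCP" or "UDP"


def reverse(t: FiveTuple) -> FiveTuple:
    """Swap source and destination: maps an uplink quintuple to the downlink one and back."""
    return FiveTuple(t.dip, t.dport, t.sip, t.sport, t.protocol)


# Uplink packet: terminal (client) -> server.
uplink = FiveTuple("192.168.1.100", 45530, "10.1.0.100", 80, "TCP")
# Downlink packet of the same data stream: server -> terminal.
downlink = reverse(uplink)
```

  • The two quintuples carry the same five values; only the source and destination positions are exchanged.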
  • the forwarding device may analyze the data packets of the data stream to obtain quintuple information of the data stream, and determine and record the stream parameters of the data stream. As shown in Table 1 below, Table 1 exemplarily shows the flow parameters of the data flow recorded by the forwarding device in the first statistical period.
  • the object for recording flow parameters (eg, Table 1) is referred to as a flow record table.
  • It should be noted that the form shown in Table 1 is only an example, and the embodiment of the present application does not limit the recording form of the stream parameters of the data stream.
  • For example, the forwarding device may also record the sip, sport, dip, dport, and protocol of the data stream, that is, the entries of the flow record table include the sip, sport, dip, dport, and protocol items, but do not directly indicate which device is the server and which is the terminal device.
  • In this case, the forwarding device can record the quintuple information of every data flow according to the same rule, for example: sip is the terminal IP address, sport is the terminal port number, dip is the server IP address, and dport is the server port number.
  • Based on this, if the first data packet of data flow A that the forwarding device receives is an uplink packet, the forwarding device directly records the sip of the uplink packet as sip, the sport as sport, the dip as dip, and the dport as dport. Other packets (downlink packets and/or uplink packets) that subsequently arrive for the same data flow A can be ignored, that is, the same data flow does not need to be recorded repeatedly.
  • If the first data packet of data flow B that the forwarding device receives is a downlink packet, then, because in the downlink packet sip is the server IP address, sport is the server port number, dip is the terminal IP address, and dport is the terminal port number, the forwarding device records the terminal IP address (dip) of the downlink packet in the sip item of the flow record table, the terminal port number (dport) in the sport item, the server IP address (sip) in the dip item, and the server port number (sport) in the dport item.
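  • The direction-independent recording described above can be sketched in Python as follows. How the forwarding device tells an uplink packet from a downlink packet is not stated in the text; the sketch assumes a known set of server IP addresses (`server_ips`), and the function name `record_flow` and the dictionary-based packet are hypothetical.

```python
def record_flow(table: set, packet: dict, server_ips: set) -> tuple:
    """Record one packet in the flow record table in terminal-to-server orientation.

    table: set of entries (sip, sport, dip, dport, protocol) where sip/sport always
           hold the terminal side and dip/dport the server side.
    packet: dict with keys sip, sport, dip, dport, protocol as seen on the wire.
    server_ips: assumed way of recognising the server side of a packet.
    """
    if packet["sip"] in server_ips:
        # Downlink packet: swap so that the terminal side lands in sip/sport.
        entry = (packet["dip"], packet["dport"], packet["sip"], packet["sport"],
                 packet["protocol"])
    else:
        # Uplink packet: already in the required orientation.
        entry = (packet["sip"], packet["sport"], packet["dip"], packet["dport"],
                 packet["protocol"])
    table.add(entry)  # a set ignores repeats, so each data flow is recorded once
    return entry
```

  • Because both directions map to the same entry, later packets of an already recorded data flow leave the table unchanged.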
  • the statistical data streams in this embodiment of the present application may be all data streams received by the forwarding device, and it is not necessary to distinguish whether the data streams need to be forwarded or intercepted.
  • Step 202 The forwarding device obtains at least one data flow group according to the flow parameters of the data flow collected in the first statistical period and the flow parameter rules of one or more preset access modes.
  • a preset access mode corresponds to a preset flow parameter rule.
  • the preset access mode includes one or more of a first access mode, a second access mode, and a third access mode. It should be understood that these three access modes are only for illustration, and the embodiments of the present application do not limit the types and quantities of the preset access modes. The three access modes are described in detail as follows.
  • the preset flow parameter rule corresponding to the first access mode is referred to as the first flow parameter rule as follows.
  • the relationship between the data flows in a data flow group belonging to the first access mode satisfies the first flow parameter rule.
  • the first flow parameter rule includes: the data flows in the same data flow group have the same protocol type, the terminal port The number is not fixed, the server IP address is fixed, and the server port number is fixed.
  • the "fixed” here can be understood as unchanged or identical or the value does not fluctuate.
  • the server IP address of data stream 1 is 10.1.0.100
  • the server IP address of data stream 2 is 10.1.0.100
  • the server IP address of data stream 3 is 10.1.
  • the server IP address is 10.1.0.100, it can be said that the server IP addresses of data stream 1, data stream 2 and data stream 3 are fixed (same).
  • the "not fixed” here can be understood as the value fluctuates, or is completely different, or not exactly the same.
  • for example, the terminal IP address of data stream 1 is 192.168.1.100, the terminal IP address of data stream 2 is 192.168.1.101, and the terminal IP address of data stream 3 is 192.168.1.102; it can then be said that the terminal IP addresses of data stream 1, data stream 2 and data stream 3 are not fixed.
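  • the two terms can be read as simple predicates over the values that one field takes across the data streams of a group. The following Python sketch is only an illustration; the helper names `is_fixed` and `is_not_fixed` are assumptions and do not appear in the application.

```python
def is_fixed(values):
    """True if every data stream carries the same value for the field."""
    return len(set(values)) == 1


def is_not_fixed(values):
    """True if the field takes at least two different values."""
    return len(set(values)) > 1


# Values taken from the two examples above.
server_ips = ["10.1.0.100", "10.1.0.100", "10.1.0.100"]
terminal_ips = ["192.168.1.100", "192.168.1.101", "192.168.1.102"]

print(is_fixed(server_ips), is_not_fixed(terminal_ips))
```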
  • the first flow parameter rule includes: the data flows in the data flow group have the same protocol type, the terminal IP address is not fixed, the terminal port number is not fixed, and the server IP address belongs to the same preset IP address group.
  • for example, a group of servers that provide the same service or function is set up. When a terminal device initiates a service invocation request, it can access any server in the group, and the different server IP addresses in the group make up the IP address group. Therefore, if the protocol types of multiple data streams are the same, the terminal IP address is not fixed, the terminal port number is not fixed, and the server IP addresses are different but belong to the same group of server IP addresses, it can be considered that the first flow parameter rule is satisfied.
  • there may be more than one preset IP address group, which is not limited in this application; similar cases below will not be described again.
  • the priority of a preset IP address group is higher than that of an individual IP address: when the server IP addresses of data streams belong to the same preset IP address group, a separate data stream group is not generated for each individual server IP address.
  • for example, the preset IP address group includes 10.0.1.10 and 10.0.1.11. The flow parameters of data stream 11 include: the server IP address is 10.0.1.10, the server port number is 80, the terminal port number is 45530, and the protocol type is TCP. The flow parameters of data stream 12 include: the server IP address is 10.0.1.11, the server port number is 80, the terminal port number is 45531, and the protocol type is TCP. The flow parameters of data stream 13 include: the server IP address is 10.0.1.11, the server port number is 80, the terminal port number is 45532, and the protocol type is TCP. Then data stream 11, data stream 12, and data stream 13 satisfy the first flow parameter rule and form one data stream group; separate data stream groups are not generated per individual server IP address.
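  • one way to give the preset IP address group priority over individual addresses is to replace the server IP address by its group when building the grouping key. The sketch below assumes this approach; the group name `web-pool`, the dictionary layout, and the function names are hypothetical and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical preset IP address groups (name -> member addresses).
PRESET_IP_GROUPS = {"web-pool": {"10.0.1.10", "10.0.1.11"}}


def server_key(server_ip, ip_groups=PRESET_IP_GROUPS):
    """Return the group name if the IP belongs to a preset group, else the IP."""
    for name, members in ip_groups.items():
        if server_ip in members:
            return name
    return server_ip


def group_flows(flows, ip_groups=PRESET_IP_GROUPS):
    """Group (server_ip, server_port, terminal_port, protocol) tuples."""
    groups = defaultdict(list)
    for flow in flows:
        server_ip, server_port, _terminal_port, protocol = flow
        groups[(server_key(server_ip, ip_groups), server_port, protocol)].append(flow)
    return dict(groups)


flows = [
    ("10.0.1.10", 80, 45530, "TCP"),  # data stream 11
    ("10.0.1.11", 80, 45531, "TCP"),  # data stream 12
    ("10.0.1.11", 80, 45532, "TCP"),  # data stream 13
]
groups = group_flows(flows)
print(groups)
```

  • with this key, the three data streams of the example fall into a single group instead of one group per server IP address.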
  • the preset flow parameter rule corresponding to the second access mode is referred to as the second flow parameter rule as follows.
  • the relationship between the data flows in a data flow group belonging to the second access mode satisfies the second flow parameter rule.
  • the second flow parameter rule includes: the data flows in the data flow group have the same protocol type, the terminal port number is fixed, the server IP address is fixed, and the server port number is not fixed; or the data streams in the data stream group are of the same protocol type, the terminal port number is fixed, the server IP address belongs to the same preset IP address group, and the server port number is not fixed.
  • the corresponding preset flow parameter rule in the third access mode is referred to as the third flow parameter rule as follows.
  • the relationship between the data flows in a data flow group belonging to the third access mode satisfies the third flow parameter rule.
  • the third flow parameter rule includes: the data flows in the data flow group have the same protocol type, the terminal port number is not fixed, the server IP address is fixed, and the server port number is not fixed; or the data streams in the data stream group have the same protocol type, the terminal port number is not fixed, the server IP address belongs to the same preset IP address group, and the server port number is not fixed.
  • the following takes the case where the server IP address in each flow parameter rule is fixed as an example for description. It should be understood that, in the same statistical period, there may be multiple independent data flow groups in the same access mode; these data flow groups belong to the same access mode, but the relationship among all the data flows included in them does not satisfy one and the same flow parameter rule.
  • both data flow group 1 and data flow group 2 belong to the first access mode, wherein the server IP addresses of the data flows in data flow group 1 are all 10.0.
  • the port number is 80, the protocol type is TCP, and the terminal port number is not fixed.
  • the server IP addresses of the data flows in data flow group 2 are all 10.0.1.2, the server port numbers are all 90, the protocol types are all TCP, and the terminal port numbers are not fixed.
  • for example, when the preset access modes include the first access mode, the second access mode, and the third access mode, the data flow groups belonging to the first access mode are mined first, and the data flow groups belonging to the second access mode are then mined based on the flow parameters of the remaining data flows.
  • if the executing device is a management device, it may continue to mine the data flow groups belonging to the third access mode based on any remaining data flows that are not classified into a current data flow group, which will be described below. If the executing device is a forwarding device, the data flow groups belonging to the third access mode may not be mined, or the preset access modes on the forwarding device do not include the third access mode.
  • FIG. 3 shows a schematic diagram of a process of mining (determining) a data flow group of a forwarding device.
  • the process includes the following steps:
  • Step 300 Based on the total flow record table, group the data flow according to the first flow parameter rule corresponding to the first access mode; specifically, group the data flow according to the server IP address + server port number + protocol type to obtain at least one initial group.
  • the data streams in each initial group have the same server IP address, the same server port number, and the same protocol type.
  • the total flow record table here can be understood as a record table for recording flow parameters of all data flows in the first statistical period, for example, Table 1. It should be understood that if the forwarding device performs one data flow group mining based on data flows counted in multiple statistical periods, the total flow record table is a record table of flow parameters of all data flows counted in the multiple statistical periods.
  • a single data stream may also form an initial group; that is, when grouping is performed in step 300, the number of data streams in an initial group may be left unrestricted.
  • Table 2 is an initial grouping determined on the basis of Table 1 and according to the foregoing grouping conditions (server IP address+server port number+protocol type).
  • alternatively, a condition that each initial group includes at least two data streams may be added to the grouping conditions listed above when determining the initial groups, so that a single data stream cannot form an initial group.
  • Step 301 determine whether the total number of flows in the initial group is greater than a preset threshold, and if so, perform step 302 .
  • the preset threshold may be 1. It should be understood that a single data stream is not enough to determine which flow parameter rule it satisfies. Therefore, step 301 can be used to filter out the initial groups that contain only one stream. It should be noted that, if the preset threshold is 1 and the grouping condition already requires an initial group to include at least two data streams, step 301 may be skipped. If the grouping condition does not limit the number of data streams in an initial group, step 301 is executed. It should be noted that the preset threshold value of 1 is only an example, and the value of the preset threshold is not limited in this embodiment of the present application.
  • the preset threshold may also be any positive integer such as 10 or 20, which means that when the number of data streams in an initial group is small, the access mode to which the initial group belongs may be left undetermined.
  • this method can reduce the amount of computation of the executing device while improving the accuracy with which the mined data streams reflect access behavior.
  • Step 302 Determine whether the terminal port number of the data streams in the initial group is not fixed; if so, determine that the initial group is a data stream group belonging to the first access mode (see step 303).
  • whether the terminal port number is not fixed can be determined by judging whether the value of the terminal port number fluctuates, that is, whether the fluctuation of the terminal port numbers of the data streams in the initial group is 0; if it is not 0, the value of the terminal port number fluctuates, or in other words the terminal port number is not fixed.
  • steps 301 to 302 may be executed cyclically. For example, as shown in Table 2, steps 301 to 302 may be executed for combination 1 first, then for combination 2, and so on, until all initial groups have been judged. If the terminal port number of the data flows in an initial group is not fixed, the data flows in the initial group satisfy the first flow parameter rule, and the initial group is a data flow group belonging to the first access mode; otherwise, it is determined that the initial group does not belong to the first access mode. When all the initial groups have been judged, go to step 304. It should be understood that the data flows in the initial groups determined not to belong to the first access mode continue to participate in the subsequent data flow group mining process.
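  • steps 300 to 303 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `Flow` record, its field names, the function name `mine_first_mode`, and the sample records are all made up for the example, and the preset threshold defaults to 1.

```python
from collections import defaultdict, namedtuple

# Illustrative flow record; the field names are assumptions.
Flow = namedtuple("Flow", "server_ip server_port terminal_ip terminal_port protocol")


def mine_first_mode(flows, threshold=1):
    """Return (data flow groups of the first access mode, remaining flows)."""
    # Step 300: initial grouping by server IP + server port + protocol type.
    initial = defaultdict(list)
    for f in flows:
        initial[(f.server_ip, f.server_port, f.protocol)].append(f)

    matched, remaining = {}, []
    for key, members in initial.items():
        terminal_ports = {f.terminal_port for f in members}
        # Step 301: the number of flows must exceed the preset threshold.
        # Step 302: the terminal port number must not be fixed.
        if len(members) > threshold and len(terminal_ports) > 1:
            matched[key] = members       # step 303
        else:
            remaining.extend(members)    # still available for later mining
    return matched, remaining


flows = [
    Flow("10.1.0.100", 80, "192.168.1.100", 45527, "TCP"),
    Flow("10.1.0.100", 80, "192.168.1.101", 45528, "TCP"),
    Flow("10.1.0.100", 80, "192.168.1.102", 45529, "TCP"),
    Flow("10.1.0.101", 8080, "192.168.1.100", 55555, "TCP"),
]
first_groups, rest = mine_first_mode(flows)
print(first_groups.keys(), len(rest))
```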
  • Step 304 Clean the total flow record table to remove flow parameters of the data flow belonging to the data flow group of the first access mode.
  • the stream parameters of the data streams in the data stream group belonging to the first access mode in Table 1 are removed to obtain the stream parameters of the remaining data streams.
  • Step 305 Based on the flow parameters of the remaining data flows, group the data flows according to the second flow parameter rule corresponding to the second access mode; specifically, group the data flows according to server IP address + terminal port number + protocol type to obtain at least one initial group.
  • for the specific execution of step 305, refer to the description of step 300, which is not repeated here. It should be understood that step 305 differs from step 300 in the grouping conditions. It should be noted that the initial groups determined in step 305 are different from those determined in step 300. For ease of distinction, an initial group determined in step 300 may be referred to as a first initial group, and an initial group determined in step 305 may be referred to as a second initial group.
  • Step 306 determine whether the total number of flows in the second initial group is greater than a preset threshold, and if so, perform step 307 .
  • step 306 is an optional step. If, in step 305, the grouping condition further requires a second initial group to include at least two data streams, step 306 may be skipped. If the grouping condition does not limit the number of data streams in the second initial group, step 306 is executed.
  • Step 307 Determine whether the server port number of the data stream in the second initial group is not fixed; if so, determine that the second initial group is a data stream group belonging to the second access mode (see step 308).
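  • steps 304 to 308 differ from the first pass in two respects: the record table is first cleaned of the flows already assigned to the first access mode, and the grouping key uses the terminal port number instead of the server port number. The sketch below illustrates this under assumptions: the `Flow` record, the function name `mine_second_mode`, and the sample records are hypothetical, and flow records are assumed to be unique so that cleaning can be done by set membership.

```python
from collections import defaultdict, namedtuple

# Illustrative flow record; the field names are assumptions.
Flow = namedtuple("Flow", "server_ip server_port terminal_ip terminal_port protocol")


def mine_second_mode(all_flows, first_mode_flows, threshold=1):
    """Return (data flow groups of the second access mode, scattered flows)."""
    # Step 304: remove the flows that already belong to the first access mode.
    taken = set(first_mode_flows)
    remaining = [f for f in all_flows if f not in taken]

    # Step 305: second initial grouping by server IP + terminal port + protocol.
    initial = defaultdict(list)
    for f in remaining:
        initial[(f.server_ip, f.terminal_port, f.protocol)].append(f)

    groups, scattered = {}, []
    for key, members in initial.items():
        server_ports = {f.server_port for f in members}
        # Steps 306-307: enough flows, and the server port number is not fixed.
        if len(members) > threshold and len(server_ports) > 1:
            groups[key] = members        # step 308
        else:
            scattered.extend(members)    # belongs to no data flow group so far
    return groups, scattered


table = [
    Flow("10.1.0.100", 80, "192.168.1.100", 45527, "TCP"),
    Flow("10.1.0.100", 80, "192.168.1.101", 45528, "TCP"),
    Flow("10.1.0.101", 8080, "192.168.1.100", 55555, "TCP"),
    Flow("10.1.0.101", 8081, "192.168.1.101", 55555, "TCP"),
    Flow("10.1.0.102", 443, "192.168.1.103", 40000, "UDP"),
]
second_groups, scattered = mine_second_mode(table, first_mode_flows=table[:2])
print(second_groups.keys(), scattered)
```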
  • Table 3 shows the data flow groups obtained by mining the flow parameters of the data flows shown in Table 1 based on the method shown in FIG. 3.
  • the two data streams displayed in the last two rows of Table 3 do not belong to any current data stream group. Data streams that do not belong to any data stream group are scattered data streams, and together they constitute a set of scattered data streams.
  • the data flow group column in Table 3 is optional and is included only to make it easy to describe the data flow group to which each data flow in Table 1 belongs.
  • the group parameters of a data flow group determined by the forwarding device may omit the index of the data stream group; the index may instead be used when the group parameters are stored.
  • Step 203 For any data flow group, the forwarding device determines the group parameter of the data flow group.
  • the group parameters include but are not limited to: protocol type, server IP address, server port number range, and terminal port number range; they may further include one or more, or all, of the following: IP address set of the terminal, second time information (also referred to as time mode information), flow number, access mode identifier, number of terminals, flow support, and device access support.
  • the server port number range in the group parameters of a data flow group is determined according to the flow parameters of the data flows in the data flow group. Specifically, the lower limit of the server port number range is the minimum server port number among the data flows in the data flow group, and the upper limit is the maximum server port number among those data flows.
  • the terminal port number range in the group parameters of a data flow group is likewise determined according to the flow parameters of the data flows in the data flow group. Specifically, the lower limit of the terminal port number range is the minimum terminal port number among the data flows in the data flow group, and the upper limit is the maximum terminal port number among those data flows.
  • if the server port number is fixed, the minimum value and the maximum value of the server port number are the same.
  • if the terminal port number is fixed, the minimum value and the maximum value of the terminal port number are the same.
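  • the range computation is a minimum and a maximum over the port numbers observed in the group. A short sketch, with the function name `port_range` chosen only for illustration:

```python
def port_range(ports):
    """Return (lower limit, upper limit) of the port numbers seen in a group."""
    ports = list(ports)
    return min(ports), max(ports)


terminal_range = port_range([45527, 45529, 45528])  # not fixed
server_range = port_range([80, 80, 80])             # fixed: both limits coincide
print(terminal_range, server_range)
```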
  • the IP address set of the terminal including all different terminal IP addresses corresponding to the data flow in the data flow group, for example, 192.168.1.100, 192.168.1.102, 192.168.1.103.
  • the symbol ⁇ is used to represent consecutive IP addresses.
  • the above example may also be represented as 192.168.1.100, 192.168.1.102-103, or 192.18.1.100
  • the IP address of the same terminal in the same data stream group is recorded only once, that is to say, the IP address set of the terminal is obtained by deduplicating all the IP addresses corresponding to the data streams of the data stream group.
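  • deduplication can be done with a set. The sketch below additionally sorts the addresses numerically with the standard `ipaddress` module so that the result is stable; the sorting and the function name `terminal_ip_set` are choices made for this example, not requirements stated in the application.

```python
import ipaddress


def terminal_ip_set(terminal_ips):
    """Deduplicate terminal IP addresses and sort them numerically."""
    unique = {ipaddress.ip_address(ip) for ip in terminal_ips}
    return [str(ip) for ip in sorted(unique)]


ips = terminal_ip_set([
    "192.168.1.102", "192.168.1.100", "192.168.1.103", "192.168.1.100",
])
print(ips, len(ips))
```

  • the length of the resulting list is the number of different terminal IP addresses in the data flow group.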
  • the number of terminals is related to the IP address set of the terminal, and may be the number of different IP addresses included in the IP address set of the terminal, that is, the number of terminals with different IP addresses in the data flow group. For example, if the IP address set of the terminal includes 192.168.1.100
  • the second time information is used to indicate the time information corresponding to the statistical period.
  • the second time information may be the first time information of the data flow group, or it may indicate the preset time range to which the data flow group belongs, that is, the preset time range in which the statistical period is located. Specifically, the preset time range to which the data flow group belongs may be determined according to the first time information of the data flow group.
  • the second time information can be the identifier corresponding to the preset time range. For example, the identifier of preset time range 1 is 1, and the identifier of preset time range 2 is 2; if the first statistical period is 2020.10.01 15:00-2020.10.01 15:30 (that is, the first time information), the first statistical period belongs to preset time range 1, and correspondingly the second time information is 1.
  • in this way, whether a data flow is reasonable can be better distinguished. For example, if a terminal accesses, during non-working hours, a server that only provides services during working hours, the access is likely to be illegal. This helps to mine the characteristics of normal access data flows and/or the characteristics of abnormal access data flows.
  • the preset time ranges configured above are only an example, which is not limited in this embodiment of the present application. More fine-grained time ranges may also be divided. For example, the preset time ranges include 0:00-6:00, 6:00-12:00, 12:00-18:00, and 18:00-24:00; correspondingly, the identifiers of the four time ranges may be 1, 2, 3, and 4.
  • the above preset time ranges and their identifiers are only examples. The identifiers may also be represented in other ways, for example by one or more of numbers, letters, and symbols, which is not limited in this embodiment. It should be noted that the preset time range here does not distinguish between dates and only considers the time of day; that is, the same time on different dates belongs to the same preset time range.
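  • the mapping from a timestamp to a time range identifier can be sketched as below, using the four-range configuration (0:00-6:00, 6:00-12:00, 12:00-18:00, 18:00-24:00) rather than the two-range configuration. The table `TIME_RANGES`, the function name `time_range_id`, and the choice of treating each range as start-inclusive and end-exclusive are assumptions made for this example.

```python
from datetime import datetime

# (start hour inclusive, end hour exclusive) -> identifier
TIME_RANGES = [((0, 6), 1), ((6, 12), 2), ((12, 18), 3), ((18, 24), 4)]


def time_range_id(moment):
    """Map a timestamp to its preset time range identifier; the date is ignored."""
    for (start, end), ident in TIME_RANGES:
        if start <= moment.hour < end:
            return ident
    raise ValueError("hour outside 0-23")


# The same time of day on different dates maps to the same identifier.
a = time_range_id(datetime(2020, 10, 1, 15, 0))
b = time_range_id(datetime(2021, 3, 5, 15, 0))
print(a, b)
```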
  • for ease of distinction, the time information in the stream parameters of a data stream above is recorded as the first time information, and the time information in the group parameters is recorded as the second time information.
  • the first, second, and other numeral numbers involved in the present application are only for the convenience of description, and are not used to limit the scope or sequence of the embodiments of the present application.
  • description will be given by taking as an example that the identifiers of the second time information introduced above include 1 and 2.
  • the number of streams is used to indicate the number of data streams contained in the data stream group. For example, taking Table 2 as an example, the number of streams in the data stream group in the first access mode is 3, and the number of streams in the data stream groups in the second access mode and the third access mode is 2 each.
  • the access mode identifier which is the identifier of the preset access mode, is used to indicate the preset access mode to which the data stream group belongs.
  • the access mode identifiers of the first access mode, the second access mode, and the third access mode in the above can be 1, 2, and 3, respectively.
  • any identifier in the embodiments of the present application may also have other representations, such as being represented by one or more of numbers, letters, and symbols, which are not limited in the embodiments of the present application.
  • the group parameter determined by the forwarding device or a device attached to the forwarding device does not include the flow support degree and the device access support degree, and the two parameters will be described in detail below.
  • the forwarding device determines the group parameter of the data flow group, and records the group parameter according to the preset format.
  • the forwarding device determines that the group parameters of each data flow group include: protocol type, server IP address, server port number range, terminal port number range, terminal IP address set, second time information, flow number, and access mode identifier.
  • the preset format of the group parameter of the data flow group may be: [server IP, server port number, terminal port number, minimum value of port number, maximum value of port number, protocol type, number of flows, set of IP addresses of the terminal , second time information, access mode identification].
  • Table 4 shows the group parameters of each data stream group in a preset format obtained on the basis of Table 3.
  • the minimum value of the port number and the maximum value of the port number can be used to indicate either the server port number range or the terminal port number range. If they indicate the server port number range, the individual server port number field can be set to -1; if they indicate the terminal port number range, the individual terminal port number field can be set to -1, where -1 represents an invalid value.
  • the terminal port number of data flow group 1 is -1, which means that the terminal port number of this data flow group is not fixed.
  • the minimum value of the terminal port number is 45527, and the maximum value of the terminal port number is 45529.
  • the server port number of data flow group 3 is -1, indicating that the server port number of this data flow group is not fixed; the minimum server port number is 8080, and the maximum server port number is 8081. Scattered data streams are identified by an access mode identifier of -1.
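  • the preset format and the use of -1 can be sketched as follows for the first and second access modes. The function name `encode_group`, its parameters, and the terminal IP addresses in the sample calls are assumptions made for this example; the remaining sample values follow data flow group 1 and data flow group 3 as described above.

```python
def encode_group(mode_id, server_ip, server_ports, terminal_ports,
                 protocol, terminal_ips, time_id):
    """Build one record in the preset format.

    server_ports and terminal_ports hold one entry per data stream.
    """
    if mode_id == 1:    # first access mode: terminal port number not fixed
        server_port, terminal_port = server_ports[0], -1
        low, high = min(terminal_ports), max(terminal_ports)
    elif mode_id == 2:  # second access mode: server port number not fixed
        server_port, terminal_port = -1, terminal_ports[0]
        low, high = min(server_ports), max(server_ports)
    else:
        raise ValueError("this sketch covers only access modes 1 and 2")
    return [server_ip, server_port, terminal_port, low, high, protocol,
            len(server_ports), sorted(set(terminal_ips)), time_id, mode_id]


group1 = encode_group(1, "10.1.0.100", [80, 80, 80], [45527, 45528, 45529], "TCP",
                      ["192.168.1.100", "192.168.1.101", "192.168.1.102"], 1)
group3 = encode_group(2, "10.1.0.101", [8080, 8081], [55555, 55555], "TCP",
                      ["192.168.1.100", "192.168.1.101"], 1)
print(group1)
print(group3)
```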
  • for step 202 to step 203, one possible implementation is to determine the group parameters of the data stream groups together after some or all of the data stream groups have been determined.
  • step 202 and step 203 may also be combined into one step, that is, the forwarding device determines the group parameter of the data stream group when determining the data stream group in step 202, for example, see FIG. 4.
  • Fig. 4 shows another schematic flow chart of a data mining method, wherein the steps shown in Fig. 4 and the steps in Fig. 3 are not repeated, and only the differences are described below:
  • Step 403 Determine that the first initial group is a data flow group belonging to the first access mode, and record the group parameters of the data flow group according to a preset format.
  • Step 408 Determine that the second initial group is a data stream group belonging to the second access mode, and record group parameters of the data stream group according to a preset format.
  • the granularity of data flow information mining performed by the forwarding device is one statistical period, that is, the forwarding device performs mining based on the flow parameters of the data flow in one statistical period each time, and obtains the statistical results of the statistical period, that is, one statistical period. corresponding to a statistical result.
  • the statistical result may include group parameters of a data flow group determined by flow parameters of multiple data flows in at least one statistical period, or group parameters of a data flow group determined by flow parameters of multiple data flows in at least one statistical period and the flow parameters of the determined fragmented data flow.
  • Step 204 the forwarding device sends at least one statistical result to the management device, and correspondingly, the management device receives the at least one statistical result sent by the forwarding device.
  • the forwarding device may report the group parameters of each data flow group shown in Table 4 to the management device.
  • the forwarding device may directly report the flow parameters of the scattered data flow.
  • the forwarding device may also align with the reporting format of the group parameters of the data flow groups: it generates the "group parameter" of the scattered data flow according to the preset format of the group parameter and reports it, see Table 4. It should be understood that the "group parameter" of a scattered data stream only denotes the information, generated from the flow parameters of the scattered data stream according to the preset format of the group parameter, that is reported to the management device; it does not mean that the scattered data stream is a data stream group. For ease of description, this information is referred to as the group parameter of the scattered data flow in the following.
  • if the forwarding device does not need to report the statistical results to the management device, or does not need to report the flow parameters of the scattered data flows to the management device, the flow parameters of the scattered data flows may be left unprocessed. The following assumes that the statistical results include group parameters of scattered data streams.
  • after the forwarding device determines the group parameters of the data flow groups, it can report these group parameters to the management device; subsequently, these group parameters are used to determine the security rules.
  • the forwarding device can directly report the statistical result of each statistical period to the management device. That is, after performing step 203, it can immediately report, without waiting, the group parameters of the at least one data flow group determined in step 203 and the group parameters of the scattered data streams to the management device, which reduces the delay for the group parameters to reach the management device.
  • the forwarding device may report to the management device based on the configured reporting period. That is, the forwarding device may execute Steps 201 to 203 multiple times in the reporting period, and the statistical period for each execution is different. It should be understood that each time Steps 201 to 203 are executed, a statistical result can be obtained.
  • FIG. 5 shows a schematic diagram of a reporting scenario based on a reporting period.
  • the forwarding device may report the multiple statistical results obtained within the reporting period together. For example, if the reporting period includes m statistical periods, and the m statistical periods correspond to m statistical results, the forwarding device may report the m statistical results together.
  • the forwarding device may further process the m statistical results. Exemplarily, the processing methods of the m statistical results are described in detail as follows.
  • the forwarding device determines the same data flow group among the multiple data flow groups belonging to the first access mode in the m statistical results, and merges the at least two data flow groups concerned. Here, the same data flow group means that the relationship between the data flows included in at least two data flow groups satisfies the same first flow parameter rule.
  • for example, if in two data flow groups the server IP address is 10.0.0.1, the server port number is 80, and the terminal port number is not fixed, the two data stream groups are the same data stream group. It should be understood that some items in the group parameters of the same data stream group may differ, such as the number of data streams and the set of terminal IP addresses. It should also be understood that at least two data flow groups belonging to the same data flow group exist in different statistical results. Details are described below. Similarly, the same data flow group among the multiple data flow groups belonging to the second access mode in the m statistical results is determined, and the at least two data flow groups are merged.
  • the results of the given two statistical periods are statistical result 1 and statistical result 2 respectively. It is assumed that statistical result 1 is shown in Table 4 above, and statistical result 2 is shown in Table 5 below.
  • the statistical result 1 includes data flow group 1, and the group parameter information of the preset format corresponding to the data flow group 1 is [10.1.0.100, 80, -1 , 45527, 45529, TCP, 3, 192.18.1.100
  • the statistical result 2 includes a data flow group 11, and the group parameter information in the preset format corresponding to the data flow group 11 is [10.1.0.100, 80, -1, 45523, 45528, TCP, 5, 192.18.1.100
  • the data flow group 1 and the data flow group 11 have the same protocol type and the same server IP address, the server port number is fixed, and the terminal port number is not fixed. Therefore, it is determined that the relationship between the data flow group 1 and the data flow group 11 satisfies the first flow parameter rule, and the data flow group 1 and the data flow group 11 belong to the same data flow group.
  • the forwarding device performs screening based on Table 4 and Table 5, and screens out multiple data flow groups that satisfy the same flow parameter rule.
  • the screening results include: 1) Data flow group 1 and data flow group 11 satisfy the same first flow parameter rule, that is, the data flow group 1 and the data flow group 11 belong to the same data flow group. 2) Data flow group 3 and data flow group 12 satisfy the same second flow parameter rule, in which the server IP address is fixed at 10.1.0.101, the terminal port number is fixed at 55555, the protocol type is TCP for both, and the server port number is not fixed; that is, the data flow group 3 and the data flow group 12 belong to the same data flow group.
  • the forwarding device merges multiple data flow groups belonging to the same data flow group, specifically by merging the group parameters of these data flow groups. Updating the group parameters of the merged data flow group includes: summing the numbers of data streams, updating the port number ranges, and merging and deduplicating the IP address sets of the terminals.
  • the number of flows after merging = the number of flows in data flow group A + the number of flows in data flow group B + ... + the number of flows in data flow group N.
  • the minimum server port number after merging is the minimum of the server port numbers in data flow group A, data flow group B, ..., and data flow group N; the maximum server port number after merging is the maximum of the server port numbers in data flow group A, data flow group B, ..., and data flow group N.
  • the minimum terminal port number after merging is the minimum of the terminal port numbers in data flow group A, data flow group B, ..., and data flow group N; the maximum terminal port number after merging is the maximum of the terminal port numbers in data flow group A, data flow group B, ..., and data flow group N.
  • the terminal's IP address set includes all non-repetitive (or different) terminal IP addresses in data flow group A, data flow group B, ..., and data flow N, for example, the terminal IP address set of data flow group 1 includes 192.168 .1.100
  • the above port number range update method is only an introduction.
  • the server port number of some data flow groups may be fixed, or the terminal port number may be fixed. If it is fixed, it does not need to be updated.
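  • the merging of group parameters can be sketched as follows. The dictionary keys, the helper names `same_group` and `merge_groups`, and the terminal IP addresses in the sample data are assumptions made for this example; the port limits and flow counts follow data flow group 1 and data flow group 11 above. A fixed port is left untouched because only the shared range fields are updated.

```python
# Fields that must match for two groups to be the same data flow group;
# a port field of -1 marks the port that is not fixed.
KEY_FIELDS = ("server_ip", "server_port", "terminal_port", "protocol")


def same_group(a, b):
    """True if two group-parameter dicts describe the same data flow group."""
    return all(a[k] == b[k] for k in KEY_FIELDS)


def merge_groups(groups):
    """Merge group-parameter dicts that describe the same data flow group."""
    merged = dict(groups[0])
    merged["flows"] = sum(g["flows"] for g in groups)
    merged["port_min"] = min(g["port_min"] for g in groups)
    merged["port_max"] = max(g["port_max"] for g in groups)
    merged["terminal_ips"] = sorted(set().union(*(g["terminal_ips"] for g in groups)))
    return merged


g1 = {"server_ip": "10.1.0.100", "server_port": 80, "terminal_port": -1,
      "port_min": 45527, "port_max": 45529, "protocol": "TCP", "flows": 3,
      "terminal_ips": ["192.168.1.100", "192.168.1.101"]}
g11 = {"server_ip": "10.1.0.100", "server_port": 80, "terminal_port": -1,
       "port_min": 45523, "port_max": 45528, "protocol": "TCP", "flows": 5,
       "terminal_ips": ["192.168.1.100", "192.168.1.104"]}

merged = merge_groups([g1, g11]) if same_group(g1, g11) else None
print(merged)
```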
  • Table 6 shows the group parameters of each data flow group after the merging based on Tables 4 and 5.
  • the data flow group 1 and the data flow group 11 are combined and recorded as the data flow group 1a; the data flow group 3 and the data flow group 12 are combined and recorded as the data flow group 2a.
  • the forwarding device may not process them. Subsequently, for the reporting period, the forwarding device only needs to report the group parameters shown in Table 6; this reporting method can effectively reduce the repeated reporting of redundant information and save resource overhead.
  • the above description takes the forwarding device as the executing device as an example. The executing device of the method may also be another device, for example a device attached to the forwarding device, such as a network probe. A network probe is a device used to listen for network data packets (it is also called an Internet probe); network packet capture, filtering, and analysis can all be implemented on the network probe.
  • when the network probe is the executing device of the method for determining data flow information of the present application, the operation process includes: after the forwarding device receives a data packet, it performs two operations in parallel. Operation 1: the data packet is processed normally; if forwarding is allowed, the packet is forwarded, otherwise the packet is intercepted. Operation 2: the data packet is copied to obtain a copy, and the copy is mirrored (or forwarded) to the network probe through a specific port (called a mirror port) on the forwarding device. The network probe then determines and records the flow parameters of the data flows according to the received packets, and performs subsequent operations such as mining data flow groups and determining group parameters.
  • the group parameters determined by the forwarding device are only examples. It should be understood that the group parameters configured on different devices may be different.
  • the group parameters determined by the forwarding device are the group parameters shown in Table 4, while the group parameters determined by the management device may have more or fewer data items than those in Table 4.
  • the group parameters determined by the management device may also include stream support and/or device access support and the like.
  • the management device may receive group parameters reported by one or more forwarding devices.
  • the management device 300 may receive the group parameters reported by the forwarding device 200 and the forwarding device 201 at the same time, and process these group parameters again to mine the access rules of the data flows of the global network shown in Figure 1.
  • FIG. 6 is a flowchart of another method for determining data flow information provided by an embodiment of the present application, and the method may be executed by the management device in FIG. 1 . As shown in Figure 6, the method may include:
  • Step 601 The management device receives the first set of parameters sent by one or more first devices.
  • the group parameters received by the management device are referred to as the first group parameters, and the group parameters determined by the management device itself are referred to as the second group parameters hereinafter.
  • the first group parameters are those in step 204 in FIG. 2 above.
  • the related introduction of the group parameters determined by the forwarding device will not be repeated here.
  • the management device here may be a network management device integrating security analysis function components or a cloud platform integrating security analysis function components.
  • the corresponding first device here may be a forwarding device or a bypass device of the forwarding device.
  • the forwarding device can send each first group parameter determined within a reporting period to the network management device or cloud platform that integrates the security analysis function component; that is, the network management device or cloud platform can receive the first group parameters reported by one or more forwarding devices.
  • Step 602 The management device processes the first set of parameters sent by the one or more first devices.
  • the management device may perform step 602 according to the configured aggregation period.
  • the reporting period of the forwarding device is 1 hour
  • the aggregation period of the management device may be 4 hours, 1 week, or 1 month, and so on. Assuming it is one week, the management device can save the first group parameters reported by all forwarding devices within that week. When the aggregation time is reached, the management device can perform merge processing and/or data flow group mining on the multiple first group parameters reported by the multiple forwarding devices.
  • the length of the statistics period may differ between forwarding devices or bypass devices, and so may the length of the reporting period, but the second time information configured on different forwarding devices, bypass devices and management devices can be unified.
  • the identifiers of the second time information on different devices are 1 and 2, where identifier 1 represents 8:00-17:00 and identifier 2 represents 17:00-8:00 of the next day.
  • the management device may group the multiple first group parameters according to the second time information of the data flow groups, and treat multiple first group parameters with the same second time information as one group.
  • the same second time information means that the data flows of the multiple data flow groups appear within the same preset time range, for example, all appear within 8:00-17:00. Subsequently, the first group parameters in the same group undergo merging and other processing.
  • if the second time information is divided into 1 and 2 according to working time and non-working time, the first group parameters of all data flow groups during working time (that is, with second time information 1) can be taken as one group, which helps analyze the access rules of data flows during working hours.
  • the first group of parameters of all data streams during non-working hours is taken as a group to analyze the access rules of data streams during non-working hours.
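  • The grouping by second time information described above can be sketched as follows. This is only an illustrative sketch: the record layout, the field name `second_time`, and the half-open boundary (8:00 inclusive, 17:00 exclusive) are assumptions that the text does not specify.

```python
from collections import defaultdict
from datetime import time


def second_time_id(first_time: time) -> int:
    """Map a flow's first time information to identifier 1 or 2."""
    # Assumption: 8:00 is inclusive and 17:00 is exclusive for identifier 1.
    return 1 if time(8, 0) <= first_time < time(17, 0) else 2


def group_by_second_time(first_group_params):
    """Collect first group parameters that share the same second time information."""
    grouped = defaultdict(list)
    for params in first_group_params:
        grouped[params["second_time"]].append(params)
    return dict(grouped)


# Hypothetical reports; only the fields needed for grouping are shown.
reports = [
    {"group": "1a", "second_time": second_time_id(time(9, 30))},
    {"group": "2", "second_time": second_time_id(time(22, 15))},
    {"group": "2a", "second_time": second_time_id(time(16, 59))},
]
by_time = group_by_second_time(reports)
```

  • Each value of `by_time` can then be merged and mined independently, so that working-time and non-working-time access rules are analyzed separately.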
  • the management device performs processing based on the first set of parameters of multiple data flow groups with the same second time information, for example, combines the first set of parameters of the same data flow group.
  • the management device also processes scattered data flows, and mines data flow groups belonging to the third access mode based on the scattered data flows. The following describes how the management device processes multiple first group parameters in the aggregation period:
  • Processing method 1 first perform merge processing, then perform data stream cleaning, and finally perform data stream group mining.
  • merge processing means that, based on the multiple first group parameters in the aggregation period (with the same second time information), the management device groups the multiple data flow groups belonging to the first access mode: data flow groups whose data flows satisfy the same first flow parameter rule are placed in one group, the data flow groups in the same group are then merged, and the first group parameters of the merged data flow group are updated. Multiple data flow groups belonging to the second access mode are merged in the same way. For details, refer to the above-mentioned method flow in which the forwarding device merges multiple statistical results in the statistical period, which is not repeated here.
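  • A minimal sketch of the merge processing for the first access mode is given below. The dictionary fields (`min_terminal_port`, `flow_count`, `terminal_ips`, and so on) and the sample values are hypothetical; the sketch only assumes that merging widens the terminal port range, sums the flow counts, and unions the terminal IP address sets.

```python
def merge_first_mode_groups(groups):
    """Merge first-access-mode groups that satisfy the same flow parameter rule."""
    merged = {}
    for g in groups:
        if g["mode"] != 1:
            continue
        # Same protocol + server IP + server port + second time information.
        key = (g["protocol"], g["server_ip"], g["server_port"], g["second_time"])
        if key not in merged:
            merged[key] = {**g, "terminal_ips": set(g["terminal_ips"])}
            continue
        m = merged[key]
        m["min_terminal_port"] = min(m["min_terminal_port"], g["min_terminal_port"])
        m["max_terminal_port"] = max(m["max_terminal_port"], g["max_terminal_port"])
        m["flow_count"] += g["flow_count"]
        m["terminal_ips"] |= set(g["terminal_ips"])
    return list(merged.values())


# Hypothetical first group parameters reported by two forwarding devices.
reported = [
    {"mode": 1, "protocol": "TCP", "server_ip": "10.10.1.100", "server_port": 80,
     "second_time": 1, "min_terminal_port": 50000, "max_terminal_port": 50010,
     "flow_count": 5, "terminal_ips": {"10.1.1.1"}},
    {"mode": 1, "protocol": "TCP", "server_ip": "10.10.1.100", "server_port": 80,
     "second_time": 1, "min_terminal_port": 49000, "max_terminal_port": 50005,
     "flow_count": 3, "terminal_ips": {"10.1.1.1", "10.1.1.2"}},
]
result = merge_first_mode_groups(reported)
```

  • The second access mode would use the same structure with the terminal port number in the key and the server port range being widened instead.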
  • the first group parameters in the aggregation period include Table 6, and it is determined according to Table 6 that the existing data flow groups belonging to the first access mode or the second access mode include data flow group 1a, data flow group 2, and data flow group 2a. It is then determined whether any scattered data flow in Table 6 belongs to data flow group 1a, data flow group 2, or data flow group 2a; if so, the scattered data flow is merged into its corresponding data flow group, and the first group parameter record of the scattered data flow is cleaned up.
  • whether a scattered data flow belongs to a certain data flow group may be judged by checking whether the scattered data flow satisfies the flow parameter rule corresponding to the data flow group. For example, data flow group 1a belongs to the first access mode, and its flow parameter rule is that the server IP address is fixed at 10.10.1.100, the server port number is fixed at 80, the protocol type is TCP, and the terminal port number is not fixed. If the server IP address of a scattered data flow is 10.10.1.100, its server port number is 80, and its protocol type is TCP, the scattered data flow is determined to belong to data flow group 1a. Refer to the related introduction of determining whether multiple data flow groups belong to the same data flow group in the above merge processing, which is not repeated here.
  • data flow cleaning is performed in conjunction with Table 6: the scattered data flow shown in the last row of Table 6 satisfies the flow parameter rule of data flow group 2, so it is merged into data flow group 2, the group parameters of data flow group 2 are updated according to the first group parameters of the scattered data flow, and the record of the scattered data flow is cleaned. It should be understood that the group parameters may be unchanged after the update. See Table 7, which shows the group parameters of the cleaned data flow groups.
  • each scattered data flow needs to be compared with the flow parameter rule of each data flow group indicated in the multiple first group parameters to determine whether the scattered data flow can be merged into that data flow group.
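  • The cleaning step above can be sketched as a rule match followed by a merge. The field names and the sample flows are hypothetical, and the sketch assumes all records share the same second time information; only the rule for data flow group 1a (10.10.1.100, port 80, TCP) comes from the text.

```python
def matches_group(flow, group):
    """Check whether a scattered flow satisfies a group's flow parameter rule."""
    if flow["protocol"] != group["protocol"] or flow["server_ip"] != group["server_ip"]:
        return False
    if group["mode"] == 1:  # server port fixed, terminal port not fixed
        return flow["server_port"] == group["server_port"]
    if group["mode"] == 2:  # terminal port fixed, server port not fixed
        return flow["terminal_port"] == group["terminal_port"]
    return False


def clean_scattered(flows, groups):
    """Merge matching scattered flows into their group; return the unmatched flows."""
    remaining = []
    for flow in flows:
        target = next((g for g in groups if matches_group(flow, g)), None)
        if target is None:
            remaining.append(flow)
        else:
            target["flow_count"] += 1  # update the group parameters
    return remaining


groups = [{"name": "1a", "mode": 1, "protocol": "TCP",
           "server_ip": "10.10.1.100", "server_port": 80, "flow_count": 10}]
scattered = [
    {"protocol": "TCP", "server_ip": "10.10.1.100", "server_port": 80, "terminal_port": 51000},
    {"protocol": "TCP", "server_ip": "10.10.1.100", "server_port": 443, "terminal_port": 51001},
]
remaining = clean_scattered(scattered, groups)
```

  • Flows left in `remaining` are the scattered data flows that go on to the data flow group mining step.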
  • the process of data flow group mining includes: performing data mining again based on the scattered data flows remaining after the original samples are cleaned, which may yield new data flow groups belonging to the first access mode or new data flow groups belonging to the second access mode. The first group parameters of these new data flow groups are determined respectively, and the new data flow groups belonging to the first access mode and the new data flow groups belonging to the second access mode are mined in sequence.
  • FIG. 7 shows the complete flow of the above processing method.
  • the process of mining the data flow groups of the first access mode and the second access mode in FIG. 7 is similar to the related process in FIG. 3 or FIG. 4, and is not repeated here.
  • the process includes:
  • Step 700 Receive multiple first group parameters reported by the forwarding device a to the forwarding device n within the aggregation time period.
  • Step 701a Among the plurality of first group parameters, select a data stream group whose mode identifier is 1.
  • Step 702a Grouping is performed according to the same protocol type + the same server IP address + the same server port number + the terminal port number is not fixed + the second time information is the same.
  • Step 703a Combine multiple data flow groups belonging to the same group, and update the first set of parameters of the combined data flow group.
  • Step 701b Among the plurality of first group parameters, select a data stream group whose mode identifier is 2.
  • Step 702b Grouping is performed according to the same protocol type + the same server IP address + the server port number is not fixed + the terminal port number is fixed + the second time information is the same.
  • Step 703b Combine multiple data flow groups of the same group, and update the first set of parameters of the combined data flow group.
  • Step 704 Determine whether any scattered data stream belongs to a currently existing data stream group.
  • Step 705 Merge the scattered data stream into the data stream group to which it belongs, and update the first set of parameters of the data stream group according to the first set of parameters of the scattered data stream.
  • Step 706 grouping according to the server IP address+protocol type to obtain at least one initial group.
  • Step 707 Determine whether the number of data flows in the initial group is greater than a preset threshold, and if so, go to Step 708.
  • Step 708 Determine whether the server port number of the data flows in the initial group is not fixed and whether the terminal port number is not fixed; if both are not fixed, determine that the initial group is a data flow group belonging to the third access mode (see Step 709).
  • steps 707 to 709 may be performed repeatedly until all initial groups have been determined.
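  • Steps 706 to 709 can be sketched as follows. This is a sketch under stated assumptions: "not fixed" is interpreted as "takes more than one distinct value", the threshold in Step 707 is interpreted as a count of data flows in the initial group, and all field names and sample flows are hypothetical.

```python
from collections import defaultdict


def mine_third_mode(flows, threshold):
    """Mine third-access-mode groups from scattered flows (Steps 706 to 709)."""
    initial = defaultdict(list)
    for f in flows:  # Step 706: group by server IP address + protocol type
        initial[(f["server_ip"], f["protocol"])].append(f)

    groups = []
    for (server_ip, protocol), members in initial.items():
        if len(members) <= threshold:  # Step 707: too few flows
            continue
        server_ports = {f["server_port"] for f in members}
        terminal_ports = {f["terminal_port"] for f in members}
        # Step 708: both port numbers must be "not fixed".
        if len(server_ports) > 1 and len(terminal_ports) > 1:
            groups.append({  # Step 709: record a third-access-mode group
                "mode": 3, "server_ip": server_ip, "protocol": protocol,
                "min_server_port": min(server_ports),
                "max_server_port": max(server_ports),
                "flow_count": len(members),
            })
    return groups


flows = [
    {"server_ip": "10.0.0.5", "protocol": "UDP", "server_port": 6000, "terminal_port": 40001},
    {"server_ip": "10.0.0.5", "protocol": "UDP", "server_port": 6001, "terminal_port": 40002},
    {"server_ip": "10.0.0.5", "protocol": "UDP", "server_port": 6002, "terminal_port": 40003},
    {"server_ip": "10.0.0.6", "protocol": "TCP", "server_port": 7000, "terminal_port": 41000},
    {"server_ip": "10.0.0.6", "protocol": "TCP", "server_port": 7001, "terminal_port": 41001},
    {"server_ip": "10.0.0.7", "protocol": "TCP", "server_port": 80, "terminal_port": 42000},
    {"server_ip": "10.0.0.7", "protocol": "TCP", "server_port": 80, "terminal_port": 42001},
    {"server_ip": "10.0.0.7", "protocol": "TCP", "server_port": 80, "terminal_port": 42002},
]
third_mode_groups = mine_third_mode(flows, threshold=2)
```

  • With a threshold of 2, only the flows to 10.0.0.5 form a group: the flows to 10.0.0.6 are too few, and the flows to 10.0.0.7 have a fixed server port number.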
  • data mining is performed again based on the remaining scattered data flows shown in Table 7, wherein data flow a and data flow c satisfy the same first flow parameter rule and can form data flow group 4a, which belongs to the first access mode.
  • data flow b and data flow e satisfy the third flow parameter rule and form data flow group 5a, which belongs to the third access mode.
  • the specific mining results are shown in Table 8 below.
  • Processing method 2 first perform data stream group mining, then perform data stream cleaning, and finally perform merge processing.
  • Data flow group mining and cleaning: exemplarily, data mining is first performed again based on the first group parameters of the scattered data flows in the original sample (the multiple first group parameters in the aggregation period), attempting to mine data flow groups belonging to the first access mode, the second access mode, and the third access mode. The first group parameters (or second group parameters) of each data flow group are determined or updated, and the scattered data flows are cleaned, that is, the records of the scattered data flows are deleted.
  • a data flow group belonging to the first access mode or the second access mode that is mined here may already have existed before mining, or may be a new data flow group. For details, refer to the above related introduction, which is not repeated here.
  • after the management device performs the above processing, there may still be scattered data flows in the aggregation period that do not belong to any preset access mode (the first access mode, the second access mode, or the third access mode). These data flows can be discarded, or retained to participate in subsequent operations, such as determining the "second group parameters" of the scattered data flows.
  • Step 603 The management device determines a second set of parameters for each data stream group.
  • the second group of parameters here can be the first group of parameters.
  • the group parameters configured on each device may be different. Therefore, to distinguish them from the group parameters determined by other devices, the group parameters determined by the management device are called the second group parameters, and the group parameters that the management device receives from other devices are called the first group parameters.
  • the second set of parameters may include server IP address, minimum terminal port number, maximum terminal port number, minimum server port number, maximum server port number, protocol type, stream support, device access support, and the like.
  • the second set of parameters is introduced in the form of a list as follows. For example, see Table 9, which shows the second set of parameters obtained for a certain aggregation period. It should be noted that, Table 9 is a single example for illustration, and is not necessarily determined by the above-mentioned Tables 1 to 8.
  • the flow support and device access support are introduced as follows.
  • the stream support degree is determined according to the number of streams of data streams in a group of data streams and the total number of streams of all data streams in the current statistics (for example, within one aggregation period).
  • the forwarding device can discard scattered data flows that do not belong to any preset access mode, in which case the total number of flows can be the total number of data flows included in the data flow groups; the forwarding device can also retain these scattered data flows, in which case the total number of flows may be the number of all data flows in the aggregation period.
  • the above takes an aggregation period as an example. If the management device performs statistics based on a preset time period or a specified time period, the total number of flows is determined based on the number of data flows within the preset time period or the specified time period.
  • stream support degree = the number of streams in the data stream group / the total number of streams.
  • the device access support degree is determined according to the number of terminals in a data stream group and the total number of terminals corresponding to all data streams counted this time.
  • all the data flows counted this time can be the data flows contained in the data flow groups. If scattered data flows are retained, all the data flows refer to the data flows of the data flow groups plus the scattered data flows.
  • all the data flows counted this time refer to the data flows in an aggregation period, a preset time period, or a specified time period; refer to the above introduction, which is not repeated here.
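  • The two support degrees can be illustrated with a short sketch. The stream support degree follows the ratio given in the text; computing the device access support degree as the analogous ratio of terminal counts is an assumption, since the text only says it is determined from those two numbers. The function names and sample values are hypothetical.

```python
def stream_support(group_flow_count, total_flow_count):
    """Stream support degree = flows in the group / total flows counted."""
    return group_flow_count / total_flow_count


def device_access_support(group_terminal_ips, all_terminal_ips):
    """Assumed ratio: terminals in the group / all terminals counted."""
    return len(set(group_terminal_ips)) / len(set(all_terminal_ips))


# Hypothetical statistics for one aggregation period.
flow_support = stream_support(group_flow_count=80, total_flow_count=100)
device_support = device_access_support(
    ["10.1.1.1", "10.1.1.2"],
    ["10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4"],
)
```

  • Whether scattered data flows are included in the denominators depends on whether they are discarded or retained, as described above.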
  • the second set of parameters of the merged or updated data stream group may also be directly determined.
  • if the management device is a cloud platform, the cloud platform can also receive the second group parameters reported by one or more network management devices. The cloud platform can directly store the second group parameters, or perform data flow group mining again on the second group parameters. For details, refer to the operations performed by the execution body in FIG. 6 or FIG. 7, which are not repeated here.
  • the embodiment of the present application also provides another method for determining data flow information.
  • the forwarding device or a device attached to the forwarding device can send the flow parameters (for example, a flow record table) of the multiple data flows it has counted to the management device; that is, the forwarding device or the bypass device does not perform data mining, and the management device performs data mining uniformly.
  • FIG. 8 is a schematic flowchart of the method for determining data flow information provided by an embodiment of the present application, and the method includes the following steps:
  • Step 801 The first device acquires the stream parameters of each data stream received within N statistical periods, where N is a positive integer.
  • the first device is a forwarding device.
  • the forwarding device performs step 801
  • the first device may also be a bypass device of the forwarding device (for example, the network probe described above).
  • one management device can be connected to one or more bypass devices, and one bypass device can correspond to one or more forwarding devices.
  • Step 802 The first device sends stream parameters of the multiple data streams to the management device, and correspondingly, the management device receives the stream parameters of the multiple data streams sent by one or more first devices.
  • The following describes the complete process of performing step 801 with the network probe as the execution subject, taking the case where the bypass device is a network probe as an example. For the manner in which the forwarding device receives a data packet and mirrors it to the network probe, refer to the related descriptions above, which are not repeated here. Subsequently, the network probe determines the flow parameters of the multiple data flows received from one or more forwarding devices, and sends these flow parameters to the management device.
  • the first device may directly send the acquired stream parameters of each data stream to the management device.
  • the first device may also report the stream parameters of multiple data streams to the management device together according to the reporting period.
  • the bypass device may report the quintuple information and the first time information of the data flow to the management device. Further exemplarily, the bypass device may also generate a flow record table and report the flow record table to the management device.
  • if the flow parameters of the data flow are determined by the network probe, the first time information of the data flow can be the time when the network probe receives the data flow.
  • for the flow record table, refer to the flow record table generated by the forwarding device in FIG. 2. The specific operation steps are not repeated here.
  • Step 803 The management device groups the multiple data streams according to the received stream parameters and at least one preset access mode of the multiple data streams within the first time period to obtain at least one data stream group.
  • the management device can receive flow parameters of multiple data flows sent by one or more first devices, where the flow parameters include the quintuple information and the first time information of the data flows. Since the reporting period lengths on different first devices may be different, the management device may pick out the multiple data flows belonging to the same time period (for example, denoted as the first time period) based on the first time information of the multiple data flows.
  • the second time information of a data flow is determined according to the first time information of the data flow (see the relevant description above), and data flows with the same second time information are data flows in the same time period. Alternatively, the first time period may also be a self-defined time period, which is not limited in this embodiment of the present application.
  • the management device determines a plurality of data streams within the first time period based on the aggregation period.
  • the first time period may be the same time period in different
  • the multiple data streams are grouped.
  • the method for the management device to group the multiple data flows includes: based on the flow parameters of the multiple data flows, first determine the data flow groups belonging to the first access mode; then determine the data flow groups belonging to the second access mode from the remaining data flows; finally, determine the data flow groups belonging to the third access mode from the data flows remaining after the previous step. The remaining data flows that are not assigned to any data flow group are scattered data flows; as mentioned earlier, scattered data flows can be discarded or retained. For details, reference may be made to or combined with the relevant descriptions of one or more embodiments above, which are not repeated here.
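  • The three-pass grouping order can be sketched as follows. The actual mining procedures are those of FIG. 3, FIG. 4 and FIG. 7; this sketch substitutes a simplified rule (fixed fields must match, varying fields must take more than one value, and a hypothetical minimum of two flows per group) purely to show how each pass works on what the previous pass left over. All names and sample flows are assumptions.

```python
from collections import defaultdict


def split_by_rule(flows, fixed_fields, varying_fields, min_flows=2):
    """Split flows into groups satisfying a rule and the leftover flows."""
    buckets = defaultdict(list)
    for f in flows:
        buckets[tuple(f[k] for k in fixed_fields)].append(f)
    groups, rest = [], []
    for members in buckets.values():
        varies = all(len({f[k] for f in members}) > 1 for k in varying_fields)
        if len(members) >= min_flows and varies:
            groups.append(members)
        else:
            rest.extend(members)
    return groups, rest


def group_flows(flows):
    """First, second, then third access mode; the leftovers are scattered flows."""
    mode1, rest = split_by_rule(
        flows, ("protocol", "server_ip", "server_port"), ("terminal_port",))
    mode2, rest = split_by_rule(
        rest, ("protocol", "server_ip", "terminal_port"), ("server_port",))
    mode3, scattered = split_by_rule(
        rest, ("protocol", "server_ip"), ("server_port", "terminal_port"))
    return {1: mode1, 2: mode2, 3: mode3}, scattered


def flow(protocol, server_ip, server_port, terminal_port):
    return {"protocol": protocol, "server_ip": server_ip,
            "server_port": server_port, "terminal_port": terminal_port}


sample = [
    flow("TCP", "10.10.1.100", 80, 50001), flow("TCP", "10.10.1.100", 80, 50002),
    flow("UDP", "10.10.2.1", 6000, 123), flow("UDP", "10.10.2.1", 6001, 123),
    flow("TCP", "10.10.3.1", 7000, 40000), flow("TCP", "10.10.3.1", 7001, 40001),
    flow("TCP", "10.10.9.9", 22, 33333),
]
by_mode, scattered_flows = group_flows(sample)
```

  • In this sample each access mode yields one group of two flows, and the single flow to 10.10.9.9 remains a scattered data flow.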
  • the method for determining the data flow group belonging to the first access mode or the second access mode may refer to the description in FIG. 3 or FIG. 4
  • the method for determining the data flow group belonging to the third access mode may refer to the description of steps 706 to 709 in FIG. 7, which is not repeated here.
  • Step 804 For any determined data flow group, the management device determines the group parameter of the data flow group.
  • the set of parameters may be the set of parameters shown in Table 9 above, and the description will not be repeated here.
  • if the stream support degree is determined, the record of the number of streams in the group parameters can be deleted; similarly, if the device access support degree is determined, the record of the number of terminal devices in the group parameters can be deleted.
  • the embodiment of the present application also provides another data processing method.
  • the forwarding device or the bypass device mirrors the data stream to the management device, and the management device generates stream parameters of the data stream and executes subsequent processes.
  • the method includes the following steps:
  • Step 901 The first device mirrors the received data packet to the management device, and correspondingly, the management device receives the data packet forwarded by the first device.
  • the first device may be a forwarding device.
  • the forwarding device copies the received data packet, and mirrors the obtained copy of the data packet to the management device.
  • the first device may further include a bypass device.
  • the forwarding device mirrors the copy of the data packet to the bypass device, and the bypass device can mirror the received data packet to the management device again.
  • Step 902 The management device determines stream parameters of the received data stream.
  • the stream parameter includes quintuple information of the data stream and first time information, where the first time information may be determined according to the time when the management device receives the data stream.
  • Step 903 The management device groups the multiple data streams according to the received stream parameters and at least one preset access mode of the multiple data streams within the first time period to obtain at least one data stream group.
  • Step 904 For any determined data flow group, the management device determines the group parameter of the data flow group.
  • step 902 please refer to the specific description of the above step 201, or similar steps such as step 801, and for steps 903 to 904, please refer to the specific description of steps 803 to 804, which will not be repeated here.
  • the management device can store the (second) group parameters of multiple data flow groups obtained in each aggregation period, for example, in a group parameter database, which can be deployed on the management device or on other devices.
  • the historical group parameter information includes all the group parameters received by the cloud platform or determined by itself, and these group parameters can subsequently be used for abnormal data flow detection or for formulating security rules.
  • the device used for abnormal data flow detection or for formulating security rules is referred to as the third device, and the third device may be a management device in the network architecture (for example, any network management device or cloud platform), or may be a Devices that are deployed independently.
  • FIG. 10 is a schematic flowchart of a group parameter application method provided by an embodiment of the present application. This method can be applied to the third device and the management device integrating the group parameter database. It should be noted that the third device and the management device can be deployed on different devices or on the same device. As shown in FIG. 10, the method includes:
  • Step 1001 The third device receives a query condition input by a user, where the query condition includes a query field.
  • the embodiment of the present application further provides a user interface integrated on a third device, where the user interface includes a query input area and a result display area.
  • the query input area is used to input query fields, such as fields related to group parameters.
  • the result display area is used to display the query results.
  • the query field may be, but is not limited to, some or all of the following: stream support, device access support, server IP, terminal port number, server port number, protocol type, number of flows, terminal IP address set, number of terminal devices, second time information, and access mode identifier.
  • the query condition may be that the server IP address is 10.0.0.1, and for example, the query condition may be that the server port number is 8080.
  • the query condition may also include a query threshold. For example, if a query condition is that the stream support degree is greater than 50%, the query threshold is 50%. For another example, a certain query condition is that the device access support degree is less than 2%. For another example, the query condition is that the device access support degree is between 60% and 100%.
  • Step 1002 The third device sends the query condition to the management device, and correspondingly, the management device receives the query condition sent by the third device.
  • Step 1003 The management device determines a query result that satisfies the query condition.
  • the management device determines the query result satisfying the query condition based on the group parameter database.
  • the query result includes part or all of the group parameters of the data flow groups, determined by the management device based on the group parameter database, that match the query condition or whose group parameters meet the query threshold. For example, if the query condition is that the server IP address is 10.0.0.1, the query result includes some or all group parameters of the data flow groups whose server IP address is 10.0.0.1, determined by the management device based on the historical group parameters in the group parameter database.
  • the management device may determine a data flow group (called a target data flow group) with a stream support degree less than 2% based on the historical group parameters in the group parameter database, and the query result may be part or all of the group parameters of the target data flow group recorded in the historical group parameters, for example, the server IP address, server port number, and protocol type of the target data flow group.
  • Step 1004 The management device sends the query result to the third device, and correspondingly, the third device receives the query result sent by the management device.
  • the third device may display the query result on the user interface in the above step 1001 for the user to browse and consult.
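  • The query in steps 1001 to 1004 can be sketched as a filter over the stored group parameters. The in-memory list standing in for the group parameter database, the `(field, operator, value)` form of a query condition, and the sample records are all assumptions for illustration.

```python
import operator

# Comparison operators a query condition may use (hypothetical encoding).
OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def query(database, field, op, value):
    """Return the group parameters whose field satisfies the condition."""
    return [g for g in database if OPS[op](g[field], value)]


# Hypothetical historical group parameters.
group_db = [
    {"server_ip": "10.0.0.1", "server_port": 8080, "protocol": "TCP", "stream_support": 0.80},
    {"server_ip": "10.0.0.1", "server_port": 443, "protocol": "TCP", "stream_support": 0.15},
    {"server_ip": "10.0.1.100", "server_port": 45532, "protocol": "UDP", "stream_support": 0.01},
]

by_server = query(group_db, "server_ip", "==", "10.0.0.1")
low_support = query(group_db, "stream_support", "<", 0.02)
# A range such as 60% to 100% can be expressed by chaining two conditions.
high_support = query(query(group_db, "stream_support", ">=", 0.60),
                     "stream_support", "<=", 1.00)
```

  • A real deployment would more likely push such conditions down to the database; the list comprehension here only shows the matching semantics.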
  • the above scenarios can be applied to abnormal data flow detection.
  • for example, data flows with a support degree of less than 2% can be queried; these data flows are likely to be abnormal data flows.
  • in this way, abnormal data flows can be detected in time, and the efficiency and accuracy of abnormal data flow detection can be improved.
  • the third device can also automatically generate security rules according to the query result.
  • security rules can be formulated according to some or all of the data items in the group parameters of the target data flow groups included in the query result.
  • for example, the query result includes target data flow group 1, the support degree in the group parameters of target data flow group 1 is 80%, the first threshold is 51%, and the group parameters include the server IP 10.0.0.1; in this case, the data flow with the server port number between 8080 and 8090 is a data flow that is allowed to be forwarded and belongs to the whitelist.
  • a blacklist can be configured similarly. For example, in the query result, the group parameters include the server IP 10.0.1.100 and the server port number range 45532-45562; the data flow with the server IP of 10.0.1.100 and the server port number between 45532 and 45562 is a data flow that needs to be intercepted and belongs to the blacklist.
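  • Automatic rule generation from a query result might look like the sketch below. The first threshold of 51% and the sample addresses come from the examples above; the second threshold of 2% for the blacklist, the use of the stream support degree as the deciding value, and the rule layout are assumptions.

```python
def make_rules(groups, first_threshold=0.51, second_threshold=0.02):
    """Turn group parameters into allow (whitelist) or deny (blacklist) rules."""
    rules = []
    for g in groups:
        if g["stream_support"] > first_threshold:
            action = "allow"  # whitelist: widely used, considered normal
        elif g["stream_support"] < second_threshold:
            action = "deny"   # blacklist: rare, likely abnormal (assumed threshold)
        else:
            continue          # no rule is generated for the middle range
        rules.append({
            "action": action,
            "server_ip": g["server_ip"],
            "server_ports": (g["min_server_port"], g["max_server_port"]),
        })
    return rules


query_result = [
    {"server_ip": "10.0.0.1", "min_server_port": 8080, "max_server_port": 8090,
     "stream_support": 0.80},
    {"server_ip": "10.0.1.100", "min_server_port": 45532, "max_server_port": 45562,
     "stream_support": 0.01},
    {"server_ip": "10.0.2.7", "min_server_port": 53, "max_server_port": 53,
     "stream_support": 0.30},
]
rules = make_rules(query_result)
```

  • Generated rules would normally be reviewed by an administrator before being pushed to forwarding devices, since a low support degree alone does not prove that a data flow is malicious.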
  • FIG. 11 is a schematic flowchart of a group parameter application method provided by an embodiment of the present application. This method can be applied to the third device and the management device integrating the group parameter database. It should be noted that the third device and the management device can be deployed on different devices or on the same device. As shown in FIG. 11, the method includes:
  • Step 1101 The third device monitors the configuration fields entered by the user on the security rule configuration interface.
  • the configuration fields include but are not limited to: server IP address, server port number range, terminal port number range, protocol type, and may also include terminal IP address, allowable access time, and the like.
  • the security rule fields of the data flow that is allowed to be forwarded include: the server IP address is 10.1.0.100, the minimum server port number is 45527, the maximum server port number is 65532, the terminal port number is 80, the protocol type is TCP, and the allowable access time is 8:00-11:30, or 8:00-17:00, etc.
  • the forwarding device can forward a received data flow after determining that the data flow conforms to the whitelist.
  • Step 1102 The third device sends the monitored configuration field to the management device, and correspondingly, the management device receives the configuration field sent by the third device.
  • the third device can automatically and continuously send the detected configuration field to the management device.
  • the third device can continuously monitor the user input process, and synchronously send the configuration fields monitored from time to time to the management device.
  • the third device may also send the configuration field input by the current user to the management device after receiving the user's confirmation operation.
  • Step 1103: The management device determines a matching result that matches the configuration field.
  • the management device queries the historical group parameters for the group parameters of the target data flow group matching the configuration field based on the group parameter database.
  • for example, the management device may query the group parameter database for the group parameters of all target data flow groups whose server IP address is 10.1.0.100.
  • the management device may also sort the queried target data flow groups according to dimensions such as time, flow support degree, and device access support degree, and send the group parameters of some or all of the top N target data flow groups to the third device. Specifically, when a whitelist is being configured, the groups can be sorted by value from large to small, and the group parameters of some or all of the top N target data flow groups are fed back.
  • when a blacklist is being configured, the groups can be sorted by value from small to large, and the group parameters of some or all of the top N target data flow groups are fed back.
  • when sending the configuration field, the third device also sends indication information that indicates whether the third device is configuring a whitelist or a blacklist, so that the management device knows which kind of list the configuration field is for.
  • the management device may continue to receive other fields. For example, after the management device receives field 1 (the server IP address is 10.1.0.100), it may also receive field 2 (the terminal port number is 80). When receiving field 1, the management device searches for matching result 1 of field 1; when receiving field 2, it searches for matching result 2 of field 2 within matching result 1.
  • Step 1104: The management device sends the matching result to the third device, and correspondingly, the third device receives the matching result sent by the management device.
  • the matching result may be displayed on the third device for the user to browse and view the matching result, and the user may generate a security rule with reference to the matching result according to experience.
  • the third device can also automatically generate security rules. For example, after the third device receives the matching result, it automatically extracts the flow parameters in the matching result and writes them into the corresponding parameter items in the security rule configuration interface.
  • it is then determined that the security rule is to be generated. For details, refer to the description above of generating a whitelist and a blacklist based on the group parameters of the query result; details are not repeated here.
  • the above method generates security rules based on the access behavior of the data flows transmitted on the network, which avoids relying solely on manual experience to configure the security rules and improves the reliability of data access in the network.
  • the embodiment of the present application further provides an apparatus for determining data flow information, which is used to perform the functions performed by the first device in FIGS. 2 to 4 or by the management device in FIGS. 8 and 9 in the above method embodiments. As shown in FIG. 12, the apparatus includes an obtaining unit 1201 and a processing unit 1202.
  • the obtaining unit 1201 is configured to obtain flow parameters of multiple data flows in the first time period; the flow parameters include: protocol type, terminal port number, server IP address, and server port number. For the specific implementation, refer to the description of step 201 in FIG. 2, steps 801 and 802 in FIG. 8, or steps 901 and 902 in FIG. 9; details are not repeated here.
  • the processing unit 1202 is configured to obtain at least one data flow group according to the flow parameter rule of at least one preset access mode and the flow parameters of the multiple data flows, where the relationship between the data flows in each data flow group satisfies the flow parameter rule of one preset access mode; and to determine the group parameters of each data flow group, where the group parameters include the server IP address, the server port number range, the terminal port number range, and the protocol type, and are determined according to the flow parameters of the data flows included in the data flow group. For the specific implementation, refer to the description of steps 202 and 203 in FIG. 2, FIG. 3, FIG. 4, steps 803 and 804 in FIG. 8, or steps 903 and 904 in FIG. 9; details are not repeated here.
  • the apparatus further includes a sending unit 1203; the sending unit 1203 is configured to send the group parameters of the multiple data flow groups determined in the reporting period to the management device, or to send both the group parameters of the multiple data flow groups determined in the reporting period and the flow parameters of the scattered data flows to the management device, where a scattered data flow is a data flow that does not belong to any data flow group in the reporting period.
  • the apparatus is a management device; the obtaining unit 1201 is further configured to receive multiple statistical results, which come from one or more first devices.
  • the processing unit 1202 is further configured to, based on the statistical results that fall within the second time period among the received statistical results, merge at least two data flow groups in those statistical results, and update the group parameters of the merged data flow group according to the group parameters of each of the at least two data flow groups; where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
  • the statistical results further include scattered data flows that are not assigned to any data flow group; the at least one preset access mode further includes the third access mode; the processing unit 1202 is further configured to merge a scattered data flow in the statistical results of the second time period with a target data flow group, and update the group parameters of the merged data flow group according to the flow parameters of the scattered data flow and the group parameters of the target data flow group, where the relationship between the data flows in the target data flow group and the scattered data flow satisfies the first flow parameter rule or the second flow parameter rule; and to determine the data flow groups belonging to the third access mode based on the remaining scattered data flows.
  • for the specific implementation, refer to the description of steps 602 and 603 in FIG. 6 or the description of FIG. 7; details are not repeated here.
  • the group parameter is used to identify abnormal data flows or to determine security rules, and the security rules are used to control data flow forwarding.
  • the apparatus is a management device; the management device stores the group parameters of historical data flow groups; the obtaining unit 1201 is further configured to receive a query request, where the query request indicates a query condition, and the query condition includes one or more of the group parameters to be queried; the processing unit 1202 is further configured to determine a query result that satisfies the query condition, and send the query result.
  • the embodiment of the present application further provides a device for determining data flow information, which is used to perform the function performed by the third device in FIG. 10 or FIG. 11 in the above method embodiment, as shown in FIG. 13 .
  • the device includes an obtaining unit 1301 and a determining unit 1302.
  • the obtaining unit 1301 is configured to obtain the group parameters of the target data flow group, where the group parameters include the server IP address, the server port number range, the terminal port number range, and the protocol type; the determining unit 1302 is configured to determine the security rules according to the group parameters, where the security rules include a blacklist and/or a whitelist; the blacklist indicates data flows that need to be intercepted, and the whitelist indicates data flows that need to be forwarded.
  • the flow support degree of the target data flow group is higher than the first threshold or the device access support degree is higher than the second threshold; the group parameter is used to determine the whitelist; or,
  • the flow support degree of the target data flow group is lower than the third threshold or the device access support degree is lower than the fourth threshold, and the group parameter is used to determine the blacklist.
  • the apparatus may be the forwarding device in the foregoing embodiment, a device attached to the forwarding device, a management device, or a third device.
  • the apparatus 1400 includes: a processor 1402 and a communication interface 1403 .
  • the apparatus 1400 may further include a memory 1401 and/or a communication line 1404 .
  • the communication interface 1403, the processor 1402 and the memory 1401 can be connected to each other through a communication line 1404;
  • the communication line 1404 can be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the communication line 1404 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is shown in FIG. 14, but it does not mean that there is only one bus or one type of bus.
  • the processor 1402 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the present application.
  • the communication interface 1403 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
  • the memory 1401 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and capable of being accessed by a computer, without limitation.
  • the memory may exist independently and be connected to the processor through communication line 1404 .
  • the memory can also be integrated with the processor.
  • the memory 1401 is used for storing computer-executed instructions for executing the solution of the present application, and the execution is controlled by the processor 1402 .
  • the processor 1402 is configured to execute the computer-executed instructions stored in the memory 1401, thereby implementing the method for determining data flow information provided by the foregoing embodiments of the present application.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • the computer-executed instructions in the embodiment of the present application may also be referred to as application code, which is not specifically limited in the embodiment of the present application.
  • "At least one (item) of a, b, or c" can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c can be single or multiple.
  • “Plurality” means two or more, and other quantifiers are similar.
  • occurrences of the singular forms "a", "an" and "the" do not mean "one or only one" unless the context clearly dictates otherwise, but rather "one or more".
  • for example, "a device" means one or more such devices.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • when implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
  • a general-purpose processor may be a microprocessor, or alternatively, the general-purpose processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors in combination with a digital signal processor core, or any other similar configuration.
  • a software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium may be coupled to the processor such that the processor may read information from, and store information in, the storage medium.
  • the storage medium can also be integrated into the processor.
  • the processor and storage medium may be provided in the ASIC.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

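The present application provides a method, apparatus, and system for determining data flow information. The method includes: obtaining flow parameters of multiple data flows in a first time period; obtaining at least one data flow group according to the flow parameters of the multiple data flows and the flow parameter rule of at least one preset access mode, where the relationship between the data flows in each data flow group satisfies the flow parameter rule of one preset access mode; and determining the group parameters of each data flow group. Based on the large number of data flows actually transmitted in the network, the embodiments of the present application treat data flows that follow the same access rule as one data flow group and determine the group parameters of each data flow group. These group parameters can be used in many security or monitoring scenarios, such as formulating security rules or anomaly detection, which avoids security work that relies entirely on experience, makes better use of the information of actually transmitted data flows, and can improve the reliability and assurance of network security.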

Description

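Method, apparatus, and system for determining data flow information
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202011271196.X, filed with the China Patent Office on November 13, 2020 and entitled "Method, apparatus, and system for mining mutual access modes", which is incorporated herein by reference in its entirety; this application also claims priority to Chinese Patent Application No. 202110131909.0, filed with the China Patent Office on January 30, 2021 and entitled "Method, apparatus, and system for determining data flow information", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of communication technologies, and in particular to a method, apparatus, and system for determining data flow information.
Background
As services in the communication field become increasingly diverse and complex, the number of terminal devices of different types keeps growing, which blurs the trusted boundary of the network. Because these terminal devices are widely distributed and access the network from scattered locations, they are difficult to manage centrally and may well be used by attackers as a springboard to attack the network for illegal purposes, causing serious economic losses.
Specifically, a terminal device interacts with a server through data packets to request services. Correspondingly, the server sends data packets to the terminal device to provide services or to send feedback responses. The set of data packets exchanged between a terminal and a server is collectively referred to as a data flow.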
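Summary
The present application provides a method, apparatus, and system for determining data flow information, which are used to mine the access patterns reflected by the data flows actually transmitted in a network.
In a first aspect, the present application provides a method for determining data flow information. The method can be applied to a first device, which may be a forwarding device, a device attached to a forwarding device (hereinafter referred to as an attached device), or a management device. The method is implemented by the first device, and specifically may be implemented by a component of the first device, such as a processing apparatus, circuit, or chip in the first device. The method includes: the first device obtains flow parameters of multiple data flows within a period of time (denoted as the first time period), where the flow parameters include but are not limited to: protocol type, terminal port number, server IP address, and server port number; at least one data flow group is obtained based on the flow parameters of the multiple data flows and the flow parameter rule of at least one preset access mode, where each preset access mode corresponds to one set of preset flow parameter rules, and the relationship between the data flows contained in each data flow group satisfies one of the preset flow parameter rules; for each determined data flow group, the group parameters of the data flow group are determined based on the flow parameters of the data flows in that group, where the group parameters include but are not limited to: server IP address, server port number range, terminal port number range, and protocol type. Specifically, the lower limit of the server port number range in the group parameters is the minimum server port number among the data flows in the data flow group, and the upper limit is the maximum server port number among those data flows.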
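The derivation of group parameters from the flows of one group can be illustrated with a minimal sketch. The `Flow` class and the `group_parameters` function are hypothetical names introduced here, not part of the application; the server port range follows the min/max rule stated above, while deriving the terminal port range the same way is an assumption made by analogy.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Flow:
    """Flow parameters of one data flow (hypothetical container)."""
    protocol: str
    client_port: int  # terminal port number
    server_ip: str
    server_port: int


def group_parameters(flows: Iterable[Flow]) -> dict:
    """Derive the group parameters of one data flow group from its flows."""
    flows = list(flows)
    if not flows:
        raise ValueError("a data flow group needs at least one flow")
    server_ports = [f.server_port for f in flows]
    client_ports = [f.client_port for f in flows]
    return {
        "protocol": flows[0].protocol,
        "server_ips": sorted({f.server_ip for f in flows}),
        # Lower/upper limit = min/max server port among the group's flows.
        "server_port_range": (min(server_ports), max(server_ports)),
        # Assumption: the terminal port range is derived the same way.
        "client_port_range": (min(client_ports), max(client_ports)),
    }


group = [
    Flow("TCP", 50001, "10.0.0.1", 8080),
    Flow("TCP", 50001, "10.0.0.1", 8090),
    Flow("TCP", 50001, "10.0.0.1", 8085),
]
params = group_parameters(group)
```

With the three example flows, the server port range spans 8080 to 8090, which matches the whitelist example given earlier for server IP 10.0.0.1.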
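With the above design, based on the large number of data flows actually transmitted in the network, data flows with the same access pattern can be treated as one data flow group, and the group parameters of each data flow group can be determined. These group parameters can be used in many security or monitoring scenarios, such as formulating security rules or anomaly detection, which avoids security work that relies entirely on experience, makes better use of the information of actually transmitted data flows, and can improve the reliability and assurance of network security.
In a possible implementation, the group parameters of the data flow group can be used to identify abnormal data flows or to determine security rules, where the security rules are used to control how the forwarding device forwards data flows.
In a possible implementation, the group parameters may further include, but are not limited to, some or all of the following: a terminal IP address set, the number of data flows, time mode information, an access mode identifier, a flow support degree, and a device access support degree; where the terminal IP address set includes the distinct terminal IP addresses corresponding to the data flows in the data flow group;
Here, 1) the number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support degree is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support degree is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data. The sample data refers to all the data flows (their flow parameters) on which the current round of data flow group mining is based.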
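The two support degrees can be sketched as follows. The passage only says which quantities each degree is "determined according to"; treating each as a simple ratio is an assumption of this sketch, and the function names are hypothetical.

```python
def flow_support(group_flow_count: int, total_flow_count: int) -> float:
    """Assumed form: share of all flows in the period that fall in the group."""
    if total_flow_count <= 0:
        raise ValueError("total_flow_count must be positive")
    return group_flow_count / total_flow_count


def device_access_support(group_terminal_ips, sample_terminal_ips) -> float:
    """Assumed form: share of all terminals in the sample seen in the group."""
    all_terminals = set(sample_terminal_ips)
    if not all_terminals:
        raise ValueError("the sample data contains no terminals")
    # Duplicated terminal IPs are counted once.
    return len(set(group_terminal_ips) & all_terminals) / len(all_terminals)


fs = flow_support(51, 100)
das = device_access_support(
    ["10.1.1.1", "10.1.1.2", "10.1.1.1"],
    ["10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4"],
)
```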
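With the above design, the access behavior of data flows can be mined from multiple dimensions, which improves the accuracy of data flow group mining and makes the approach widely applicable.
In a possible implementation, the at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode. The relationship between the data flows in a data flow group belonging to the first access mode satisfies a first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the second access mode satisfies a second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the third access mode satisfies a third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.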
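The three flow parameter rules can be restated as a small classifier over a candidate set of flows. This is only an illustrative sketch: `access_mode` and the `ip_group_of` hook (which maps a server IP address to its preset IP address group, defaulting to the address itself) are hypothetical names, and requiring at least two flows per group is an assumption.

```python
from collections import namedtuple

Flow = namedtuple("Flow", "protocol client_port server_ip server_port")


def access_mode(flows, ip_group_of=lambda ip: ip):
    """Return 1, 2 or 3 for the access mode the flows satisfy, else None."""
    flows = list(flows)
    if len(flows) < 2:
        return None
    # Common to all three rules: same protocol, same server IP (or IP group).
    if len({f.protocol for f in flows}) != 1:
        return None
    if len({ip_group_of(f.server_ip) for f in flows}) != 1:
        return None
    client_varies = len({f.client_port for f in flows}) > 1
    server_varies = len({f.server_port for f in flows}) > 1
    if client_varies and server_varies:
        return 3  # both port numbers are not all the same
    if client_varies:
        return 1  # terminal ports differ, server port is the same
    if server_varies:
        return 2  # server ports differ, terminal port is the same
    return None


m1 = access_mode([Flow("TCP", 50001, "10.0.0.1", 80),
                  Flow("TCP", 50002, "10.0.0.1", 80)])
m2 = access_mode([Flow("TCP", 6000, "10.0.0.2", 8080),
                  Flow("TCP", 6000, "10.0.0.2", 8081)])
m3 = access_mode([Flow("UDP", 7000, "10.0.0.3", 9000),
                  Flow("UDP", 7001, "10.0.0.3", 9001)])
m_none = access_mode([Flow("TCP", 50001, "10.0.0.1", 80),
                      Flow("UDP", 50002, "10.0.0.1", 80)])
```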
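With the above design, the access behavior between terminal devices and servers can be mined more comprehensively and from multiple dimensions, by focusing on the server side, on the terminal device side, or on both sides together, which facilitates subsequent abnormal data flow detection or security rule formulation and makes the approach widely applicable.
In a possible implementation, the at least one preset access mode includes the first access mode and the second access mode;
obtaining at least one data flow group according to the flow parameter rule of the at least one preset access mode and the flow parameters of the multiple data flows includes: determining the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and determining the data flow groups belonging to the second access mode based on the flow parameters of the remaining data flows.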
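The two-pass order described above (first-mode groups first, then second-mode groups from whatever remains) might be sketched as follows. The bucketing strategy, the tuple layout `(protocol, client_port, server_ip, server_port)`, and the names `mine_groups` and `_split` are assumptions of this sketch rather than the procedure of the application, whose detailed flow is given in its figures.

```python
from collections import defaultdict


def _split(flows, key, varying):
    """Bucket flows by `key`; buckets whose `varying` field takes more than
    one value become groups, all other flows are returned as the remainder."""
    buckets = defaultdict(list)
    for f in flows:
        buckets[key(f)].append(f)
    groups, rest = [], []
    for members in buckets.values():
        if len({varying(f) for f in members}) > 1:
            groups.append(members)
        else:
            rest.extend(members)
    return groups, rest


def mine_groups(flows):
    """flows: iterable of (protocol, client_port, server_ip, server_port)."""
    flows = list(dict.fromkeys(flows))  # drop duplicate records, keep order
    # Pass 1: same protocol, server IP and server port; terminal port varies.
    mode1, rest = _split(flows, key=lambda f: (f[0], f[2], f[3]),
                         varying=lambda f: f[1])
    # Pass 2 (remaining flows only): same terminal port; server port varies.
    mode2, scattered = _split(rest, key=lambda f: (f[0], f[2], f[1]),
                              varying=lambda f: f[3])
    return mode1, mode2, scattered


sample = [
    ("TCP", 50001, "10.0.0.1", 80),
    ("TCP", 50002, "10.0.0.1", 80),
    ("TCP", 6000, "10.0.0.2", 8080),
    ("TCP", 6000, "10.0.0.2", 8081),
    ("UDP", 53, "10.0.0.3", 53),
]
mode1, mode2, scattered = mine_groups(sample)
```

Flows left in `scattered` belong to neither mode; they correspond to the scattered data flows that later steps try to merge or assign to the third access mode.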
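In a possible implementation, the first device is a management device; the at least one preset access mode further includes the third access mode; and the method further includes: the management device determines the data flow groups belonging to the third access mode based on the data flows, among the multiple data flows in the first time period, other than those in the data flow groups belonging to the first access mode and the second access mode.
In a possible implementation, the first device is a forwarding device or a device attached to a forwarding device; the method further includes: the first device obtains the group parameters of multiple data flow groups determined within a reporting period, where the reporting period is longer than the first time period; merges at least two of the multiple data flow groups, and determines the group parameters of the merged data flow group according to the group parameters of the at least two data flow groups; where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
In a possible implementation, the method further includes: obtaining the scattered data flows in the reporting period, where a scattered data flow is any data flow in the reporting period that does not belong to any data flow group in the reporting period; determining, for each scattered data flow, whether its relationship with the data flows in a currently existing data flow group satisfies the first flow parameter rule or the second flow parameter rule; and if so, merging the scattered data flow with that data flow group (also called the target data flow group of the scattered data flow), and updating the group parameters of the merged data flow group according to the flow parameters of the scattered data flow and the group parameters of the target data flow group.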
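Merging one scattered data flow into an existing group can be illustrated as below. The dictionary layout of a group (`mode`, `protocol`, `server_ip`, `server_ports`, `client_ports`, `flow_count`) and the function name `try_merge` are hypothetical; the sketch only handles a single server IP address, not a preset IP address group.

```python
def try_merge(group: dict, flow: tuple) -> bool:
    """Merge a scattered flow into `group` if it fits the group's rule.

    flow: (protocol, client_port, server_ip, server_port).
    Returns True and updates the group parameters in place on success.
    """
    protocol, client_port, server_ip, server_port = flow
    if (protocol, server_ip) != (group["protocol"], group["server_ip"]):
        return False
    c_lo, c_hi = group["client_ports"]
    s_lo, s_hi = group["server_ports"]
    if group["mode"] == 1 and s_lo == s_hi == server_port:
        # First rule: same server port, so only the terminal port range grows.
        group["client_ports"] = (min(c_lo, client_port), max(c_hi, client_port))
    elif group["mode"] == 2 and c_lo == c_hi == client_port:
        # Second rule: same terminal port, so only the server port range grows.
        group["server_ports"] = (min(s_lo, server_port), max(s_hi, server_port))
    else:
        return False
    group["flow_count"] += 1
    return True


group = {
    "mode": 1, "protocol": "TCP", "server_ip": "10.0.0.1",
    "server_ports": (80, 80), "client_ports": (50001, 50010),
    "flow_count": 4,
}
merged = try_merge(group, ("TCP", 50020, "10.0.0.1", 80))
rejected = try_merge(group, ("TCP", 50020, "10.0.0.1", 443))
```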
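With the above design, this reporting method can effectively reduce repeated reporting of redundant information and save resource overhead.
In a possible implementation, the first device is a forwarding device or a device attached to a forwarding device; the method further includes: sending, to the management device, the group parameters of the data flow groups determined by the first device.
In a possible implementation, the first device is a management device, and the flow parameters of the multiple data flows in the first time period come from multiple second devices, which include forwarding devices and/or devices attached to the forwarding devices.
In a possible implementation, the first device is a management device; the management device stores the group parameters of historical data flow groups; the method further includes: receiving a query request, where the query request indicates a query condition, and the query condition includes one or more of the group parameters to be queried; determining a query result that satisfies the query condition, and sending the query result.
With the above design, the method can be applied to abnormal data flow detection, so that abnormal data flows can be detected in time, improving the efficiency and accuracy of abnormal data flow detection.
In a possible implementation, the group parameters to be queried include the flow support degree and/or the device access support degree; the query condition further includes a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support degree and the second query threshold corresponds to the device access support degree;
the query result includes some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support degree satisfies the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support degree satisfies the second query threshold.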
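A threshold query over stored group parameters could look like the sketch below. The passage does not say what "satisfies" a threshold means or how the two conditions combine, so the comparison operator (`>=` by default) and the `require_all` switch are assumptions; the field names `flow_support` and `device_support` and the function `query_groups` are hypothetical.

```python
import operator


def query_groups(history, flow_threshold=None, device_threshold=None,
                 compare=operator.ge, require_all=True):
    """Return the historical groups whose support degrees meet the thresholds."""
    checks = []
    if flow_threshold is not None:
        checks.append(lambda g: compare(g["flow_support"], flow_threshold))
    if device_threshold is not None:
        checks.append(lambda g: compare(g["device_support"], device_threshold))
    if not checks:
        return list(history)  # no threshold given: nothing to filter on
    combine = all if require_all else any
    return [g for g in history if combine(check(g) for check in checks)]


history = [
    {"id": "g1", "flow_support": 0.60, "device_support": 0.70},
    {"id": "g2", "flow_support": 0.55, "device_support": 0.10},
    {"id": "g3", "flow_support": 0.02, "device_support": 0.05},
]
both = query_groups(history, flow_threshold=0.51, device_threshold=0.5)
either = query_groups(history, flow_threshold=0.51, device_threshold=0.5,
                      require_all=False)
rare = query_groups(history, flow_threshold=0.05, compare=operator.lt)
```

Passing `operator.lt` gives the low-support query that a blacklist-oriented lookup would need.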
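With the above design, security rules can also be generated based on the access behavior of the data flows transmitted in the network, which avoids relying solely on manual experience to configure security rules and improves the reliability of data access in the network.
In a possible implementation, the forwarding device is a switch, a router, a virtual private network (VPN) device, or a firewall virtual device.
In a second aspect, the present application provides a method for determining data flow information. The method can be applied to a third device and is implemented by the third device, and specifically may be implemented by a component of the third device, such as a processing apparatus, circuit, or chip in the third device. The method includes: when formulating security rules, obtaining the group parameters of a target data flow group, where the group parameters include the server IP address, the server port number range, the terminal port number range, and the protocol type; and determining security rules according to the group parameters, where the security rules include a blacklist and/or a whitelist; the blacklist indicates data flows that need to be intercepted, and the whitelist indicates data flows that need to be forwarded.
With the above design, security rules can also be generated based on the access behavior of the data flows transmitted in the network, which avoids relying solely on manual experience to configure security rules and improves the reliability of data access in the network.
In a possible implementation, the flow support degree of the target data flow group is higher than a first threshold or its device access support degree is higher than a second threshold, and the group parameters are used to determine the whitelist; or, the flow support degree of the target data flow group is lower than a third threshold or its device access support degree is lower than a fourth threshold, and the group parameters are used to determine the blacklist.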
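The four-threshold decision can be written out as a short sketch. The function name `rule_list_for` and the field names are hypothetical, and the threshold values in the usage lines are illustrative only (0.51 echoes the 51% example for the first threshold). Because the conditions are joined by "or", a group could in principle satisfy both branches; checking the whitelist branch first is an assumption of this sketch.

```python
def rule_list_for(group, first, second, third, fourth):
    """Decide which list a target data flow group's parameters feed into.

    first/second: whitelist thresholds for flow / device access support.
    third/fourth: blacklist thresholds for flow / device access support.
    """
    flow, device = group["flow_support"], group["device_support"]
    if flow > first or device > second:
        return "whitelist"
    if flow < third or device < fourth:
        return "blacklist"
    return None  # neither condition holds; left for manual review


thresholds = dict(first=0.51, second=0.5, third=0.01, fourth=0.01)
common = rule_list_for({"flow_support": 0.6, "device_support": 0.2}, **thresholds)
rare = rule_list_for({"flow_support": 0.001, "device_support": 0.3}, **thresholds)
unclear = rule_list_for({"flow_support": 0.2, "device_support": 0.2}, **thresholds)
```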
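In a third aspect, the present application provides a system for determining data flow information. The system includes at least one first device and at least one management device, where the first device may be a forwarding device or a device attached to a forwarding device. The first device obtains the flow parameters of multiple data flows in a first time period, and obtains at least one data flow group based on the flow parameters of the multiple data flows and the flow parameter rule of at least one preset access mode, where the flow parameters include: protocol type, terminal port number, server IP address, and server port number; it then determines the group parameters of each data flow group, where the group parameters include: server IP address, server port number range, terminal port number range, and protocol type; each preset access mode corresponds to one set of preset flow parameter rules; and it sends the statistical result of the first time period to the management device, where the statistical result includes the group parameters of the at least one determined data flow group. The management device receives multiple statistical results, which come from one or more first devices.
In a possible implementation, the group parameters of the data flow group can be used to identify abnormal data flows or to determine security rules, where the security rules are used to control how the forwarding device forwards data flows.
In a possible implementation, the group parameters may further include, but are not limited to, some or all of the following: a terminal IP address set, the number of data flows, time mode information, an access mode identifier, a flow support degree, and a device access support degree; where the terminal IP address set includes the distinct terminal IP addresses corresponding to the data flows in the data flow group;
Here, 1) the number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support degree is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support degree is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data. The sample data refers to all the data flows (their flow parameters) on which the current round of data flow group mining is based.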
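In a possible implementation, the at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode. The relationship between the data flows in a data flow group belonging to the first access mode satisfies a first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the second access mode satisfies a second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the third access mode satisfies a third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.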
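In a possible implementation, the at least one preset access mode includes the first access mode and the second access mode;
the first device obtaining at least one data flow group according to the flow parameter rule of the at least one preset access mode and the flow parameters of the multiple data flows includes: determining the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and determining the data flow groups belonging to the second access mode based on the flow parameters of the remaining data flows.
In a possible implementation, the first device determining the statistical result includes: the first device obtains the group parameters of multiple data flow groups determined within a reporting period, where the reporting period is longer than the first time period; merges at least two of the multiple data flow groups, and determines the group parameters of the merged data flow group according to the group parameters of the at least two data flow groups; where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
In a possible implementation, the first device determining the statistical result further includes: obtaining the scattered data flows in the reporting period, where a scattered data flow is a data flow, among the multiple data flows in the reporting period, that does not belong to any data flow group in the reporting period; determining, for each scattered data flow, whether its relationship with the data flows in a currently existing data flow group satisfies the first flow parameter rule or the second flow parameter rule; and if so, merging the scattered data flow with that data flow group (also called the target data flow group of the scattered data flow), and updating the group parameters of the merged data flow group according to the flow parameters of the scattered data flow and the group parameters of the target data flow group.
In a possible implementation, the statistical result includes the group parameters of each data flow group that was not merged among the multiple data flow groups determined in the reporting period, the group parameters of the merged data flow groups, and the flow parameters of the remaining scattered data flows that were not merged.
In a possible implementation, based on the statistical results that fall within a second time period among the received statistical results, the management device merges at least two data flow groups in those statistical results, and determines the group parameters of the merged data flow group according to the group parameters of the at least two data flow groups; where the relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.
In a possible implementation, the at least one preset access mode further includes the third access mode; the statistical result further includes scattered data flows that were not assigned to any data flow group; the management device adds one or more scattered data flows from the statistical results in the second time period to a target data flow group, and updates the group parameters of the target data flow group according to the flow parameters of the scattered data flows; where the relationship between the data flows in the target data flow group and the scattered data flows satisfies the first flow parameter rule or the second flow parameter rule; and the management device determines the data flow groups belonging to the third access mode based on the remaining scattered data flows.
In a possible implementation, the management device stores the group parameters of historical data flow groups; the method further includes: the management device receives a query request, where the query request indicates a query condition, and the query condition includes one or more of the group parameters to be queried; the management device determines a query result that satisfies the query condition, and sends the query result.
In a possible implementation, the group parameters to be queried include the flow support degree and/or the device access support degree; the query condition further includes a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support degree and the second query threshold corresponds to the device access support degree; the query result includes some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support degree satisfies the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support degree satisfies the second query threshold.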
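In a fourth aspect, the present application provides a system for determining data flow information. The system includes at least one first device and at least one management device, where the first device may be a forwarding device or a device attached to a forwarding device. The first device sends the flow parameters of multiple data flows in a first time period to the management device, where the flow parameters include: protocol type, terminal port number, server IP address, and server port number. The management device receives the flow parameters of the multiple data flows in the first time period from one or more first devices; obtains at least one data flow group based on the flow parameters of the multiple data flows and the flow parameter rule of at least one preset access mode; and determines the group parameters of each data flow group; where the flow parameters include: protocol type, terminal port number, server IP address, and server port number; the group parameters include: server IP address, server port number range, terminal port number range, and protocol type; and each preset access mode corresponds to one set of preset flow parameter rules.
In a possible implementation, the group parameters of the data flow group can be used to identify abnormal data flows or to determine security rules, where the security rules are used to control how the forwarding device forwards data flows.
In a possible implementation, the group parameters may further include, but are not limited to, some or all of the following: a terminal IP address set, the number of data flows, time mode information, an access mode identifier, a flow support degree, and a device access support degree; where the terminal IP address set includes the distinct terminal IP addresses corresponding to the data flows in the data flow group;
Here, 1) the number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support degree is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support degree is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data. The sample data refers to all the data flows (their flow parameters) on which the current round of data flow group mining is based.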
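In a possible implementation, the at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode. The relationship between the data flows in a data flow group belonging to the first access mode satisfies a first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the second access mode satisfies a second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the third access mode satisfies a third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.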
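In a possible implementation, the at least one preset access mode includes the first access mode and the second access mode;
the management device determines the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and determines the data flow groups belonging to the second access mode based on the data flows remaining after those belonging to the first access mode are removed;
the at least one preset access mode further includes the third access mode;
the management device determines the data flow groups belonging to the third access mode based on the data flows remaining after the data flows in the data flow groups belonging to the first access mode and the second access mode are removed.
In a possible implementation, the management device stores the group parameters of historical data flow groups; the method further includes: receiving a query request, where the query request indicates a query condition, and the query condition includes one or more of the group parameters to be queried; determining a query result that satisfies the query condition, and sending the query result.
In a possible implementation, the group parameters to be queried include the flow support degree and/or the device access support degree; the query condition further includes a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support degree and the second query threshold corresponds to the device access support degree; the query result includes some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support degree satisfies the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support degree satisfies the second query threshold.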
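In a fifth aspect, the present application provides a system for determining data flow information. The system includes at least one first device and at least one management device, where the first device may be a forwarding device or a device attached to a forwarding device. The first device sends the data flows it receives to the management device. The management device receives multiple data flows, which come from one or more first devices; determines the flow parameters of each of the multiple data flows; obtains at least one data flow group based on the flow parameter rule of at least one preset access mode and the flow parameters of the multiple data flows; and determines the group parameters of each data flow group; where the flow parameters include: protocol type, terminal port number, server IP address, and server port number; the group parameters include: server IP address, server port number range, terminal port number range, and protocol type; and each preset access mode corresponds to one set of preset flow parameter rules.
In a possible implementation, the group parameters of the data flow group can be used to identify abnormal data flows or to determine security rules, where the security rules are used to control how the forwarding device forwards data flows.
In a possible implementation, the group parameters may further include, but are not limited to, some or all of the following: a terminal IP address set, the number of data flows, time mode information, an access mode identifier, a flow support degree, and a device access support degree; where the terminal IP address set includes the distinct terminal IP addresses corresponding to the data flows in the data flow group;
Here, 1) the number of data flows is the number of data flows contained in the data flow group; 2) the time mode information indicates the preset time mode to which the data flow group belongs, where different preset time modes correspond one-to-one to preset time ranges; 3) the access mode identifier identifies the preset access mode to which the data flow group belongs; 4) the flow support degree is determined according to the number of data flows in the data flow group and the total number of data flows in the first time period; 5) the device access support degree is determined according to the number of terminals corresponding to the data flow group and the total number of terminals determined from the sample data. The sample data refers to all the data flows (their flow parameters) on which the current round of data flow group mining is based.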
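In a possible implementation, the at least one preset access mode includes one or more of the following: a first access mode, a second access mode, and a third access mode. The relationship between the data flows in a data flow group belonging to the first access mode satisfies a first flow parameter rule, which includes: the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and the same server IP address; or the data flows in the group have the same protocol type, terminal port numbers that are not all the same, the same server port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the second access mode satisfies a second flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, the same terminal port number, and server IP addresses that belong to the same preset IP address group. The relationship between the data flows in a data flow group belonging to the third access mode satisfies a third flow parameter rule, which includes: the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and the same server IP address; or the data flows in the group have the same protocol type, server port numbers that are not all the same, terminal port numbers that are not all the same, and server IP addresses that belong to the same preset IP address group.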
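In a possible implementation, the at least one preset access mode includes the first access mode and the second access mode;
the management device determines the data flow groups belonging to the first access mode based on the flow parameters of the multiple data flows in the first time period, and determines the data flow groups belonging to the second access mode based on the data flows remaining after those belonging to the first access mode are removed;
the at least one preset access mode further includes the third access mode;
the management device determines the data flow groups belonging to the third access mode based on the data flows remaining after those belonging to the first access mode and the second access mode are removed.
In a possible implementation, the management device stores the group parameters of historical data flow groups; the method further includes: receiving a query request, where the query request indicates a query condition, and the query condition includes one or more of the group parameters to be queried; determining a query result that satisfies the query condition, and sending the query result.
In a possible implementation, the group parameters to be queried include the flow support degree and/or the device access support degree; the query condition further includes a first query threshold and/or a second query threshold, where the first query threshold corresponds to the flow support degree and the second query threshold corresponds to the device access support degree; the query result includes some or all of the group parameters of the data flow groups, among the historical data flow groups, whose flow support degree satisfies the first query threshold; and/or some or all of the group parameters of the data flow groups, among the historical data flow groups, whose device access support degree satisfies the second query threshold.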
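In a sixth aspect, the present application further provides an apparatus for determining data flow information. The apparatus includes multiple functional units, which can perform the functions performed in the steps of the method of the first aspect or of the method of the second aspect. These functional units may be implemented by hardware or by software. In one possible design, the apparatus includes an obtaining unit and a processing unit. In another possible design, the apparatus includes an obtaining unit and a determining unit.
In a seventh aspect, the present application further provides an apparatus for determining data flow information. The apparatus includes a processor, a memory, and a transceiver; the memory stores program instructions, and the processor runs the program instructions in the memory and communicates with other devices through the transceiver, so as to implement the method provided in the first aspect or the method provided in the second aspect.
In an eighth aspect, the present application further provides an apparatus for determining data flow information. The apparatus includes at least one processor and an interface circuit; the processor communicates with other apparatuses through the interface circuit, so as to implement the method provided in the first aspect or the method provided in the second aspect.
In a ninth aspect, the present application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method provided in the first aspect or the method provided in the second aspect.
For the beneficial effects achieved by the third to ninth aspects, refer to the description of the beneficial effects of the methods performed by the first device in the first aspect or the second aspect; details are not repeated here.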
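Brief description of the drawings
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart corresponding to a method for determining data flow information provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of determining data flow groups provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of determining data flow information provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the relationship between a reporting period and a statistical period provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart corresponding to another method for determining data flow information provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of another method for determining data flow information provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of a method for determining data flow information provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of another method for determining data flow information provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a query scenario provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of another query scenario provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an apparatus for determining data flow information provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of another apparatus for determining data flow information provided by an embodiment of the present application;
FIG. 14 is a schematic structural diagram of yet another apparatus for determining data flow information provided by an embodiment of the present application.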
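Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The specific operation methods in the method embodiments can also be applied to the apparatus embodiments or system embodiments.
Refer to FIG. 1, which is a schematic diagram of a network architecture to which the embodiments of the present application are applicable. The network architecture includes one or more servers (FIG. 1 shows server 100 as an example, but the present application is not limited to this), one or more forwarding devices (FIG. 1 shows forwarding devices 200 and 201 as examples, but the present application is not limited to this), terminal devices (FIG. 1 shows terminal devices 10, 11, and 12 as examples, but the present application is not limited to this), and one or more management devices (FIG. 1 shows management device 300 as an example, but the present application is not limited to this).
Some terms used in the embodiments of the present application are first explained below to help those skilled in the art understand them. These explanations are intended to make the embodiments of the present application easier to understand and should not be regarded as limiting the scope of protection claimed by the embodiments.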
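1. Terminal device: a device with wired or wireless transceiver functions. A terminal device may be called a terminal for short. It can be deployed on land, including indoors, outdoors, handheld, and/or vehicle-mounted; on the water surface (for example, on ships); or in the air (for example, on aircraft, balloons, and satellites). The terminal device may be user equipment (UE), which includes handheld devices, vehicle-mounted devices, wearable devices, or computing devices with wired or wireless communication functions. For example, the UE may be a mobile phone, a tablet computer, or a computer with wired or wireless transceiver functions. The terminal device may also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in telemedicine, a wireless terminal in a smart grid, a wireless terminal in a smart city, and/or a wireless terminal in a smart home, and so on. For example, the terminal device may also be an Internet of Things device that communicates based on the Internet Protocol (IP), such as a camera, a printer, an IP phone, an automated teller machine (ATM), a smart counter, a queuing machine, or a receipt cabinet.
2. Forwarding device: for example, a switch, a router, a virtual private network (VPN) device, or a firewall virtual device. It is mainly used to forward data flows; specifically, it can forward or intercept data flows according to the configured security rules. The security rules configured on different forwarding devices may differ; security rules are introduced below.
3. Server: a device that provides one or more services (or functions). The network architecture shown in FIG. 1 can be applied to many scenarios, such as financial networks, campus networks, and medical networks. For example, in a financial network, the terminal device may be a surveillance camera and the server may be the server of a monitoring platform; as another example, the terminal device may be an ATM and the server may be a specific server of a financial institution, for example a service server in the financial network that provides specific service functions such as transfers, deposits, transaction authentication, and query services.
4. Management device: used to configure security rules for forwarding devices, and also supports functions such as user access. In terms of form, it generally refers to a control device (one that interacts with devices and is responsible for managing them). The management device may be a regional network management device that manages the network devices (for example, forwarding devices) in a designated area, or it may be a cloud platform. A cloud platform can manage multiple regional network management devices and can of course also directly manage some or all of the network devices in a designated area. The security analysis function component can be integrated in the network management device or on the cloud platform to implement the method for determining data flow information provided in the present application. In addition, the management device can also deliver security rules. For example, the cloud platform sends the (received) security rules to the network management device, which then delivers them to the forwarding devices.
5. Data flow: a set of data packets exchanged between two nodes. Generally, a data flow consists of multiple data packets and, according to the transmission direction of the packets, includes uplink packets and downlink packets. A data packet sent by a terminal device to a server is called an uplink packet, and a data packet sent by a server to a terminal device is called a downlink packet. For example, the format of a data packet includes a packet header and a data part, where the data part carries the information to be transmitted and the packet header carries the five-tuple information. The five-tuple information is introduced below and is not described here.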
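The network architecture shown in FIG. 1 is described below.
Referring to FIG. 1, when the server 100 is deployed in an enterprise network, terminal devices can be deployed directly inside the enterprise network, for example in the enterprise production network. Terminal devices can also be deployed in a network outside the enterprise; such terminal devices can access the enterprise network through a VPN or similar means to communicate with the server 100. Specifically, a terminal device may access the network wirelessly or by wire, which is not limited in the embodiments of the present application.
A terminal device interacts with a server through data packets to request services. Correspondingly, the server sends data packets to the terminal device to provide services or to send feedback responses. As mentioned above, the set of data packets exchanged between a terminal and a server is called a data flow. Still referring to FIG. 1, the transmission path of a data flow may also include one or more forwarding devices. A forwarding device receives a data flow, parses the data packets in the data flow to obtain the five-tuple information, and then, according to the destination IP address in the parsed five-tuple information, sends the data flow to the device corresponding to that address (such as a server). Specifically, security rules are set on the forwarding device, for example a blacklist and/or a whitelist. The whitelist records information about data flows that are allowed to be forwarded; the blacklist records information about data flows that are not allowed to be forwarded, that is, that need to be intercepted. These data flows may come from devices that attack the servers or terminal devices in the network architecture shown in FIG. 1. Therefore, after receiving a data flow, the forwarding device also extracts information about the data flow, such as the five-tuple information, and determines from the extracted information and the security rules whether the data flow can be forwarded or needs to be intercepted. For example, a data packet is forwarded only after it is detected to belong to a data flow permitted by the security rules; otherwise, the forwarding device intercepts the packet and does not forward it, which prevents the server or terminal device from being illegally attacked.
In practice, security rules depend on the experience of the security administrator; that is, the administrator configures them according to known viruses or hacking techniques. Not only may a misconfigured security rule cause the forwarding device to wrongly let traffic through, but unknown threats may also go undetected, which can lead to major security incidents.
Those skilled in the art will appreciate that, in some scenarios, the access behavior of a terminal device interacting with a server is relatively fixed. For example, the data flows collected by a surveillance camera are usually sent to the server of one specific monitoring platform. Information about such data flows actually transmitted in the network has considerable practical and reference value: it can be used, for example, to formulate security rules or to identify abnormal data flows, which would greatly improve network security. However, there is currently no specific solution for determining data flow information.
In view of this, the embodiments of the present application provide a method for determining data flow information. In this method, the flow parameters of multiple data flows in a first time period are obtained, data flow groups with a fixed access mode are mined based on these flow parameters, and the group parameters of each data flow group are determined based on the flow parameters of every data flow in the group. The embodiments of the present application can mine the access rules of the large number of data flows actually transmitted in the network, treat data flows with the same access rule as one data flow group, and determine the group parameters of each data flow group. These group parameters can be used in many security scenarios, such as formulating security rules or detecting abnormal data flows, which avoids security work that relies entirely on experience, makes better use of the information of actually transmitted data flows, and can improve the reliability and assurance of network security.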
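The method for determining data flow information provided by the embodiments of the present application is described in detail below. The method can be applied to the network architecture shown in FIG. 1. Note that the network architecture shown in FIG. 1 is only an example; the embodiments of the present application do not limit the applicable network architecture. In practice, more or fewer devices may be deployed than in FIG. 1. For example, a firewall may also be deployed at the server, so that data forwarded by the forwarding device to the server reaches the server only after it also passes firewall verification.
FIG. 2 is a flowchart of a method for determining data flow information provided by an embodiment of the present application. The method can be performed by a forwarding device in FIG. 1 (for example, a switch, a router, or a VPN device), a device attached to a forwarding device, or the management device. The method is first described in detail below taking one forwarding device as an example. As shown in FIG. 2, the method may include the following steps:
Step 201: The forwarding device obtains the flow parameters of each data flow received in N statistical periods, where N is a positive integer.
The statistical period here may be configured for the forwarding device by another device such as the management device, agreed between the management device and the forwarding device through a protocol, or determined in another way, which is not limited in the embodiments of the present application. The statistical period serves as the granularity for collecting data flow statistics. For example, if the statistical period is configured as 30 minutes, the forwarding device can execute the solution for determining data flow information once on the data flows detected in each 30-minute period. In this way, the number of data flows collected in one statistical period is neither too large nor too small: it avoids the computational burden and delay caused by too much sample data, while still providing enough sample data for effective mining of data flow information, so that mining accuracy is as high as possible. Of course, the forwarding device can also execute the method once on the data flows collected over multiple statistical periods. For example, with a statistical period of 30 minutes, the forwarding device can obtain the flow parameters of the data flows in 2 statistical periods, that is, 60 minutes, to mine data flow information; this can also be understood as directly configuring the statistical period as 60 minutes.
For example, the forwarding device may perform step 201 by default, or may be triggered to perform step 201 after receiving a start instruction. For example, the management device or another network device sends a start instruction to the forwarding device, instructing it to enable the data mining function so as to perform step 201. Optionally, the start instruction may also contain configuration information for the statistical period described above, which is used to configure the statistical period for the forwarding device. In this way, the statistical period configured for the forwarding device can be adjusted dynamically; the adjustment is flexible and does not cause much signaling overhead. If the start instruction does not contain configuration information for the statistical period, the forwarding device can perform step 201 based on the previously configured statistical period, or on a statistical period agreed by protocol or determined in another way.
Optionally, the start instruction may also include a valid count or a valid duration, which indicates the number of times or the length of time for which the statistical period takes effect. Subsequently, the forwarding device executes the solution for determining data flow information in the statistical periods that are in effect; when the valid count or valid duration is reached, the forwarding device disables the data mining function so that it is in a more energy-saving state. For example, if the valid count is three, three statistical periods take effect, which may be the three statistical periods after the start instruction is received. Another way to disable the data mining function is for another device, such as the management device, to send an end instruction instructing the forwarding device to disable it.
In addition, the statistical period is introduced here taking one forwarding device as an example. Note that different forwarding devices in a network architecture may be configured with different statistical periods. For example, in FIG. 1, the statistical period configured on forwarding device 200 may be 20 minutes and that configured on forwarding device 201 may be 30 minutes. These values are only examples, and the present application does not limit them.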
下面以一个统计周期为例,对步骤201的实现过程进行具体描述,为方便描述,将该一个统计周期称为第一统计周期。
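The forwarding device receives multiple data flows in the first statistics period and determines the flow parameters of each received data flow.
The flow parameters of a data flow may include five-tuple information and first time information. The first time information may be the time at which the forwarding device received the data flow, or the period identifier of the statistics period in which the data flow was received. For example, the period identifier may be represented by any moment within the statistics period: if the first statistics period is 2020.10.01 15:00-2020.10.01 15:30, its period identifier may be 2020.10.01 15:00. As another example, the period identifier may be the number of the statistics period. Numbering starts from 1, that is, the first statistics period is numbered 1, and each subsequent statistics period is numbered one higher in chronological order, so the forwarding device records the period numbers 1, 2, ..., n, where n is a positive integer. For instance, given that the statistics period 2020.10.01 15:00-2020.10.01 15:30 is numbered 1 and the period length is 30 minutes, the period 2020.10.01 15:30-2020.10.01 16:00 is numbered 2, the period 2020.10.01 16:00-2020.10.01 16:30 is numbered 3, and so on. Like the five-tuple information, the first time information may be obtained by parsing packets: for example, when the first time information is the time at which the terminal device or server sent the data flow, it may be carried in the packets of that data flow. Alternatively, the first time information may be determined by the forwarding device itself, for example as the time at which the forwarding device received the data flow, or as the period identifier of the statistics period. These forms of first time information are examples only; it may also be determined in other ways, for example as the time, determined by the forwarding device, at which the terminal device or server sent the data flow. This is not limited in the embodiments of this application.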
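The period-numbering scheme described above can be sketched as follows. This is a minimal illustration, not the implementation of the application; the function name `period_number` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def period_number(received_at, first_period_start, period_length):
    """Return the 1-based number of the statistics period containing received_at.

    Hypothetical helper: the first period starts at first_period_start and
    every period lasts period_length.
    """
    if received_at < first_period_start:
        raise ValueError("timestamp precedes the first statistics period")
    # timedelta // timedelta yields an int: the count of whole periods elapsed.
    return (received_at - first_period_start) // period_length + 1


start = datetime(2020, 10, 1, 15, 0)
length = timedelta(minutes=30)
print(period_number(datetime(2020, 10, 1, 15, 10), start, length))  # 1
print(period_number(datetime(2020, 10, 1, 16, 0), start, length))   # 3
```

A flow received exactly on a period boundary is assigned to the period that begins at that moment; the text does not specify boundary handling, so this is an assumption.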
下面对五元组信息进行介绍。
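A data flow is uniquely identified by a set of five-tuple information (sip, sport, dip, dport, protocol), where sip (source IP) is the source IP address, sport (source port) is the source port number, dip (destination IP) is the destination IP address, dport (destination port) is the destination port number, and protocol is the protocol type. A data flow contains uplink packets and/or downlink packets. The uplink and downlink packets of the same data flow carry the same data items, but in a different order. For example, suppose the terminal IP address is clientIP, the terminal port number is clientPort, the server IP address is serverIP, the server port number is serverPort, and the protocol type is the Transmission Control Protocol (TCP). Taking packets in the terminal-to-server direction as uplink packets, the five-tuple of an uplink packet is (clientIP, clientPort, serverIP, serverPort, TCP): sip is clientIP, sport is clientPort, dip is serverIP, dport is serverPort, and protocol is TCP. Packets in the server-to-terminal direction are downlink packets, and the five-tuple of a downlink packet is (serverIP, serverPort, clientIP, clientPort, TCP): sip is serverIP, sport is serverPort, dip is clientIP, dport is clientPort, and protocol is TCP. The protocol type above is an example only; it may also be the User Datagram Protocol (UDP). This is not limited in the embodiments of this application.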
具体的,在第一统计周期内,转发设备可以每接收到一个数据流,解析该数据流的数据报文获取数据流的五元组信息,确定并记录该数据流的流参数。如下表1所示,表1示例性显示了转发设备在该第一统计周期内记录的数据流的流参数。
表1
Figure PCTCN2021130427-appb-000001
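For ease of description, the object used to record flow parameters (for example, Table 1) is referred to below as the flow record table. Note that the form shown in Table 1 is only an example; the embodiments of this application do not limit how flow parameters are recorded. For example, the forwarding device may instead record the sip, sport, dip, dport, and protocol of each data flow, so that the flow record table contains the entries sip, sport, dip, dport, and protocol and does not directly identify the server and the terminal device. For this type of flow record table, recall that the source and destination addresses of uplink and downlink packets in the same data flow are swapped, and so are the source and destination port numbers. To simplify later mining, the forwarding device may therefore record five-tuple information according to a single rule, for example with sip in the flow record table always being the terminal IP address, sport the terminal port number, dip the server IP address, and dport the server port number. Under this rule, if the first packet the forwarding device receives for data flow A is an uplink packet, it records the packet's sip as sip, sport as sport, dip as dip, and dport as dport. Later packets of data flow A (downlink and/or uplink) can be ignored, because a data flow does not need to be recorded more than once. If the first packet received for data flow B is a downlink packet, then in that packet sip is the server IP address, sport is the server port number, dip is the terminal IP address, and dport is the terminal port number. When recording the flow parameters of data flow B from this downlink packet, the forwarding device therefore records the terminal IP address (the packet's dip) in the sip field of the flow record table, the terminal port number (dport) in the sport field, the server IP address (sip) in the dip field, and the server port number (sport) in the dport field.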
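The recording rule above (always store the terminal side first, and record each data flow only once) can be sketched as below. The names `FlowKey`, `normalize`, and `FlowTable` are hypothetical, and the `is_uplink` flag is assumed to be supplied by the caller, since the text does not say how the device tells uplink from downlink packets.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    client_ip: str
    client_port: int
    server_ip: str
    server_port: int
    protocol: str


def normalize(sip, sport, dip, dport, protocol, is_uplink):
    """Map a packet's five-tuple onto the terminal-to-server orientation."""
    if is_uplink:
        return FlowKey(sip, sport, dip, dport, protocol)
    # Downlink packet: source is the server, destination is the terminal.
    return FlowKey(dip, dport, sip, sport, protocol)


class FlowTable:
    """Flow record table that keeps one entry per data flow."""

    def __init__(self):
        self._seen = {}

    def observe(self, packet_tuple, is_uplink, first_time_info):
        key = normalize(*packet_tuple, is_uplink)
        # Only the first packet of a flow creates a record; later ones are ignored.
        self._seen.setdefault(key, first_time_info)
        return key

    def records(self):
        return list(self._seen.items())


table = FlowTable()
table.observe(("192.168.1.100", 45527, "10.1.0.100", 80, "TCP"), True, 1)
table.observe(("10.1.0.100", 80, "192.168.1.100", 45527, "TCP"), False, 1)
print(len(table.records()))  # 1
```

Using the normalized tuple as a dictionary key makes uplink and downlink packets of the same flow collapse into one record.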
需要说明的是,本申请实施例中的统计的数据流可以是转发设备接收到的所有数据流,不需要区分该数据流是否需要转发或拦截,也就是说,上述流记录表中可能统计有需要拦截的数据流,因为需要拦截的数据流的访问规则也具有应用价值或参考价值,因此,本申请实施例对此不做限定。
步骤202:转发设备根据在第一统计周期内统计的数据流的流参数,以及一个或多个预设访问模式的流参数规则,得到至少一个数据流组。
具体的,一种预设访问模式对应于一种预设的流参数规则。示例性地,预设访问模式包括第一访问模式、第二访问模式和第三访问模式中的一种或多种。应理解,这三种访问模式仅为示意,本申请实施例对预设访问模式的类型和数量是不做限定的。如下对该三种访问模式进行详细介绍。
1,第一访问模式
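The preset flow parameter rule corresponding to the first access mode is referred to below as the first flow parameter rule. The data flows in a data flow group of the first access mode satisfy the first flow parameter rule. Specifically, the first flow parameter rule includes: the data flows in the same data flow group have the same protocol type, non-fixed terminal port numbers, a fixed server IP address, and a fixed server port number. Here "fixed" means unchanged, identical, or having no fluctuation in value. For example, if the server IP address of data flow 1 is 10.1.0.100, that of data flow 2 is 10.1.0.100, and that of data flow 3 is 10.1.0.100, then the server IP address of data flows 1, 2, and 3 is fixed (the same). "Non-fixed" means that the value fluctuates, that is, the values are entirely different or not all the same. For example, if the terminal IP address of data flow 1 is 192.168.1.100, that of data flow 2 is 192.168.1.101, and that of data flow 3 is 192.168.1.102, then the terminal IP address of data flows 1, 2, and 3 is non-fixed. Alternatively, the first flow parameter rule includes: the data flows in the data flow group have the same protocol type, non-fixed terminal IP addresses, non-fixed terminal port numbers, and server IP addresses that belong to the same preset IP address group. For example, servers that provide the same service or function are placed in one group; in other words, when a terminal device initiates a service call, it may access any server in that group, and the different server IP addresses in the group make up the IP address group. Therefore, if multiple data flows have the same protocol type, non-fixed terminal IP addresses, non-fixed terminal port numbers, and server IP addresses that differ but belong to the same group of server IP addresses, they can be regarded as satisfying the first flow parameter rule. There may of course be multiple preset IP address groups, which is not limited in this application. Similar points below are not explained again.
In addition, during grouping, a shared preset IP address group takes priority over an individual IP address. In other words, no separate data flow group is generated for data flows whose server IP address belongs to a preset IP address group. For example, suppose the preset IP address group includes 10.0.1.10 and 10.0.1.11, and the currently collected flows are: data flow 11 with server IP address 10.0.1.10, server port number 80, terminal port number 45530, and protocol type TCP; data flow 12 with server IP address 10.0.1.11, server port number 80, terminal port number 45531, and protocol type TCP; and data flow 13 with server IP address 10.0.1.11, server port number 80, terminal port number 45532, and protocol type TCP. Data flows 11, 12, and 13 then satisfy the first flow parameter rule and form one data flow group; no separate data flow group is generated for data flows 12 and 13 alone. Of course, if the collected data flows contain only one IP address of the preset IP address group, for example if only data flows 12 and 13 are collected, then data flows 12 and 13 form one data flow group. Similar points below are not explained again.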
2,第二访问模式
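The preset flow parameter rule corresponding to the second access mode is referred to below as the second flow parameter rule. The data flows in a data flow group of the second access mode satisfy the second flow parameter rule. Specifically, the second flow parameter rule includes: the data flows in the data flow group have the same protocol type, a fixed terminal port number, a fixed server IP address, and non-fixed server port numbers; or the data flows in the data flow group have the same protocol type, a fixed terminal port number, server IP addresses that belong to the same preset IP address group, and non-fixed server port numbers.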
3,第三访问模式
第三访问模式下对应的预设流参数规则,如下记为第三流参数规则。属于第三访问模式的一个数据流组中的数据流之间的关系满足于第三流参数规则,具体的,该第三流参数规则包括:数据流组内的数据流的协议类型相同、终端端口号不固定、服务器IP地址固定、服务器端口号不固定;或者该数据流组内的数据流的协议类型相同、终端端口号不固定、服务器IP地址属于同一预设IP地址组、服务器端口号不固定。
下文为便于描述,将以各流参数规则中的服务器IP地址是固定的为例进行介绍。应理解的是,在同一统计周期内,在同一访问模式下可能存在多个互相独立的数据流组,该多个数据流组属于相同的访问模式但多个数据流组所包括的全部数据流之间的关系不满足同一流参数规则,例如,数据流组1和数据流组2均属于第一访问模式,其中,数据流组1中的数据流的服务器IP地址均为10.0.1.1,服务器端口号均为80,协议类型均为TCP,终端端口号不固定。数据流组2中的数据流的服务器IP地址均为10.0.1.2,服务器端口号均为90,协议类型均为TCP,终端端口号不固定。
示例性地,若预设访问模式包括第一访问模式、第二访问模式和第三访问模式,则可以基于第一统计周期统计到的多条数据流的流参数,首先挖掘属于第一访问模式的数据流组。然后,基于剩余的数据流的流参数继续挖掘属于第二访问模式的数据流组。如果执行设备为管理设备,则可以基于再次剩余的未被划分到当前数据流组的任一数据流,继续挖掘属于第三访问模式的数据流组,下文会进行介绍。如果执行设备是转发设备,则可以不挖掘属于第三访问模式的数据流组,或者说在转发设备上预设访问模式不包括第三访问模式。
接下来以预设访问模式包括第一访问模式和第二访问模式为例,对转发设备基于该两种预设访问模式对第一统计周期内的数据流进行分组得到数据流组的过程进行具体说明。
参见图3,图3显示了转发设备数据流组挖掘(确定)的过程示意图。该过程包括如下步骤:
步骤300,基于总的流记录表,根据第一访问模式对应的第一流参数规则对数据流进行分组;具体的,按照服务器IP地址+服务器端口号+协议类型进行分组,得到至少一个初始分组。其中,每一初始分组中的数据流的服务器IP地址相同,服务器端口号相同,协议类型相同。
这里的总的流记录表可以理解为用于记录第一统计周期内的所有数据流的流参数的记录表,例如表1。应理解,若转发设备基于多个统计周期统计的数据流进行一次数据流组挖掘,则总的流记录表为该多个统计周期统计的所有数据流的流参数的记录表。
在确定初始分组时,一种可选的实施方式,一条数据流也可以划分为一个初始分组,即在步骤300进行分组时,可以不限定初始分组内数据流的条数。参见表2,表2是在表1的基础上,按照上述分组条件(服务器IP地址+服务器端口号+协议类型)确定的初始分组。
表2
Figure PCTCN2021130427-appb-000002
另一种可选的实施方式,在步骤300中,分组条件还可以在上述列举的条件的基础上,增加每个初始分组至少包括两个数据流这一条件,以此来确定初始分组,这样对于单独的一条数据流则不能作为一个初始分组。
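Step 301: For any initial group, determine whether the total number of flows in the initial group is greater than a preset threshold; if so, perform step 302.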
示例性地,该预设阈值可以是1,应理解,一条数据流无法判断该数据流满足哪一种流参数规则,因此可以根据步骤301对初始分组进行筛选,清洗掉初始分组中流数为1的初始分组。需要说明的是,若预设阈值为1,且分组条件还包括一个初始分组包括至少两个数据流的条件,则可以不执行步骤301。若分组条件不限定初始分组内的数据流的数量,则执行步骤301。需要说明的是,上述预设阈值为1仅为举例,本申请实施例对该预设阈值的取值不做限定,例如,还可以是10,20等任意正整数,意为当一个初始分组内的数据流数量较小时,可以不确定该初始分组属于哪一种访问模式,该方式可以在提高挖掘数据流所反映出的访问行为的准确性基础上,减少执行主体的运算量。
步骤302:判断该初始分组内的数据流的终端端口号是否不固定;如果是,则确定该初始分组属于第一访问模式的数据流组(参见步骤303)。
示例性地,终端端口号是否不固定可以通过判断终端端口号的值是否波动来确定,或者说该初始分组内的数据流的终端端口号的值的波动是否为0,如果不是0,则该终端端口号的值有波动,或者说终端端口号不固定。
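In addition, as noted above, one access mode may contain multiple data flow groups. Steps 301 and 302 may therefore be performed in a loop. For example, with reference to Table 2, steps 301 and 302 are first performed for initial group 1, then for initial group 2, and so on, until every initial group has been evaluated. If the terminal port numbers of the data flows in an initial group are non-fixed, the data flows in that initial group satisfy the first flow parameter rule, and the initial group is a data flow group of the first access mode; otherwise, the initial group is determined not to belong to the first access mode. After all initial groups have been processed, step 304 is performed. Data flows in initial groups determined not to belong to the first access mode continue to take part in the subsequent mining process.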
步骤304:对总的流记录表进行清洗,去除属于第一访问模式的数据流组的数据流的流参数。
示例性地,在该举例中为去除表1中属于第一访问模式的数据流组中的数据流的流参数,得到剩余的数据流的流参数。
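Step 305: Based on the flow parameters of the remaining data flows, group the data flows according to the second flow parameter rule of the second access mode. Specifically, group by server IP address + terminal port number + protocol type to obtain at least one initial group.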
步骤305的具体执行步骤请见步骤300的相关描述,此处不再赘述,应理解,步骤305与步骤300中的不同之处在于两者的分组条件是不同的。需要说明的是,该步骤305确定的初始分组与步骤300确定的初始分组是不同的,为便于区分,也可以将步骤300确定的初始分组称为第一初始分组,将步骤305确定的初始分组称为第二初始分组。
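Step 306: For any second initial group, determine whether the total number of flows in the second initial group is greater than the preset threshold; if so, perform step 307.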
需要说明的是,步骤306是可选的步骤,若步骤305中,若分组条件还包括一个第二初始分组包括至少两个数据流的条件,则可以不执行步骤306。若分组条件不限定第二初始分组内的数据流的数量,则执行步骤306。
步骤307:判断该第二初始分组中的数据流的服务器端口号是否不固定;如果是,则确定该第二初始分组为属于第二访问模式的数据流组(参见步骤308)。
See Table 3, which shows the data flow groups obtained by mining the flow parameters in Table 1 using the method shown in FIG. 3.
表3
Figure PCTCN2021130427-appb-000003
其中,上述表3中最后两行显示的两条数据流不属于当前的任意一个数据流组,如下将当前未被划分到的任一数据流组的数据流称为零散数据流,一个或多个零散数据流构成零散数据流集合。另外,需要说明的是,表3中的数据流组这一列为可选列,仅为便于描述表1中的数据流所属的数据流组,实际应用中,转发设备确定的数据流组的组参数中可以不包括数据流组的索引,在存储组参数时可以通过数据流组的索引进行存储。
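The two-pass procedure of steps 300 to 308 can be sketched as follows: group by the fixed fields of the first access mode, keep the groups whose terminal port varies, then repeat on the leftover flows with the fixed fields of the second access mode. This is a hedged sketch; the function name `mine_groups`, the dictionary field names, and the sample flows are hypothetical.

```python
from collections import defaultdict


def mine_groups(flows, threshold=1):
    """Split flows into mode-1 groups, mode-2 groups, and scattered flows.

    Each flow is a dict with server_ip, server_port, client_port, protocol.
    """
    passes = [
        # (access mode id, fields that must be identical, field that must vary)
        (1, ("server_ip", "server_port", "protocol"), "client_port"),
        (2, ("server_ip", "client_port", "protocol"), "server_port"),
    ]
    groups, remaining = [], list(flows)
    for mode, key_fields, varying_field in passes:
        buckets = defaultdict(list)
        for flow in remaining:
            buckets[tuple(flow[f] for f in key_fields)].append(flow)
        leftover = []
        for members in buckets.values():
            varies = len({m[varying_field] for m in members}) > 1
            if len(members) > threshold and varies:
                groups.append({"mode": mode, "flows": members})
            else:
                # Not a group in this mode; the flows stay available for later passes.
                leftover.extend(members)
        remaining = leftover
    return groups, remaining


def flow(server_ip, server_port, client_port):
    return {"server_ip": server_ip, "server_port": server_port,
            "client_port": client_port, "protocol": "TCP"}


sample = [
    flow("10.1.0.100", 80, 45527), flow("10.1.0.100", 80, 45528),
    flow("10.1.0.100", 80, 45529),
    flow("10.1.0.101", 8080, 55555), flow("10.1.0.101", 8081, 55555),
    flow("10.1.0.102", 443, 50000),
]
groups, scattered = mine_groups(sample)
print([(g["mode"], len(g["flows"])) for g in groups])  # [(1, 3), (2, 2)]
print(len(scattered))  # 1
```

Running the first-mode pass before the second mirrors the order in the text; flows that fit neither pass are returned as scattered data flows.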
步骤203:针对任一数据流组,转发设备确定该数据流组的组参数。
示例性地,组参数包括但不限于:协议类型、服务器IP地址、服务器端口号范围、终端端口号范围;可选的,在上述基础上,组参数还可以包括但不限于下列中的部分或全部:终端的IP地址集合、第二时间信息(或称为时间模式信息)、流数、访问模式标识、终端的数量、流支持度、设备访问支持度。
下面分别对部分组参数进行解释说明。
1,服务器端口号范围/终端端口号范围
某数据流组的组参数中的服务器端口号范围,是根据该数据流组中的数据流的流参数确定的,具体的,服务器端口号范围的下限为该数据流组内的数据流中的服务器端口号最小值,对应的,该服务器端口号范围的上限为该数据流组内的数据流中服务器端口号最大值。
同理,某数据流组的组参数中的终端端口号范围,是根据该数据流组中的数据流的流参数确定的,具体的,终端端口号范围的下限为同一数据流组内的数据流中的终端端口号最小值,对应的,该终端端口号范围的上限为该数据流组内的数据流中的终端端口号最大值;
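Note that in some data flow groups, for example those of the first access mode (where the server port number is fixed), the minimum and maximum server port numbers are the same. Likewise, in data flow groups of the second access mode (where the terminal port number is fixed), the minimum and maximum terminal port numbers are the same.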
2,终端的IP地址集合
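The terminal IP address set includes all distinct terminal IP addresses of the data flows in the data flow group, for example 192.168.1.100, 192.168.1.102, and 192.168.1.103. To simplify the description, the symbol ~ is used below to denote consecutive IP addresses, so the example above can also be written as 192.168.1.100, 192.168.1.102~103, or as 192.168.1.100|102|103. Each terminal IP address is recorded only once per data flow group; that is, the terminal IP address set is obtained by deduplicating all terminal IP addresses of the data flows in the group.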
3,终端的数量
终端的数量,与终端的IP地址集合有关,可以是终端的IP地址集合所包含的不同的IP地址的数量,即该数据流组中具有不同IP地址的终端的数量。例如,终端的IP地址集合包括192.168.1.100|102|103,则终端的数量为3。
4,第二时间信息
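The second time information indicates time information corresponding to the statistics period. It may be the first time information of the data flow group, or it may indicate the preset time range to which the data flow group belongs, that is, the preset time range containing the first statistics period. Specifically, the preset time range to which a data flow group belongs can be determined from the group's first time information.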
举例来说,按照工作时间作息配置两个预设时间范围,例如包括预设时间范围1(8:00-17:30)和预设时间范围2(17:30-次日8:00),其中,预设时间范围1用于表示工作时间、预设时间范围2用于表示非工作时间。第二时间信息可以是预设时间范围对应的标识,例如预设时间范围1的标识为1,预设时间范围2的标识为2,若第一统计周期为2020.10.01 15:00-2020.10.01 15:30(即第一时间信息),则第一统计周期属于预设时间范围1,对应的,第二时间信息为1。这样便能够更好的区分出数据流出现的合理性,如果是在非工作时间,终端访问了在工作时间才提供服务的服务器,则很可能为非法访问,这样有利于挖掘出正常访问数据流的特征,和/或异常访问数据流的特征。
当然,上述配置的预设时间范围仅为举例,本申请实施例对此不做限定,例如还可以划分更细粒度的时间范围,例如,预设时间范围包括0:00-6:00、6:00-12:00、12:00-18:00、18:00-24:00,对应的,该4个时间范围所对应的标识可以是1,2,3,4。需要说明的是,上述预设时间范围和预设时间范围对应的标识仅为举例,该标识还可以是其他表示方式,例如由数字、字母、符号中的一项或多项来表示,本申请实施例不做限定。需要说明的是,这里的预设时间范围不区分日期,仅关注时间,即不同日期的相同时间属于同一预设时间范围。
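The mapping from a time of day to the identifier of a preset time range can be sketched as below, using the two-range example from the text (1 for working hours 8:00-17:30, 2 otherwise). The function name and the half-open boundary handling are assumptions; the text does not specify which range a boundary instant falls into.

```python
from datetime import time

WORK_START = time(8, 0)
WORK_END = time(17, 30)


def second_time_info(t):
    """Return 1 for working hours [08:00, 17:30), 2 for any other time of day.

    Only the time of day is considered, so the same clock time on different
    dates maps to the same preset time range.
    """
    return 1 if WORK_START <= t < WORK_END else 2


print(second_time_info(time(15, 0)))   # 1
print(second_time_info(time(2, 0)))    # 2
```

For a full timestamp held as a `datetime`, the caller would pass `timestamp.time()`.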
应理解,这里是为便于区分,将上文的数据流的流参数中的时间信息记为第一时间信息,将组参数中的时间信息记为第二时间信息。本申请中涉及的第一、第二等各种数字编号仅为描述方便进行的区分,并不用来限制本申请实施例的范围或先后顺序。下文为便于描述,将按照上文中介绍的第二时间信息的标识包括1,2为例进行描述。
5,流数
流数,用于表示该数据流组所包含的数据流的条数,例如以表2为例,第一访问模式下的数据流组的流数为3,第二访问模式和第三访问模式下的数据流组的流数分别为2。
6,访问模式标识
访问模式标识,为预设访问模式的标识,用于指示该数据流组所属于的预设访问模式,例如上文中的第一访问模式、第二访问模式、第三访问模式的访问模式标识可以分别为1、2、3。当然,本申请实施例中的任一标识还可以有其他表示方式,例如由数字、字母、符号中的一项或多项来表示,本申请实施例不做限定。
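In one implementation, the group parameters determined by the forwarding device or its bypass device do not include flow support or device access support; these two parameters are described in detail below.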
具体的,在步骤203中,针对任一个数据流组,转发设备确定该数据流组的组参数,并按照预设格式记录组参数。
举例来说,假设转发设备确定每个数据流组的组参数包含:协议类型、服务器IP地址、服务器端口号范围、终端端口号范围、终端的IP地址集合、第二时间信息、流数、访问模式标识。示例性地,数据流组的组参数的预设格式可以是:[服务器IP,服务器端口号,终端端口号,端口号最小值,端口号最大值,协议类型,流数,终端的IP地址集合,第二时间信息,访问模式标识]。
参见表4,表4显示了,在表3的基础上得到的预设格式的各数据流组的组参数。
表4
Figure PCTCN2021130427-appb-000004
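The minimum and maximum port number fields can represent either the server port range or the terminal port range. When they represent the server port range, the standalone server port number field is set to -1; when they represent the terminal port range, the standalone terminal port number field is set to -1, where -1 denotes an invalid value. For example, as shown in Table 4, the terminal port number of data flow group 1 is -1, indicating that the terminal port number of the group is non-fixed, with a minimum terminal port number of 45527 and a maximum of 45529. The server port number of data flow group 3 is -1, indicating that the server port number of the group is non-fixed, with a minimum server port number of 8080 and a maximum of 8081. An access mode identifier of -1 marks a scattered data flow.
Note that the preset format for recording group parameters described above is only an example. The embodiments of this application do not limit this format; any way of recording group parameters is applicable.
Steps 202 and 203 can be implemented in two ways. In one, the group parameters are determined together after some or all of the data flow groups have been determined. In the other, steps 202 and 203 are combined into a single step: the forwarding device determines the group parameters of a data flow group at the moment it determines the group in step 202. For example, FIG. 4 is a schematic flowchart of another data mining method. Steps in FIG. 4 that are the same as in FIG. 3 are not described again; only the differences are explained. Step 403: determine that the first initial group is a data flow group of the first access mode, and record the group parameters of the data flow group in the preset format. Step 408: determine that the second initial group is a data flow group of the second access mode, and record the group parameters of the data flow group in the preset format.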
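As an illustration of the preset record format and the -1 convention, the sketch below builds one record for a group of the first or second access mode. The function `group_record`, the dictionary field names, and the sample flows are hypothetical; only the field order follows the format given in the text.

```python
def group_record(mode, flows, second_time_info):
    """Build [server IP, server port, terminal port, port min, port max,
    protocol, flow count, terminal IP set, second time info, access mode id].

    The non-fixed port is written as -1 and its range goes into min/max.
    """
    first = flows[0]
    if mode == 1:      # server port fixed, terminal port varies
        varying = [f["client_port"] for f in flows]
        server_port, client_port = first["server_port"], -1
    elif mode == 2:    # terminal port fixed, server port varies
        varying = [f["server_port"] for f in flows]
        server_port, client_port = -1, first["client_port"]
    else:
        raise ValueError("this sketch only covers access modes 1 and 2")
    client_ips = sorted({f["client_ip"] for f in flows})  # deduplicated
    return [first["server_ip"], server_port, client_port,
            min(varying), max(varying), first["protocol"],
            len(flows), client_ips, second_time_info, mode]


sample = [
    {"client_ip": "192.168.1.100", "client_port": 45527,
     "server_ip": "10.1.0.100", "server_port": 80, "protocol": "TCP"},
    {"client_ip": "192.168.1.101", "client_port": 45528,
     "server_ip": "10.1.0.100", "server_port": 80, "protocol": "TCP"},
    {"client_ip": "192.168.1.102", "client_port": 45529,
     "server_ip": "10.1.0.100", "server_port": 80, "protocol": "TCP"},
]
record = group_record(1, sample, 1)
print(record[:7])  # ['10.1.0.100', 80, -1, 45527, 45529, 'TCP', 3]
```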
为便于说明,假设转发设备进行数据流信息挖掘的粒度为一个统计周期,即转发设备每次基于一个统计周期内的数据流的流参数进行挖掘,得到该统计周期的统计结果,即一个统计周期对应一个统计结果。该统计结果可以包括至少一个统计周期内的多个数据流的流参数确定的数据流组的组参数,或包括至少一个统计周期内的多个数据流的流参数确定的数据流组的组参数和确定的零散数据流的流参数。
步骤204,转发设备将至少一个统计结果发送至管理设备,对应的,管理设备接收转发设备发送的至少一个统计结果。
示例性地,继续参见表4,转发设备可以将表4所示的各数据流组的组参数上报至管理设备。对于零散数据流,示例性地,转发设备可以直接上报该零散数据流的流参数。再示例性地,转发设备也可以对齐数据流组的组参数的上报格式,按照组参数的预设格式生成零散数据流的“组参数”,并上报该零散数据流的“组参数”,参见表4。应理解,该零散数据流的“组参数”,仅用于表示按照组参数的预设格式和该零散数据流的流参数生成的用于上报给管理设备该零散数据流的流参数的上报信息,并非表示该零散数据流为一个数据流组。为便于描述,下文均将其称为零散数据流的组参数,当然,如果转发设备不需要将统计结果上报给管理设备,或不需要向管理设备上报零散数据流的流参数,则也可以不再对零散数据流的流参数进行处理。下文假设统计结果包括零散数据流的组参数。
如果转发设备确定出数据流组的组参数后,可以将这些组参数上报至管理设备,后续,这些组参数用于确定安全规则。
在一种可实施的方式中,转发设备可以将每个统计周期的统计结果直接上报给管理设备,即在执行步骤203后,不需要等待,可以立即将步骤203确定的至少一个数据流组,零散数据流的组参数上报给管理设备,减少组参数到达管理设备的时延。
在另一种可实施的方式中,转发设备可以基于配置的上报周期向管理设备进行上报。即转发设备可以在上报周期内循环执行多次步骤201-步骤203,每次执行所针对的统计周期是不同的,应理解每一次执行步骤201-步骤203便可以得到一个统计结果。参见图5,图5显示了基于上报周期进行上报的场景示意图,示例性地,转发设备可以分别将该上报周期内得到的多个统计结果一起上报。举例来说,若该上报周期包括m个统计周期,m个统计周期对应m个统计结果,转发设备可以将m个统计结果一起上报。再示例性地,在上报之前,转发设备还可以对该m个统计结果进行处理,示例性地,如下以该m个统计结果为例,对其处理方式进行详细介绍。
示例性地,确定该m个统计结果中属于第一访问模式的多个数据流组中的相同的数据流组,这里相同的数据流组是指,至少两个数据流组所包含的数据流之间的关系满足第一流参数规则,并将该至少两个数据流组进行合并。举例来说,统计周期1和统计周期2中可能均存在一个协议类型为TCP,服务器IP地址为10.0.0.1,服务器端口号80,终端端口号不固定的数据流组,则该两个数据流组内的数据流之间的关系实际上满足相同的第一流参数规则,该两个数据流组为相同的数据流组,应理解,相同的数据流组的组参数中的部分项可以是有差异的,例如数据流数不同、终端IP地址集合不同等等。应理解,属于相同数据流组的至少两个数据流组存在于不同的统计结果中。下文进行详细介绍。同理,确定该m个统计结果中属于第二访问模式的多个数据流组中的相同的数据流组,将该至少两个数据流组进行合并。
具体的,可以根据至少两个数据流组中的数据流之间的关系是否满足同一流参数规则来判断该至少两个数据流组是否为同一数据流组。举例来说,假设m=2,即上报周期包含2个统计周期,一个统计周期对应一个统计结果,每个统计结果包括基于该统计周期确定的多个数据流组中每个数据流组的组参数。给定该2个统计周期的结果分别为统计结果1、统计结果2。假设统计结果1如上述表4所示,统计结果2如下表5所示。
表5
Figure PCTCN2021130427-appb-000005
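The following is explained with reference to Table 4 and Table 5. In Table 4, statistics result 1 includes data flow group 1, whose group parameter information in the preset format is [10.1.0.100,80,-1,45527,45529,TCP,3,192.18.1.100|101|102…]. In Table 5, statistics result 2 includes data flow group 11, whose group parameter information in the preset format is [10.1.0.100,80,-1,45523,45528,TCP,5,192.18.1.100|101|104…]. Data flow group 1 and data flow group 11 have the same protocol type and the same server IP address, a fixed server port number, and non-fixed terminal port numbers. The data flows of data flow group 1 and data flow group 11 therefore satisfy the first flow parameter rule, and the two groups are the same data flow group. For how to determine whether data flows satisfy a preset flow parameter rule, see the description of FIG. 3 or FIG. 4 above; details are not repeated here. For ease of description, a data flow group is used below to stand for its data flows: saying that at least two data flow groups satisfy the same flow parameter rule means that the data flows in those groups satisfy that rule.
Specifically, the forwarding device filters Table 4 and Table 5 for data flow groups that satisfy the same flow parameter rule. The result includes: 1) data flow group 1 and data flow group 11 satisfy the same first flow parameter rule, so they are the same data flow group; 2) data flow group 3 and data flow group 12 satisfy the same second flow parameter rule: the server IP address is fixed at 10.1.0.101, the terminal port number is fixed at 55555, the protocol type is TCP for both, and the server port number is non-fixed, so they are the same data flow group.
The forwarding device then merges the data flow groups that are the same data flow group by merging their group parameters. Updating the group parameters of the merged data flow group includes summing the flow counts, updating the port number ranges, and merging and deduplicating the terminal IP address sets. For example, if data flow group A, data flow group B, ..., and data flow group N are the same data flow group, the merged flow count = flow count of group A + flow count of group B + ... + flow count of group N. The merged minimum server port number is the smallest minimum server port number among groups A, B, ..., and N, and the merged maximum server port number is the largest maximum server port number among them. Similarly, the merged minimum terminal port number is the smallest minimum terminal port number among groups A, B, ..., and N, and the merged maximum terminal port number is the largest maximum terminal port number among them. The merged terminal IP address set includes all distinct terminal IP addresses in data flow groups A, B, ..., and N. For example, if the terminal IP address set of data flow group 1 includes 192.168.1.100|101|102 and that of data flow group 11 includes 192.168.1.100|101|103, then 192.168.1.100|101 is duplicated and the deduplicated set is 192.168.1.100|101|102|103. The port range update above is described for the general case; in some data flow groups the server port number or the terminal port number is fixed, and a fixed port number does not need to be updated.
See Table 6, which shows the group parameters of each data flow group after Table 4 and Table 5 are merged. For ease of description, the merge of data flow group 1 and data flow group 11 is denoted data flow group 1a, and the merge of data flow group 3 and data flow group 12 is denoted data flow group 2a.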
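The merge of group parameters described above (sum the flow counts, widen the port ranges, union the terminal IP sets) can be sketched as follows. The function name and the dictionary layout are hypothetical, and the sample values are illustrative rather than taken from the tables.

```python
def merge_group_params(groups):
    """Merge group parameters of data flow groups that satisfy the same rule."""
    base = groups[0]
    return {
        "server_ip": base["server_ip"],
        "protocol": base["protocol"],
        "mode": base["mode"],
        "server_port_min": min(g["server_port_min"] for g in groups),
        "server_port_max": max(g["server_port_max"] for g in groups),
        "client_port_min": min(g["client_port_min"] for g in groups),
        "client_port_max": max(g["client_port_max"] for g in groups),
        "flow_count": sum(g["flow_count"] for g in groups),
        # Union of all terminal IP addresses, each kept once.
        "client_ips": sorted(set().union(*(g["client_ips"] for g in groups))),
    }


group_1 = {"server_ip": "10.1.0.100", "protocol": "TCP", "mode": 1,
           "server_port_min": 80, "server_port_max": 80,
           "client_port_min": 45527, "client_port_max": 45529,
           "flow_count": 3,
           "client_ips": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}
group_11 = {"server_ip": "10.1.0.100", "protocol": "TCP", "mode": 1,
            "server_port_min": 80, "server_port_max": 80,
            "client_port_min": 45523, "client_port_max": 45528,
            "flow_count": 5,
            "client_ips": ["192.168.1.100", "192.168.1.101", "192.168.1.103"]}
merged = merge_group_params([group_1, group_11])
print(merged["flow_count"], merged["client_port_min"], merged["client_port_max"])
# 8 45523 45529
```

For a fixed port the minimum and maximum are equal in every input group, so taking `min` and `max` leaves it unchanged, which matches the remark that fixed ports need no update.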
表6
Figure PCTCN2021130427-appb-000006
对于零散数据流,转发设备可以不进行处理。后续,针对该上报周期,转发设备仅需上报表6所示的各组参数,该上报方式可以有效减少冗余信息的重复上报,节省资源开销。
上述是以转发设备作为执行设备为例对本申请提供的确定数据流信息的方法进行的具体介绍,需要说明的是,该方法中的执行设备还可以是其他设备,示例性地,其他设备可以是转发设备的旁挂设备,例如网络探针,其中,用于侦听网络数据包的网络探针称为互联网探针。网络数据包捕获、过滤、分析都能在网络探针上实现。
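In brief, when a network probe is the executing entity, the procedure is as follows. After receiving a packet, the forwarding device performs two operations in parallel. Operation 1: forward the packet normally, that is, decide according to the security rules whether to forward or intercept it; if forwarding is allowed, forward the packet, otherwise intercept it. Operation 2: copy the packet and mirror (forward) the copy to the network probe through a specific port on the forwarding device (called the mirror port). The network probe then determines and records the flow parameters of data flows from the received packets and goes on to mine data flow groups and determine group parameters. For the rest of the procedure, see the steps performed by the forwarding device in the method shown in FIG. 2; details are not repeated here. This approach places low demands on the hardware resources of the forwarding device: the hardware resources of forwarding devices in the existing network do not need to be increased, and the technical solution of this application can be implemented without changing the software and/or hardware of existing forwarding devices and without affecting normal services such as data flow forwarding. This makes the solution easier to roll out in existing networks and highly practical.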
需要说明的是,上文中转发设备确定的组参数仅为举例,应理解,不同的设备上配置的组参数可能是不同的,例如转发设备确定的组参数为表4所示的组参数,而管理设备确定的组参数可以比表4具有更多或更少的数据项,例如管理设备确定的组参数还可以包含流支持度和/或设备访问支持度等等。另外,管理设备可能接收到一个或多个转发设备上报的组参数,例如,参见图1,管理设备300可以同时接收到转发设备200和转发设备201上报的组参数,管理设备300可以基于这些组参数再次进行处理,以挖掘图1所示的全局网络的数据流的访问规律。
接下来以管理设备为例,对管理设备执行本申请实施例的确定数据流信息的方法的流程进行介绍。
请参见图6,图6为本申请实施例提供的另一种确定数据流信息的方法的流程图,该方法可以由图1中的管理设备执行。如图6所示,该方法可以包括:
步骤601:管理设备接收一个或多个第一设备发送的第一组参数。
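To distinguish them, group parameters received by the management device are called first group parameters, and group parameters determined by the management device itself are called second group parameters below. For the first group parameters, see the description of the group parameters determined by the forwarding device in step 204 of FIG. 2 above; details are not repeated here.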
基于图1所示的网络架构,这里的管理设备可以是集成安全分析功能组件的网络管理设备或集成安全分析功能组件的云平台,对应的这里的第一设备可以是转发设备,也可以是转发设备的旁挂设备。以转发设备为例,转发设备可以将一个上报周期内确定的各第一组参数发送给集成安全分析功能组件的网络管理设备或云平台,即网络管理设备或云平台可以接收一个或多个转发设备上报的第一组参数。
步骤602:管理设备对该一个或多个第一设备发送的第一组参数进行处理。
示例性地,管理设备可以按照配置的汇聚周期来执行步骤602。例如,转发设备的上报周期为1小时,管理设备的汇聚周期可以是4小时、1周或1个月等等。假设为1周,则管理设备可以将1周内接收到的所有转发设备上报的第一组参数进行保存,当到达汇聚时间时,管理设备可以基于多个转发设备上报的多个第一组参数进行合并处理和/或数据流组挖掘。
这里需要说明的是,不同的转发设备或旁挂设备之间的统计周期的长度可能是不同的,上报周期的长度也可能是不同的,但不同的转发设备、旁挂设备以及管理设备上配置的第二时间信息可以是统一的,例如,不同设备上的第二时间信息的标识为1和2,标识1均表示8:00-17:00,标识2均表示17:00-次日8:00。管理设备在进行合并处理和/或数据流组挖掘之前,可以按照数据流组的第二时间信息对多个第一组参数进行分组,将第二时间信息相同的多个第一组参数作为一组,也可以理解为,第二时间信息相同是指该多个数据流组的数据流出现的时间在同一预设时间范围内,比如,均出现在8:00-17:00内。后续,将作为同一组的第一组参数进行合并等处理。例如,第二时间信息按照工作时间和非工作时间划分为1和2,那么可以将工作时间内(即第二时间信息为1)的所有数据流组的第一组参数作为一组,这样有利于分析出工作时间内的数据流的访问规律。同理,将非工作时间的所有数据流的第一组参数作为一组,以分析出非工作时间内的数据流的访问规律。
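The management device then processes the first group parameters of data flow groups that share the same second time information, for example by merging the first group parameters of identical data flow groups. Unlike the forwarding device, the management device also processes scattered data flows and mines data flow groups of the third access mode from them. The ways in which the management device processes the first group parameters received within an aggregation period are described below: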
处理方式一:先进行合并处理,再进行数据流清洗,最后进行数据流组挖掘。
下面分别对上述三个过程进行具体说明:
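(1) Merging: as on the forwarding device, merging means that the management device takes the first group parameters within the aggregation period (with the same second time information), groups the data flow groups of the first access mode so that data flow groups whose data flows satisfy the same first flow parameter rule fall into one set, merges the data flow groups in each set, and updates the first group parameters of the merged data flow group. Data flow groups of the second access mode are merged in the same way. For details, see the procedure above in which the forwarding device merges multiple statistics results within a reporting period; details are not repeated here.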
(2)数据流清洗:基于(1)合并处理中的原始样本(即汇聚周期内第二时间信息相同的多个第一组参数)中的零散数据流进行清洗,清洗掉属于当前第一访问模式的数据流组的零散数据流,或属于第二访问模式的数据流组的零散数据流。
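For example, continuing the example above, assume the first group parameters within the aggregation period are those in Table 6. From Table 6, the existing data flow groups of the first or second access mode are data flow group 1a, data flow group 2, and data flow group 2a. The management device checks whether any scattered data flow in Table 6 belongs to data flow group 1a, 2, or 2a; if so, it merges that scattered data flow into the corresponding data flow group and removes the scattered data flow's first group parameter record. Whether a scattered data flow belongs to a data flow group can be determined by checking whether it satisfies the flow parameter rule of that group. For example, data flow group 1a belongs to the first access mode, and its flow parameter rule is: server IP address fixed at 10.1.0.100, server port number fixed at 80, protocol type TCP, and terminal port number non-fixed. If a scattered data flow has server IP address 10.1.0.100, server port number 80, and protocol type TCP, it is determined to belong to data flow group 1a. See the description above of how to determine whether multiple data flow groups are the same data flow group; details are not repeated here.
Specifically, data flow cleaning is performed on Table 6. The scattered data flow in the last row of Table 6 satisfies the flow parameter rule of data flow group 2, so it is merged into data flow group 2, the group parameters of data flow group 2 are updated according to the first group parameters of the scattered data flow, and the record of the scattered data flow is removed. Note that the group parameters may be unchanged after the update. See Table 7, which shows the group parameters of the data flow groups after cleaning.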
表7
Figure PCTCN2021130427-appb-000007
应理解,表6仅为举例,若汇聚周期包含多个第二时间信息相同的第一组参数,则每一零散数据流需要与该多个第一组参数中指示的每个数据流组的流参数规则进行比对,以判断该零散数据流是否可以被合并到当前的数据流组中。
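The cleaning step can be sketched as below: each scattered data flow is compared with every existing group of the first or second access mode, absorbed into the first group whose rule it satisfies, and otherwise kept for later mining. The helper names `matches` and `absorb`, the dictionary layout, and the sample values are hypothetical.

```python
def matches(group, flow):
    """Check whether a scattered flow satisfies the rule of an existing group."""
    if (flow["server_ip"], flow["protocol"]) != (group["server_ip"], group["protocol"]):
        return False
    if group["mode"] == 1:   # fixed server port
        return flow["server_port"] == group["server_port_min"] == group["server_port_max"]
    if group["mode"] == 2:   # fixed terminal port
        return flow["client_port"] == group["client_port_min"] == group["client_port_max"]
    return False


def absorb(groups, scattered):
    """Merge matching scattered flows into groups in place; return the rest."""
    leftover = []
    for flow in scattered:
        target = next((g for g in groups if matches(g, flow)), None)
        if target is None:
            leftover.append(flow)
            continue
        target["flow_count"] += 1
        for side in ("server_port", "client_port"):
            target[side + "_min"] = min(target[side + "_min"], flow[side])
            target[side + "_max"] = max(target[side + "_max"], flow[side])
        if flow["client_ip"] not in target["client_ips"]:
            target["client_ips"].append(flow["client_ip"])
    return leftover


group_1a = {"server_ip": "10.1.0.100", "protocol": "TCP", "mode": 1,
            "server_port_min": 80, "server_port_max": 80,
            "client_port_min": 45523, "client_port_max": 45529,
            "flow_count": 8, "client_ips": ["192.168.1.100"]}
scattered = [
    {"client_ip": "192.168.1.110", "client_port": 45600,
     "server_ip": "10.1.0.100", "server_port": 80, "protocol": "TCP"},
    {"client_ip": "192.168.1.111", "client_port": 40000,
     "server_ip": "10.1.0.200", "server_port": 22, "protocol": "TCP"},
]
rest = absorb([group_1a], scattered)
print(group_1a["flow_count"], group_1a["client_port_max"], len(rest))  # 9 45600 1
```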
(3)数据流组挖掘:由于清洗后,剩余的零散数据流中还可能存在属于第一访问模式或第二访问模式的数据流组或属于第三访问模式的数据流组。因此,数据流组挖掘的过程包括:基于对原始样本进行清洗后的剩余的零散数据流再进行一次数据挖掘,可能挖掘出新的属于第一访问模式的数据流组,或可能挖掘出新的属于第二访问模式的数据流组,并分别确定这些新的数据流组的第一组参数,在顺序挖掘出属于第一访问模式的新的数据流组、属于第二访问模式的新的数据流组后,基于除去这些新的数据流组的数据流之外剩余的零散数据流继续挖掘属于第三访问模式的数据流组。具体请参见图7,图7显示了上述处理方法的完整流程,其中图7中挖掘第一访问模式和第二访问模式的数据流组的过程与图3或图4的相关流程相似,此处不再赘述。该流程包括:
步骤700:接收转发设备a至转发设备n在汇聚时间周期内上报的多个第一组参数。
步骤701a:在该多个第一组参数中,选择模式标识为1的数据流组。
步骤702a:按照协议类型相同+服务器IP地址相同+服务器端口号相同+终端端口号不固定+第二时间信息相同进行分组。
步骤703a:将属于同一组的多个数据流组进行合并,并更新合并后的数据流组的第一组参数。
步骤701b:在该多个第一组参数中,选择模式标识为2的数据流组。
步骤702b:按照协议类型相同+服务器IP地址相同+服务器端口号不固定+终端端口号固定+第二时间信息相同进行分组。
步骤703b:将同一组的多个数据流组进行合并,并更新合并后的数据流组的第一组参数。
步骤704:判断任一零散数据流是否属于当前存在的数据流组。
步骤705:将该零散数据流合并至其所属于的数据流组,并根据该零散数据流的第一组参数更新该数据流组的第一组参数。
步骤706,按照服务器IP地址+协议类型进行分组,得到至少一个初始分组。
Step 707: Determine whether the number of data flows in the initial group is greater than a preset threshold; if so, perform step 708.
步骤708,判断该初始分组中的数据流的服务器端口号是否不固定,且终端端口号是否不固定,如果是,则确定该初始分组为属于第三访问模式的数据流组(参见步骤709)。
应理解,步骤707-步骤709可以是重复执行的步骤,直至所有的初始分组均已判断完毕。
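Steps 706 to 709 can be sketched as follows: the remaining scattered flows are grouped by server IP address and protocol type, and a group is assigned to the third access mode when both the server port and the terminal port vary. The function name and field names are hypothetical.

```python
from collections import defaultdict


def mine_third_mode(scattered, threshold=1):
    """Return (third-mode groups, flows that still belong to no group)."""
    buckets = defaultdict(list)
    for flow in scattered:
        buckets[(flow["server_ip"], flow["protocol"])].append(flow)
    groups, leftover = [], []
    for members in buckets.values():
        server_ports = {m["server_port"] for m in members}
        client_ports = {m["client_port"] for m in members}
        if len(members) > threshold and len(server_ports) > 1 and len(client_ports) > 1:
            groups.append({"mode": 3, "flows": members})
        else:
            leftover.extend(members)
    return groups, leftover


flows = [
    {"server_ip": "10.1.0.103", "server_port": 9000, "client_port": 50001, "protocol": "UDP"},
    {"server_ip": "10.1.0.103", "server_port": 9001, "client_port": 50002, "protocol": "UDP"},
    {"server_ip": "10.1.0.104", "server_port": 22, "client_port": 50003, "protocol": "TCP"},
]
third, rest = mine_third_mode(flows)
print(len(third), len(rest))  # 1 1
```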
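For example, continuing the example above, data mining is performed again on the remaining scattered data flows shown in Table 7. Data flow a and data flow c satisfy the same first flow parameter rule and can form data flow group 4a, which belongs to the first access mode. Data flow b and data flow e satisfy the third flow parameter rule and can form data flow group 5a, which belongs to the third access mode. The mining result is shown in Table 8 below.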
表8
Figure PCTCN2021130427-appb-000008
处理方式二:先进行数据流组挖掘,再进行数据流清洗,最后进行合并处理。
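(1) Data flow group mining and cleaning: first, data mining is performed again on the first group parameters of the scattered data flows in the original sample (the first group parameters within the aggregation period), in an attempt to mine data flow groups of the first, second, and third access modes. The first group parameters (or second group parameters) of each data flow group are determined or updated, and the scattered data flows are cleaned, that is, their records are deleted. Note that a data flow group of the first or second access mode mined here may already have existed before the mining, in which case the group parameters of the existing group are updated; it may also be a group that did not exist before, that is, a newly mined data flow group. For details, see the description above, which is not repeated here.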
(2)合并处理:基于(1)中挖掘出的数据流组进行相同数据流组的合并。具体的执行方式请参见上文的相关描述,此处不做重复说明。
需要说明的是,管理设备进行上述处理之后,该汇聚周期内还可以存在不属于任一预设访问模式(第一访问模式、第二访问模式或第三访问模式)的零散数据流,对于这部分数据流可以丢弃也可以保留下来继续参与后续的运算,例如确定该零散数据流的“第二组参数”。
步骤603:管理设备确定每一数据流组的第二组参数。
这里的第二组参数可以是第一组参数,如前所述,每个设备上配置的组参数可能是不同的,因此为与其他设备确定的组参数进行区分,这里的管理设备确定组参数称为第二组参数,将管理设备接收到的其他设备发送的组参数称为第一组参数。
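For example, the second group parameters may include the server IP address, minimum terminal port number, maximum terminal port number, minimum server port number, maximum server port number, protocol type, flow support, and device access support.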
为便于说明,如下以列表形式来介绍第二组参数。举例来说,参见表9,表9显示某个汇聚周期得到的第二组参数。需要说明的是,表9为单独举例说明,不一定是通过上述表1~表8确定的。
表9
Figure PCTCN2021130427-appb-000009
如下分别对流支持度和设备访问支持度进行介绍。
1)流支持度
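Flow support is determined from the number of flows in a data flow group and the total number of data flows counted in this round (for example, within one aggregation period). The management device may discard scattered data flows that belong to no preset access mode, in which case the total is the number of data flows contained in data flow groups; it may also retain them, in which case the total is the number of all data flows within the aggregation period. An aggregation period is used here as an example; if the management device collects statistics over a preset or specified time period, the total is determined from the number of data flows within that period. For example, flow support satisfies: flow support = number of flows in the data flow group / total number of flows. Taking the data flow group in the first row of Table 9, its flow support = 100/(100+150+50+1)*100% = 33.22%.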
2)设备访问支持度
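Device access support is determined from the number of terminals in a data flow group and the total number of terminals corresponding to all data flows counted in this round. As with flow support, "all data flows counted in this round" may be only the data flows contained in data flow groups; if scattered data flows are retained, it means the data flows in data flow groups plus the scattered data flows. Likewise, it refers to the data flows within one aggregation period, or within a preset or specified time period; see the description above, which is not repeated here. For example, device access support satisfies: device access support = number of terminals in the data flow group / total number of terminals. Continuing with the data flow group in the first row of Table 9, its device access support = 50/(50+30+2+1)*100% = 60.24%.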
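The two support ratios can be checked with a few lines of code. The function names are hypothetical; the inputs are the flow counts and terminal counts used in the two worked examples above.

```python
def flow_support(group_flow_count, all_flow_counts):
    """Flow support = flows in the group / total flows counted in this round."""
    return group_flow_count / sum(all_flow_counts)


def device_access_support(group_terminal_count, all_terminal_counts):
    """Device access support = terminals in the group / total terminals."""
    return group_terminal_count / sum(all_terminal_counts)


flow_counts = [100, 150, 50, 1]
terminal_counts = [50, 30, 2, 1]
print(f"{flow_support(100, flow_counts):.2%}")               # 33.22%
print(f"{device_access_support(50, terminal_counts):.2%}")   # 60.24%
```

Following the worked example, the total number of terminals is taken as the sum of the per-row terminal counts; whether a terminal that appears in several groups should be counted once is not stated in the text, so this sketch does not deduplicate.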
需要说明的是,(1)图7中也可以直接确定合并或更新后的数据流组的第二组参数。(2)若管理设备为云平台时,则云平台还可以接收一个或多个网络管理设备上报的第二组参数,云平台可以直接存储该第二组参数,云平台还可以基于多个第二组参数再次进行数据流组挖掘,具体参见执行图6或图7中执行主体执行的操作,此处不再赘述。
本申请实施例还提供了另一种确定数据流信息的方法,该方法中,转发设备或转发设备的旁挂设备可以将统计到的多个数据流的流参数(例如流记录表)发送至管理设备,即转发设备或旁挂设备不进行数据挖掘,统一由管理设备来进行数据挖掘。参见图8,图8为本申请实施例提供的该确定数据流信息的方法的流程示意图,该方法包括如下步骤:
步骤801:第一设备获取在N个统计周期内接收到的每一数据流的流参数,所述N取正整数。
示例性地,该第一设备为转发设备,转发设备执行步骤801时可以参见上文步骤201的具体介绍,此处不再赘述。
再示例性地,该第一设备还可以是转发设备的旁挂设备(例如上文所述的网络探针)。为便于理解,首先对包含旁挂设备的网络架构进行简要介绍,在同一网络架构中,一个管理设备下可以连接一个或多个旁挂设备,一个旁挂设备可以对应于一个或多个转发设备。
步骤802:第一设备将多个数据流的流参数发送至管理设备,对应的,管理设备接收一个或多个第一设备发送的多个数据流的流参数。
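The following takes a network probe as the bypass device and describes the complete procedure when the network probe performs step 801. For how the forwarding device receives packets and mirrors them to the network probe, see the description above; details are not repeated here. The network probe then determines the flow parameters of the data flows received from one or more forwarding devices and sends these flow parameters to the management device.
Similarly, in one implementation, the first device sends the flow parameters of each obtained data flow directly to the management device. In another implementation, the first device reports the flow parameters of multiple data flows to the management device together, according to the reporting period. For example, the bypass device may report the five-tuple information and first time information of each data flow to the management device; alternatively, the bypass device may generate a flow record table and report that table to the management device. If the network probe determines the flow parameters, the first time information of a data flow may be the time at which the network probe received the data flow. For the rest of the procedure, see the steps by which the forwarding device generates the flow record table in FIG. 2; details are not repeated here.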
步骤803:管理设备根据接收到的在第一时间段内的多个数据流的流参数和至少一个预设访问模式,对该多个数据流进行分组,得到至少一个数据流组。
管理设备可以接收一个或多个第一设备发送的多个数据流的流参数,该流参数包含数据流的五元组信息和第一时间信息,由于不同的第一设备上的上报周期长度可能是不同的,因此,管理设备可以基于该多个数据流的第一时间信息划分出属于同一时间段(例如记为第一时间段)的多个数据流。示例性地,根据数据流的第一时间信息确定该数据流的第二时间信息,参见上文的相关描述,其中,第二时间信息相同的为同一时间段的数据流。或者,也可以是自定义的时间段,本申请实施例对此不做限定。
示例性地,管理设备基于汇聚周期确定第一时间段内的多个数据流,同理,该第一时间段可以是不同日期中的同一时间段,对该汇聚周期内的第一时间段的多个数据流进行分组,具体的,管理设备对该多个数据流进行分组的方式包括:基于该多个数据流的流参数,首先确定属于第一访问模式的数据流组,然后,根据剩余的数据流确定属于第二访问模式的数据流组,最后,基于上一步完成后剩余的数据流确定属于第三访问模式的数据流组,剩下没有被划分为数据流组的数据流为零散数据流,如前所述,零散数据流可以丢弃也可以保留。具体可以参见或结合上文中的一个或多个实施例的相关描述,这里不再赘述。
其中,确定属于第一访问模式或第二访问模式的数据流组的方法可以参见图3或图4的描述,确定属于第三访问模式的数据流组的方法可以参见图7中步骤706至步骤709的描述,此处不再赘述。
步骤804:针对确定的任一数据流组,管理设备确定该数据流组的组参数。
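For example, the group parameters may be those shown in Table 9 above; details are not repeated here. If flow support has been determined, the flow count record may be deleted from the group parameters; likewise, if device access support has been determined, the record of the number of terminal devices may be deleted from the group parameters.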
本申请实施例还提供了另一种数据处理的方法,在该方法中,转发设备或旁挂设备将数据流镜像到管理设备,由管理设备生成数据流的流参数,并执行后续的流程。请参见图9,该方法包括如下步骤:
步骤901:第一设备将接收到的数据报文镜像到管理设备,对应的,管理设备接收第一设备转发的数据报文。
示例性地,第一设备可以是转发设备,如前所述,转发设备将接收到的数据报文进行复制,将得到的数据报文的副本镜像至管理设备。
再示例性地,第一设备还可以包括旁挂设备,如前所述,转发设备将数据报文的副本镜像至旁挂设备,旁挂设备可以将接收到的数据报文再次镜像到管理设备。
步骤902:管理设备确定接收到的数据流的流参数。
其中,流参数包括数据流的五元组信息和第一时间信息,这里的第一时间信息可以是根据管理设备接收到该数据流的时间确定的。
步骤903:管理设备根据接收到的在第一时间段内的多个数据流的流参数和至少一个预设访问模式,对该多个数据流进行分组,得到至少一个数据流组。
步骤904:针对确定的任一数据流组,管理设备确定该数据流组的组参数。
对于步骤902,请参见上文步骤201,或步骤801等相似步骤的具体描述,步骤903-步骤904请参见步骤803-步骤804的具体描述,此处不再赘述。
管理设备可以将每个汇聚周期得到的多个数据流组的(第二)组参数进行存储,例如存储在组参数数据库中,该组参数数据库可以部署在管理设备上,也可以部署在其他设备上,例如独立的存储服务器上,例如,组参数数据库部署在云平台,则历史组参数信息包括云平台接收到的或自身确定的所有组参数,这些组参数后续可以用于作为异常数据流检测或用于制定安全规则。下文将用于异常数据流检测或用于制定安全规则的设备称为第三设备,该第三设备可以是该网络架构中的管理设备(例如任一网络管理设备或云平台),也可以是独立部署的设备。
如下对历史组参数的应用方式进行介绍。
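See FIG. 10, which is a schematic flowchart of a method for applying group parameters according to an embodiment of this application. The method can be applied to a third device and a management device that integrates the group parameter database. Note that the third device and the management device may be deployed on different devices or on the same device. As shown in FIG. 10, the method includes: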
步骤1001:第三设备接收用户输入的查询条件,该查询条件包含查询字段。
示例性地,本申请实施例还提供了一种集成在第三设备上的用户界面,该用户界面包含查询输入区,结果显示区。其中,查询输入区,用于输入查询字段,例如组参数相关的字段。结果显示区,用于显示查询结果。
其中,该查询字段可以是但不限于下列中的部分或全部:流支持度、设备访问支持度、服务器IP,终端端口号,服务器端口号,协议类型,流数,终端的IP地址集合,终端设备的数量,第二时间信息,访问模式标识。
例如,查询条件可以是服务器IP地址为10.0.0.1,又例如,查询条件可以是服务器端口号为8080。查询条件中还可能包括查询阈值,例如,某查询条件为流支持度大于50%,则查询阈值为50%。再例如,某查询条件为设备访问支持度小于2%。又例如,查询条件为设备访问支持度在60%~100%。
步骤1002:第三设备向管理设备发送该查询条件,对应的,管理设备接收第三设备发送的查询条件。
步骤1003:管理设备确定满足查询条件的查询结果。
一种可实施的方式,管理设备基于组参数数据库确定满足查询条件的查询结果。其中,查询结果包括管理设备基于组参数数据库确定的匹配于查询条件或组参数满足查询阈值的数据流组的部分或全部组参数。例如,查询条件为服务器IP地址为10.0.0.1,则查询结果包括管理设备基于组参数数据库中的历史组参数,确定的服务器IP地址为10.0.0.1的数据流组的部分或全部组参数。又例如,查询条件为流支持度小于2%,则管理设备可以基于组参数数据库中的历史组参数确定流支持度小于2%的数据流组(称为目标数据流组),查询结果可以是历史组参数中记录的该目标数据流组的部分或全部组参数,例如,目标数据流组的服务器IP地址、服务器端口号、协议类型,等等。
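A query over historical group parameters can be sketched as a filter on a list of records. The condition format `(field, operator, value)`, the function name `query`, and the sample records are assumptions made for illustration; the text does not define a query syntax.

```python
import operator

# Comparison operators a query condition may use (hypothetical syntax).
OPS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def query(history, conditions):
    """Return the group parameter records that satisfy every condition."""
    return [g for g in history
            if all(OPS[op](g[field], value) for field, op, value in conditions)]


history = [
    {"server_ip": "10.0.0.1", "server_port_min": 8080, "server_port_max": 8090,
     "protocol": "TCP", "flow_support": 0.80},
    {"server_ip": "10.0.1.100", "server_port_min": 45532, "server_port_max": 45562,
     "protocol": "TCP", "flow_support": 0.01},
]
print(len(query(history, [("server_ip", "==", "10.0.0.1")])))   # 1
print(len(query(history, [("flow_support", "<", 0.02)])))       # 1
```

A range condition such as device access support between 60% and 100% would be expressed as two conditions on the same field.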
步骤1004:管理设备将查询结果发送给该第三设备,对应的,该第三设备接收管理设备发送的查询结果。
示例性地,第三设备可以在上述步骤1001中的用户界面显示查询结果,以供用户浏览查阅。
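The scenario above can be applied to abnormal data flow detection. For example, querying for data flows whose flow support is below 2% returns flows that are likely to be abnormal, so abnormal data flows can be detected promptly, improving detection efficiency and accuracy. Alternatively, the third device may generate security rules automatically from the query result, using some or all of the data items in the group parameters of the target data flow groups in the result. Specifically, some or all of the data items in the group parameters of a target data flow group whose flow support is above a first threshold, or whose device access support is above a second threshold, can be used to build a whitelist. For example, the query result includes the group parameters of target data flow group 1, whose flow support is 80% with a first threshold of 51%; the group parameters include server IP 10.0.0.1 and server port range 8080 to 8090. Data flows with server IP 10.0.0.1 and a server port number between 8080 and 8090 are then allowed to be forwarded and belong to the whitelist. Similarly, some or all of the data items in the group parameters of a target data flow group whose flow support is below a third threshold, or whose device access support is below a fourth threshold, can be used to build a blacklist. For example, the query result includes the group parameters of target data flow group 2, whose flow support is 3% with a third threshold of 15%; the group parameters include server IP 10.0.1.100 and server port range 45532 to 45562. Data flows with server IP 10.0.1.100 and a server port number between 45532 and 45562 are then to be intercepted and belong to the blacklist. This approach avoids relying purely on manual experience to configure security rules and improves the reliability of data access within the network.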
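The threshold-based rule generation can be sketched as follows. The function `derive_rules`, the rule entry layout, the device access support values, and the second and fourth thresholds in the sample are assumptions; only the flow support values, the first and third thresholds, and the address and port ranges come from the example in the text.

```python
def derive_rules(groups, first, second, third, fourth):
    """Split target data flow groups into whitelist and blacklist entries.

    first/second are the whitelist thresholds for flow support and device
    access support; third/fourth are the blacklist thresholds.
    """
    whitelist, blacklist = [], []
    for g in groups:
        entry = {"server_ip": g["server_ip"],
                 "server_ports": (g["server_port_min"], g["server_port_max"]),
                 "protocol": g["protocol"]}
        if g["flow_support"] > first or g["device_support"] > second:
            whitelist.append(entry)
        elif g["flow_support"] < third or g["device_support"] < fourth:
            blacklist.append(entry)
    return whitelist, blacklist


groups = [
    {"server_ip": "10.0.0.1", "server_port_min": 8080, "server_port_max": 8090,
     "protocol": "TCP", "flow_support": 0.80, "device_support": 0.70},
    {"server_ip": "10.0.1.100", "server_port_min": 45532, "server_port_max": 45562,
     "protocol": "TCP", "flow_support": 0.03, "device_support": 0.02},
]
allow, deny = derive_rules(groups, first=0.51, second=0.60, third=0.15, fourth=0.05)
print(allow[0]["server_ip"], deny[0]["server_ip"])  # 10.0.0.1 10.0.1.100
```

Checking the whitelist condition first is a design choice of this sketch; the text does not say what happens when a group meets both conditions, and groups that meet neither produce no rule.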
参见图11,图11为本申请实施例提供的一种组参数应用方法的流程示意图。该方法可以应用于第三设备和集成组参数数据库的管理设备,需要说明的是,该第三设备和该管理设备可以部署在不同的设备上,也可以部署在同一设备上,如图11所示,该方法包括:
步骤1101:第三设备监测用户在安全规则配置界面输入的配置字段。
其中,配置字段包括但不限于:服务器IP地址、服务器端口号范围、终端端口号范围、协议类型,还可以包括终端IP地址、允许访问时间等等。例如,配置白名单时,允许转发的数据流的安全规则字段包括:服务器IP地址为10.1.0.100,服务器端口号最小值为45527,服务器端口号最大值为65532,终端端口号为80,协议类型TCP,允许访问时间为8:00-11:30,或8:00-17:00等。那么后续,转发设备接收到数据流符合该白名单,则可以转发该数据流。
步骤1102:第三设备将监测到的配置字段发送给管理设备,对应的,管理设备接收第三设备发送的配置字段。
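In one implementation, the third device automatically and continuously sends the detected configuration fields to the management device; that is, the third device monitors the input continuously while the user is typing and sends the fields it detects to the management device in real time. In another implementation, the third device sends the configuration fields currently entered by the user to the management device after receiving a confirmation operation from the user.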
步骤1103:管理设备确定匹配该配置字段的匹配结果。
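In one implementation, the management device queries the group parameter database for the group parameters of target data flow groups in the historical group parameters that match the configuration field. For example, the management device may query the group parameter database for the group parameters of all target data flow groups whose server IP address is 10.1.0.100. As another example, the management device may sort the matching target data flow groups by dimensions such as time, flow support, or device access support, and send (some or all of) the group parameters of the top N target data flow groups to the third device. Specifically, when a whitelist is being configured, the groups are sorted by value in descending order and the top N are returned; when a blacklist is being configured, they are sorted in ascending order and the top N are returned. For example, when sending the configuration field, the third device also sends indication information telling the management device whether the field is being used to configure a whitelist or a blacklist.
The above describes how the management device determines a match result from a partial set of fields. The management device may subsequently continue to receive further fields. For example, after receiving field 1 (server IP address 10.1.0.100), it may receive field 2 (terminal port number 80). On receiving field 1, the management device queries for match result 1; on receiving field 2, it queries for match result 2 within match result 1.
Note that if no result matching the configuration field is found, the management device returns information such as "not found" or "no match" to the third device.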
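Incremental matching and ranking can be sketched as below: each newly received field narrows the previous match result, and the final candidates are ranked in descending order for a whitelist or ascending order for a blacklist. The function names, field names, and sample records are hypothetical.

```python
def refine(candidates, field, value):
    """Narrow an earlier match result with one more configuration field."""
    return [g for g in candidates if g.get(field) == value]


def top_n(candidates, metric, n, for_whitelist):
    """Rank by metric: descending for a whitelist, ascending for a blacklist."""
    return sorted(candidates, key=lambda g: g[metric], reverse=for_whitelist)[:n]


history = [
    {"server_ip": "10.1.0.100", "client_port": 80, "flow_support": 0.40},
    {"server_ip": "10.1.0.100", "client_port": 80, "flow_support": 0.10},
    {"server_ip": "10.1.0.100", "client_port": 443, "flow_support": 0.30},
    {"server_ip": "10.1.0.200", "client_port": 80, "flow_support": 0.20},
]
match_1 = refine(history, "server_ip", "10.1.0.100")   # after field 1
match_2 = refine(match_1, "client_port", 80)           # after field 2
print(len(match_1), len(match_2))  # 3 2
print(top_n(match_2, "flow_support", 1, for_whitelist=True)[0]["flow_support"])  # 0.4
```

An empty list from `refine` corresponds to the "not found" case that is reported back to the third device.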
步骤1104:管理设备将匹配结果发送给第三设备,对应的,该第三设备接收管理设备发送的匹配结果。
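Optionally, the third device may display the match result so that the user can browse it and, drawing on experience, use it as a reference when creating security rules. Alternatively, the third device may generate security rules automatically: for example, after receiving the match result, it extracts the parameters from the match result and writes them into the corresponding parameter fields in the security rule configuration interface. Optionally, the security rule is created once the user clicks to confirm. For details, see the description above of generating a whitelist and a blacklist from the group parameters in a query result; details are not repeated here.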
上述方式,实现基于在网传输的数据流的访问行为生成安全规则的方式,避免单纯依赖人工经验配置安全规则,提高了网络内数据访问的可靠性。
基于与方法实施例同一发明构思,本申请实施例还提供了一种确定数据流信息的装置,用于执行上述方法实施例中图2-图4中第一设备或图8、图9中管理设备执行的功能,如图12所示,该装置包括获取单元1201、处理单元1202。
获取单元1201,用于获取第一时间段内的多个数据流的流参数;所述流参数包括:协议类型、终端端口号、服务器IP地址、服务器端口号;具体实现方式请参见图2中的步骤201或图8中的步骤801及802或图9中的步骤901及步骤902的描述,此处不再赘述。
处理单元1202,用于根据至少一个预设访问模式的流参数规则和多个数据流的流参数,得到至少一个数据流组;每个数据流组内的数据流之间的关系满足一个预设访问模式的流参数规则;确定每一个数据流组的组参数;所述组参数包括服务器IP地址、服务器端口号范围、终端端口号范围、协议类型;其中,所述数据流组的组参数是根据所述数据流组包括的数据流的流参数确定的。具体实现方式请参见图2中的步骤202及步骤203或图3或图4或图8中的步骤803及步骤804,或图9中的步骤903及步骤904的描述,此处不再赘述。
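Optionally, the apparatus further includes a sending unit 1203. The sending unit 1203 is configured to send the group parameters of the data flow groups determined within the reporting period to the management device, or to send those group parameters together with the flow parameters of scattered data flows to the management device, where a scattered data flow is a data flow that does not belong to any data flow group within the reporting period. For the specific implementation, see step 204 in FIG. 2; details are not repeated here.
In one possible implementation, the apparatus is a management device. The obtaining unit 1201 is further configured to receive multiple statistics results from one or more first devices; for the specific implementation, see the description of step 601 in FIG. 6, which is not repeated here. The processing unit 1202 is further configured to select, from the received statistics results, the statistics results within a second time period, merge at least two data flow groups in those statistics results, and update the group parameters of the merged data flow group according to the group parameters of each of the at least two data flow groups, where the data flows in the at least two data flow groups satisfy the first flow parameter rule or the second flow parameter rule. The statistics results further include scattered data flows that have not been assigned to any data flow group, and the at least one preset access mode further includes the third access mode. The processing unit 1202 is further configured to merge the scattered data flows in the statistics results within the second time period into a target data flow group and update the group parameters of the merged data flow group according to the flow parameters of the scattered data flows and the group parameters of the target data flow group, where the data flows in the target data flow group and the scattered data flows satisfy the first flow parameter rule or the second flow parameter rule; and to determine, based on the remaining scattered data flows, data flow groups of the third access mode. For the specific implementation, see the description of steps 602 and 603 in FIG. 6, or of FIG. 7, which is not repeated here.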
在一种可能的实现方法中,组参数用于识别异常数据流或用于确定安全规则,安全规则用于控制数据流转发。
在一种可能的实现方法中,该装置为管理设备;管理设备存储有历史数据流组的组参数;获取单元1201,还用于接收查询请求;查询请求用于指示查询条件,查询条件包括待查询的组参数中的一项或多项;处理单元1202,还用于确定满足所述查询条件的查询结果,并发送所述查询结果。具体实现方式请参见图10或图11的描述,此处不再赘述。
基于与方法实施例同一发明构思,本申请实施例还提供了一种确定数据流信息的设备,用于执行上述方法实施例中图10或图11中第三设备执行的功能,如图13所示,该设备包括获取单元1301、确定单元1302。
获取单元1301,用于获取目标数据流组的组参数,组参数包括服务器IP地址、服务器端口号范围、终端端口号范围、协议类型;确定单元1302,用于根据组参数确定安全规则,安全规则包括黑名单和/或白名单;黑名单用于指示需要被拦截的数据流,白名单用于指示需要被转发的数据流。
在一种可能的实现方法中,目标数据流组的流支持度高于第一阈值或设备访问支持度高于第二阈值;组参数用于确定所述白名单;或者,
目标数据流组的流支持度低于第三阈值或设备访问支持度低于第四阈值,组参数用于确定所述黑名单。
参阅图14所示,为本申请提供的一种装置示意图,该装置可以是上述实施例中的转发设备、旁挂在转发设备上的设备、管理设备或第三设备。该装置1400包括:处理器1402、通信接口1403。可选的,装置1400还可以包括存储器1401和/或通信线路1404。其中,通信接口1403、处理器1402以及存储器1401可以通过通信线路1404相互连接;通信线路1404可以是外设部件互连标准(peripheral component interconnect,简称PCI)总线或扩展工业标准结构(extended industry standard architecture,简称EISA)总线等。所述通信线路1404可以分为地址总线、数据总线、控制总线等。为便于表示,图14中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
处理器1402可以是一个CPU,微处理器,ASIC,或一个或多个用于控制本申请方案程序执行的集成电路。
通信接口1403,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如以太网,无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN),有线接入网等。
存储器1401可以是ROM或可存储静态信息和指令的其他类型的静态存储设备,RAM或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于承载或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信线路1404与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器1401用于存储执行本申请方案的计算机执行指令,并由处理器1402来控制执行。处理器1402用于执行存储器1401中存储的计算机执行指令,从而实现本申请上述实施例提供的确定数据流信息的方法。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
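A person of ordinary skill in the art will understand that the numerals "first", "second", and so on in this application are used only for ease of description; they neither limit the scope of the embodiments of this application nor indicate an order. "And/or" describes an association between associated objects and indicates that three relationships are possible: for example, A and/or B can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one" means one or more, and "at least two" means two or more. "At least one", "any one", and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple. "Multiple" means two or more, and other quantifiers are similar. In addition, an element introduced in the singular with "a", "an", or "the" does not mean "one or only one" unless the context clearly dictates otherwise, but rather "one or more". For example, "a device" means one or more such devices.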
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包括一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。
本申请实施例中所描述的各种说明性的逻辑单元和电路可以通过通用处理器,数字信号处理器,专用集成电路(ASIC),现场可编程门阵列(FPGA)或其它可编程逻辑装置,离散门或晶体管逻辑,离散硬件部件,或上述任何组合的设计来实现或操作所描述的功能。通用处理器可以为微处理器,可选地,该通用处理器也可以为任何传统的处理器、控制器、微控制器或状态机。处理器也可以通过计算装置的组合来实现,例如数字信号处理器和微处理器,多个微处理器,一个或多个微处理器联合一个数字信号处理器核,或任何其它类似的配置来实现。
本申请实施例中所描述的方法或算法的步骤可以直接嵌入硬件、处理器执行的软件单元、或者这两者的结合。软件单元可以存储于RAM存储器、闪存、ROM存储器、EPROM存储器、EEPROM存储器、寄存器、硬盘、可移动磁盘、CD-ROM或本领域中其它任意形式的存储媒介中。示例性地,存储媒介可以与处理器连接,以使得处理器可以从存储媒介中读取信息,并可以向存储媒介存写信息。可选地,存储媒介还可以集成到处理器中。处理器和存储媒介可以设置于ASIC中。
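These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.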
尽管结合具体特征及其实施例对本申请进行了描述,显而易见的,在不脱离本申请的精神和范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利要求所界定的本申请的示例性说明,且视为已覆盖本申请范围内的任意和所有修改、变化、组合或等同物。显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包括这些改动和变型在内。

Claims (36)

  1. 一种确定数据流信息的方法,应用于第一设备,其特征在于,包括:
    获取第一时间段内的多个数据流的流参数;所述流参数包括:协议类型、终端端口号、服务器IP地址、服务器端口号;
    根据至少一个预设访问模式的流参数规则和所述多个数据流的流参数,得到至少一个数据流组;每个数据流组内的数据流之间的关系满足一个预设访问模式的流参数规则;
    确定每一个数据流组的组参数;所述组参数包括服务器IP地址、服务器端口号范围、终端端口号范围、协议类型;其中,所述数据流组的组参数是根据所述数据流组包括的数据流的流参数确定的。
  2. 如权利要求1所述的方法,其特征在于,所述组参数用于识别异常数据流或用于确定安全规则,所述安全规则用于控制数据流转发。
  3. 如权利要求1所述的方法,其特征在于,所述组参数还包括下列中的部分或全部:
    终端IP地址集合、数据流的流数、时间模式信息、访问模式标识、流支持度、设备访问支持度;其中,终端IP地址集合包括所述数据流组内的数据流对应的不同的终端IP地址;数据流的流数是指所述数据流组包含的数据流的数量;时间模式信息用于指示所述数据流组所属的预设时间模式,其中,不同的预设时间模式与预设时间范围一一对应;访问模式标识用于标识数据流组所属的预设访问模式;所述流支持度是根据所述数据流组的所述数据流的数量与所述第一时间段内的数据流的总数确定的;所述设备访问支持度是根据所述数据流组对应的终端的数量与所述第一时间段内的数据流对应的终端的总数量确定的。
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述至少一个预设访问模式包括下列模式中的一个或多个:第一访问模式、第二访问模式、第三访问模式;
    其中,属于所述第一访问模式的数据流组内的数据流之间的关系满足第一流参数规则,所述第一流参数规则包括:所述数据流组内的数据流的协议类型相同,终端端口号不完全相同,服务器端口号相同,服务器IP地址相同;或所述数据流组内的数据流的协议类型相同,终端端口号不完全相同,服务器端口号相同,服务器IP地址属于同一预设IP地址组;
    属于所述第二访问模式的数据流组内的数据流之间的关系满足第二流参数规则,所述第二流参数规则包括:所述数据流组内的数据流的协议类型相同,服务器端口号不完全相同,终端端口号相同,服务器IP地址相同;或所述数据流组内的数据流的协议类型相同,服务器端口号不完全相同,终端端口号相同,服务器IP地址属于同一预设IP地址组;
    属于所述第三访问模式内的数据流组内的数据流之间的关系满足第三流参数规则,所述第三流参数规则包括:所述数据流组内的数据流的协议类型相同,服务器端口号不完全相同,终端端口号不完全相同,服务器IP地址相同;或所述数据流组内的数据流的协议类型相同,服务器端口号不完全相同,终端端口号不完全相同,服务器IP地址属于同一预设IP地址组。
  5. 如权利要求4所述的方法,其特征在于,所述至少一个预设访问模式包括所述第一访问模式和所述第二访问模式;
    所述根据至少一个预设访问模式的流参数规则和所述多个数据流的流参数,得到至少一个数据流组,包括:
    基于所述第一时间段内的多个数据流的流参数,确定属于所述第一访问模式的数据流组,基于剩余的数据流的流参数确定属于所述第二访问模式的数据流组。
  6. 如权利要求5所述的方法,其特征在于,所述第一设备为管理设备;所述至少一个预设访问模式还包括所述第三访问模式;
    该方法还包括:
    基于所述第一时间段内的多个数据流中除去属于所述第一访问模式以及属于所述第二访问模式的数据流组的数据流之外的数据流,确定属于所述第三访问模式的数据流组。
  7. 如权利要求4-6任一项所述的方法,其特征在于,所述第一设备为转发设备或为旁挂在所述转发设备上的设备;
    该方法还包括:
    获取基于多个时间段内的数据流的流参数确定的多个数据流组的组参数;所述多个时间段包括所述第一时间段;
    将所述多个数据流组中的至少两个数据流组合并,根据所述至少两个数据流组的组参数确定合并后的数据流组的组参数;其中,所述至少两个数据流组中的数据流之间的关系满足所述第一流参数规则或所述第二流参数规则。
  8. 如权利要求7所述的方法,其特征在于,该方法还包括:
    获取所述多个时间段内的第一数据流的流参数,所述第一数据流为所述多个时间段内的多个数据流中不属于所述多个时间段内的任一数据流组的数据流;
    当确定所述第一数据流与所述多个时间段内的一个数据流组中的数据流之间的关系满足所述第一流参数规则或所述第二流参数规则时,将所述第一数据流加入所述数据流组,并根据所述第一数据流的流参数更新所述数据流组的组参数。
  9. 如权利要求1-5、7、8任一项所述的方法,其特征在于,所述第一设备为转发设备或为旁挂在所述转发设备上的设备;该方法还包括:
    向管理设备发送所述第一设备所确定的数据流组的组参数。
  10. 如权利要求1-6任一项所述的方法,其特征在于,所述第一设备为管理设备,所述第一时间段内的多个数据流的流参数来自多个第二设备,所述多个第二设备包括转发设备和/或为旁挂在所述转发设备上的设备。
  11. 如权利要求2所述的方法,其特征在于,所述第一设备为管理设备;所述管理设备存储有历史数据流组的组参数;该方法还包括:
    接收第三设备发送的查询请求;所述查询请求用于指示查询条件,所述查询条件包括流支持度阈值和/或设备访问支持度阈值;
    从所述历史数据流组的组参数中确定目标组参数,所述目标组参数的流支持度满足所述流支持度阈值和/或设备访问支持度满足所述设备访问支持度阈值;
    向所述第三设备发送所述目标组参数。
  12. 如权利要求1-5、7-9任一项所述的方法,其特征在于,所述转发设备为交换机或路由器或虚拟专用网络VPN设备。
  13. 一种确定数据流信息的方法,其特征在于,该方法包括:
    获取目标数据流组的组参数,所述组参数包括服务器IP地址、服务器端口号范围、终端端口号范围、协议类型;
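    determining a security rule according to the group parameters, where the security rule includes a blacklist and/or a whitelist; the blacklist indicates data flows to be intercepted, and the whitelist indicates data flows to be forwarded.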
  14. 如权利要求13所述的方法,其特征在于,所述目标数据流组的流支持度高于第一阈值或设备访问支持度高于第二阈值;所述组参数用于确定所述白名单;或者,
    所述目标数据流组的流支持度低于第三阈值或设备访问支持度低于第四阈值,所述组参数用于确定所述黑名单。
  15. 一种确定数据流信息的系统,包括至少一个第一设备及至少一个管理设备,其特征在于:
    所述第一设备获取第一时间段内的多个数据流的流参数,并基于至少一个预设访问模式的流参数规则和所述多个数据流的流参数,得到至少一个数据流组;确定每一个数据流组的组参数,并将所述第一时间段的统计结果发送至管理设备,所述统计结果包括确定的所述至少一个数据流组的组参数;其中,所述流参数包括:协议类型、终端端口号、服务器IP地址、服务器端口号;所述组参数包括:服务器IP地址、服务器端口号范围、终端端口号范围、协议类型;每个所述预设访问模式与一组预设的流参数规则相对应;
    所述管理设备接收多个统计结果,所述多个统计结果来自一个或多个第一设备。
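  16. The system according to claim 15, wherein the at least one preset access mode comprises one or more of the following modes: a first access mode, a second access mode, and a third access mode;
    wherein a relationship between the data flows in a data flow group that belongs to the first access mode satisfies a first flow parameter rule, and the first flow parameter rule comprises: the data flows in the data flow group have the same protocol type, terminal port numbers that are not all identical, the same server port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, terminal port numbers that are not all identical, the same server port number, and server IP addresses that belong to the same preset IP address group;
    a relationship between the data flows in a data flow group that belongs to the second access mode satisfies a second flow parameter rule, and the second flow parameter rule comprises: the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, the same terminal port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, the same terminal port number, and server IP addresses that belong to the same preset IP address group;
    a relationship between the data flows in a data flow group that belongs to the third access mode satisfies a third flow parameter rule, and the third flow parameter rule comprises: the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, terminal port numbers that are not all identical, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, terminal port numbers that are not all identical, and server IP addresses that belong to the same preset IP address group.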
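  17. The system according to claim 16, wherein the management device, based on a plurality of statistical results in a second time period, merges at least two data flow groups in the plurality of statistical results, and determines group parameters of the merged data flow group based on the group parameters of the at least two data flow groups; wherein the second time period includes the first time period, a relationship between the data flows in the at least two data flow groups satisfies a same one of the preset flow parameter rules, and the at least two data flow groups belong to the first access mode or the second access mode.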
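  18. The system according to claim 16, wherein the at least one preset access mode further comprises the third access mode;
    the statistical result further comprises scattered data flows that were not assigned to any data flow group in the first time period;
    the management device adds one or more scattered data flows in the plurality of statistical results to a target data flow group, and updates the group parameters of the target data flow group based on the flow parameters of the one or more scattered data flows; wherein a relationship between the data flows in the target data flow group and the one or more scattered data flows satisfies a same one of the preset flow parameter rules, and the target data flow group belongs to the first access mode or the second access mode; and
    the management device determines, based on the remaining scattered data flows, data flow groups that belong to the third access mode.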
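  19. A system for determining data flow information, comprising at least one first device and at least one management device, wherein:
    the first device sends flow parameters of a plurality of data flows in a first time period to the management device, wherein the flow parameters comprise a protocol type, a terminal port number, a server IP address, and a server port number; and
    the management device receives the flow parameters of the plurality of data flows in the first time period from one or more first devices; obtains at least one data flow group based on a flow parameter rule of at least one preset access mode and the flow parameters of the plurality of data flows; and determines group parameters of each data flow group; wherein the flow parameters comprise a protocol type, a terminal port number, a server IP address, and a server port number; the group parameters comprise a server IP address, a server port number range, a terminal port number range, and a protocol type; and each preset access mode corresponds to one set of preset flow parameter rules.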
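  20. The system according to claim 19, wherein the at least one preset access mode comprises one or more of the following modes: a first access mode, a second access mode, and a third access mode;
    wherein a relationship between the data flows in a data flow group that belongs to the first access mode satisfies a first flow parameter rule, and the first flow parameter rule comprises: the data flows in the data flow group have the same protocol type, terminal port numbers that are not all identical, the same server port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, terminal port numbers that are not all identical, the same server port number, and server IP addresses that belong to the same preset IP address group;
    a relationship between the data flows in a data flow group that belongs to the second access mode satisfies a second flow parameter rule, and the second flow parameter rule comprises: the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, the same terminal port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, the same terminal port number, and server IP addresses that belong to the same preset IP address group;
    a relationship between the data flows in a data flow group that belongs to the third access mode satisfies a third flow parameter rule, and the third flow parameter rule comprises: the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, terminal port numbers that are not all identical, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, terminal port numbers that are not all identical, and server IP addresses that belong to the same preset IP address group.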
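  21. The system according to claim 20, wherein the at least one preset access mode comprises the first access mode, the second access mode, and the third access mode;
    the management device determines, based on the received flow parameters of the plurality of data flows, data flow groups that belong to the first access mode; determines, based on the data flows that remain after excluding those belonging to the first access mode, data flow groups that belong to the second access mode; and determines, based on the data flows that remain after excluding those belonging to the first access mode and to the second access mode, data flow groups that belong to the third access mode.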
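  22. A system for determining data flow information, comprising a plurality of first devices and at least one management device, wherein:
    the first device sends received data flows to the management device; and
    the management device receives a plurality of data flows, wherein the plurality of data flows come from one or more first devices; determines flow parameters of each of the plurality of data flows; obtains at least one data flow group based on a flow parameter rule of at least one preset access mode and the flow parameters of the plurality of data flows; and determines group parameters of each data flow group; wherein the flow parameters comprise a protocol type, a terminal port number, a server IP address, and a server port number; the group parameters comprise a server IP address, a server port number range, a terminal port number range, and a protocol type; and each preset access mode corresponds to one set of preset flow parameter rules.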
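  23. An apparatus for determining data flow information, wherein the apparatus comprises an obtaining unit and a processing unit;
    the obtaining unit is configured to obtain flow parameters of a plurality of data flows in a first time period, wherein the flow parameters comprise a protocol type, a terminal port number, a server IP address, and a server port number; and
    the processing unit is configured to: obtain at least one data flow group based on a flow parameter rule of at least one preset access mode and the flow parameters of the plurality of data flows, wherein a relationship between the data flows in each data flow group satisfies the flow parameter rule of one preset access mode; and determine group parameters of each data flow group, wherein the group parameters comprise a server IP address, a server port number range, a terminal port number range, and a protocol type, and the group parameters of a data flow group are determined based on the flow parameters of the data flows included in that data flow group.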
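  24. The apparatus according to claim 23, wherein the group parameters are used to identify abnormal data flows or to determine a security rule, and the security rule is used to control data flow forwarding.
  25. The apparatus according to claim 23 or 24, wherein the group parameters further comprise some or all of the following:
    a terminal IP address set, a number of data flows, time mode information, an access mode identifier, a flow support, and a device access support; wherein the terminal IP address set comprises the distinct terminal IP addresses corresponding to the data flows in the data flow group; the number of data flows is the number of data flows included in the data flow group; the time mode information indicates the preset time mode to which the data flow group belongs, and different preset time modes are in one-to-one correspondence with preset time ranges; the access mode identifier identifies the preset access mode to which the data flow group belongs; the flow support is determined based on the number of data flows in the data flow group and the total number of data flows in the first time period; and the device access support is determined based on the number of terminals corresponding to the data flow group and the total number of terminals corresponding to the data flows in the first time period.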
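
Claim 25 states only that the two support values are "determined based on" a count and a total. One plausible reading, assumed here and not confirmed by the claim text, is a simple ratio. The function names and arguments below are hypothetical.

```python
def flow_support(group_flow_count, total_flow_count):
    """Assumed definition: share of the period's flows that fall in the group."""
    if total_flow_count == 0:
        return 0.0
    return group_flow_count / total_flow_count


def device_access_support(group_terminal_ips, all_terminal_ips):
    """Assumed definition: share of the period's distinct terminals that
    appear in the group's terminal IP address set."""
    all_terminals = set(all_terminal_ips)
    if not all_terminals:
        return 0.0
    return len(set(group_terminal_ips)) / len(all_terminals)
```

Under this reading, a group holding 25 of 100 flows has a flow support of 0.25, and a group reached by 2 of 4 terminals has a device access support of 0.5.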
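  26. The apparatus according to any one of claims 23 to 25, wherein the at least one preset access mode comprises one or more of the following modes: a first access mode, a second access mode, and a third access mode;
    wherein a relationship between the data flows in a data flow group that belongs to the first access mode satisfies a first flow parameter rule, and the first flow parameter rule comprises: the data flows in the data flow group have the same protocol type, terminal port numbers that are not all identical, the same server port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, terminal port numbers that are not all identical, the same server port number, and server IP addresses that belong to the same preset IP address group;
    a relationship between the data flows in a data flow group that belongs to the second access mode satisfies a second flow parameter rule, and the second flow parameter rule comprises: the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, the same terminal port number, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, the same terminal port number, and server IP addresses that belong to the same preset IP address group;
    a relationship between the data flows in a data flow group that belongs to the third access mode satisfies a third flow parameter rule, and the third flow parameter rule comprises: the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, terminal port numbers that are not all identical, and the same server IP address; or the data flows in the data flow group have the same protocol type, server port numbers that are not all identical, terminal port numbers that are not all identical, and server IP addresses that belong to the same preset IP address group.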
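  27. The apparatus according to claim 26, wherein the at least one preset access mode comprises the first access mode and the second access mode;
    when obtaining the at least one data flow group based on the flow parameter rule of the at least one preset access mode and the flow parameters of the plurality of data flows, the processing unit is specifically configured to:
    determine, based on the flow parameters of the plurality of data flows in the first time period, data flow groups that belong to the first access mode, and determine, based on the flow parameters of the remaining data flows, data flow groups that belong to the second access mode.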
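  28. The apparatus according to claim 27, wherein the apparatus is a management device, and the at least one preset access mode further comprises the third access mode;
    the processing unit is further configured to:
    determine data flow groups that belong to the third access mode based on the data flows, among the plurality of data flows in the first time period, that remain after excluding the data flows in the data flow groups that belong to the first access mode and to the second access mode.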
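  29. The apparatus according to any one of claims 26 to 28, wherein the apparatus is a forwarding device or a device attached to the forwarding device in bypass mode;
    the obtaining unit is further configured to obtain group parameters of a plurality of data flow groups that are determined based on flow parameters of data flows in a plurality of time periods, wherein the plurality of time periods comprise the first time period; and
    the processing unit is further configured to merge at least two of the plurality of data flow groups, and determine group parameters of the merged data flow group based on the group parameters of the at least two data flow groups, wherein a relationship between the data flows in the at least two data flow groups satisfies the first flow parameter rule or the second flow parameter rule.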
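  30. The apparatus according to claim 28, wherein the obtaining unit is further configured to obtain flow parameters of a first data flow in the plurality of time periods, wherein the first data flow is a data flow, among the plurality of data flows in the plurality of time periods, that does not belong to any data flow group in the plurality of time periods; and
    the processing unit is further configured to: when it is determined that a relationship between the first data flow and the data flows in one data flow group in the plurality of time periods satisfies the first flow parameter rule or the second flow parameter rule, add the first data flow to the data flow group, and update the group parameters of the data flow group based on the flow parameters of the first data flow.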
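  31. The apparatus according to any one of claims 23 to 27, 29, and 30, wherein the apparatus further comprises a sending unit, and the apparatus is a forwarding device or a device attached to the forwarding device in bypass mode;
    the sending unit is further configured to send, to a management device, the group parameters of the data flow groups determined by the apparatus.
  32. The apparatus according to any one of claims 23 to 28, wherein the communication device is a management device, the flow parameters of the plurality of data flows in the first time period come from a plurality of second devices, and the plurality of second devices comprise forwarding devices and/or devices attached to the forwarding devices in bypass mode.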
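  33. The apparatus according to claim 24, wherein the apparatus further comprises a sending unit, the apparatus is a management device, and the management device stores group parameters of historical data flow groups;
    the obtaining unit is further configured to receive a query request sent by a third device, wherein the query request indicates a query condition, and the query condition comprises a flow support threshold and/or a device access support threshold;
    the processing unit is further configured to determine target group parameters from the group parameters of the historical data flow groups, wherein the flow support of the target group parameters satisfies the flow support threshold and/or the device access support of the target group parameters satisfies the device access support threshold; and
    the sending unit is further configured to send the target group parameters to the third device.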
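  34. An apparatus for determining data flow information, wherein the apparatus comprises an obtaining unit and a determining unit;
    the obtaining unit is configured to obtain group parameters of a target data flow group, wherein the group parameters comprise a server IP address, a server port number range, a terminal port number range, and a protocol type; and
    the determining unit is configured to determine a security rule based on the group parameters, wherein the security rule comprises a blacklist and/or a whitelist, the blacklist indicates data flows that need to be blocked, and the whitelist indicates data flows that need to be forwarded.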
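  35. An apparatus for determining data flow information, comprising a memory and a processor, wherein the memory stores program instructions, and the processor runs the program instructions to perform the method according to any one of claims 1 to 12, or to perform the method according to claim 13 or 14.
  36. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and when the computer program is run on a computer, the computer is enabled to perform the method according to any one of claims 1 to 12, or the computer is enabled to perform the method according to claim 13 or 14.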
PCT/CN2021/130427 2020-11-13 2021-11-12 Method, apparatus, and system for determining data flow information WO2022100707A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21891227.7A EP4236200A4 (en) 2020-11-13 2021-11-12 METHOD, APPARATUS AND SYSTEM FOR DETERMINING DATA FLOW INFORMATION
US18/316,591 US20230283624A1 (en) 2020-11-13 2023-05-12 Method, apparatus, and system for determining data flow information

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011271196 2020-11-13
CN202011271196.X 2020-11-13
CN202110131909.0 2021-01-30
CN202110131909.0A CN114567455A (zh) 2020-11-13 2021-01-30 Method, apparatus, and system for determining data flow information

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/316,591 Continuation US20230283624A1 (en) 2020-11-13 2023-05-12 Method, apparatus, and system for determining data flow information

Publications (1)

Publication Number Publication Date
WO2022100707A1 true WO2022100707A1 (zh) 2022-05-19

Family

ID=81600781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/130427 WO2022100707A1 (zh) 2020-11-13 2021-11-12 Method, apparatus, and system for determining data flow information

Country Status (3)

Country Link
US (1) US20230283624A1 (zh)
EP (1) EP4236200A4 (zh)
WO (1) WO2022100707A1 (zh)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9083712B2 (en) * 2007-04-04 2015-07-14 Sri International Method and apparatus for generating highly predictive blacklists
US9800592B2 (en) * 2014-08-04 2017-10-24 Microsoft Technology Licensing, Llc Data center architecture that supports attack detection and mitigation
CN110858229B (zh) * 2018-08-23 2023-04-07 阿里巴巴集团控股有限公司 数据处理方法、设备、访问控制系统及存储介质
US11206276B2 (en) * 2019-01-16 2021-12-21 Sri International Cyber security using host agent(s), a network flow correlator, and dynamic policy enforcement

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
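CN1716867A (zh) * 2004-06-29 2006-01-04 杭州华为三康技术有限公司 Data traffic statistics method and apparatus
CN101505218A (zh) * 2009-03-18 2009-08-12 杭州华三通信技术有限公司 Method and apparatus for detecting attack packets
JP2012175338A (ja) * 2011-02-21 2012-09-10 Oki Electric Ind Co Ltd Flow monitoring device, flow monitoring method, and program
CN106506541A (zh) * 2016-12-16 2017-03-15 北京匡恩网络科技有限责任公司 Method and apparatus for generating a network whitelist
CN110392013A (zh) * 2018-04-17 2019-10-29 深圳先进技术研究院 Malware identification method, system, and electronic device based on network traffic classification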
US20200169509A1 (en) * 2018-11-27 2020-05-28 Xaxar Inc. Systems and methods of data flow classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4236200A1 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
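CN115988574A (zh) * 2023-03-15 2023-04-18 阿里巴巴(中国)有限公司 Flow-table-based data processing method, system, device, and storage medium
CN115988574B (zh) * 2023-03-15 2023-08-04 阿里巴巴(中国)有限公司 Flow-table-based data processing method, system, device, and storage medium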

Also Published As

Publication number Publication date
EP4236200A4 (en) 2024-05-29
US20230283624A1 (en) 2023-09-07
EP4236200A1 (en) 2023-08-30

Similar Documents

Publication Publication Date Title
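CN106941480B (zh) Security management method and security management system
US9860154B2 (en) Streaming method and system for processing network metadata
CN107667505B (zh) System and method for monitoring and managing a data center
CN107683597B (zh) Network behavior data collection and analysis for anomaly detection
US10355949B2 (en) Behavioral network intelligence system and method thereof
CN108353068B (zh) SDN-controller-assisted intrusion prevention system
KR20140106547A (ko) Streaming method and system for processing network metadata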
WO2018057609A1 (en) Systems and methods for network security event filtering and translation
WO2014110293A1 (en) An improved streaming method and system for processing network metadata
WO2020228527A1 (zh) Data flow classification method and packet forwarding device
US11343143B2 (en) Using a flow database to automatically configure network traffic visibility systems
WO2022100707A1 (zh) Method, apparatus, and system for determining data flow information
Qiu et al. Global Flow Table: A convincing mechanism for security operations in SDN
CN107294743B (zh) Network path detection method, controller, and network device
EP3166279B1 (en) Integrated security system having rule optimization
WO2017070965A1 (zh) Data processing method based on software-defined networking and related device
EP3092737B1 (en) Systems for enhanced monitoring, searching, and visualization of network data
JP6476853B2 (ja) Network monitoring system and method
CN114567455A (zh) Method, apparatus, and system for determining data flow information
TWI666568B (zh) Session-based P2P botnet detection method on NetFlow
EP3166281B1 (en) Integrated security system having threat visualization
EP3166280B1 (en) Integrated security system having threat visualization and automated security device control
US11665079B1 (en) Probe-triggered full device state capture, export, and correlation
Lee et al. An Abnormal Connection Detection System based on network flow analysis
KR20160063155A (ko) SDN-based error search network system

Legal Events

Date Code Title Description
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2021891227; Country of ref document: EP; Effective date: 20230523