WO2022156492A1 - Method for judging the type of a terminal device and related device - Google Patents

Publication number
WO2022156492A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
type
terminal device
flow
access behavior
Prior art date
Application number
PCT/CN2021/141759
Other languages
English (en)
French (fr)
Inventor
薛莉
徐威旺
叶浩楠
张亮
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022156492A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/065 Generation of reports related to network devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation

Definitions

  • The present application relates to the field of information technology, and more particularly, to a method for judging the type of a terminal device and a related device.
  • Type identification of terminal devices currently relies on scanning against commercial fingerprint libraries and on manual static maintenance.
  • The fingerprint database generally relies on manual input, and many terminal devices used in specific industries do not have a complete static fingerprint database.
  • Data collection relies on a specific protocol to scan the terminal device, which requires the terminal device to support protocol scanning or to have a client installed that supports the inventory function for discovering assets.
  • Many terminal devices exchange few messages and cannot send the information required by the fingerprint database, or lack the hardware and other environment needed to support protocol scanning or to install the inventory client.
  • The present application provides a method for judging the type of a terminal device and a related device, which can improve the determination of terminal device types.
  • An embodiment of the present application provides a method for judging the type of a terminal device, including: acquiring a first data flow, where the sender of the first data flow is a first terminal device; determining the access behavior of the first terminal device according to identification information of the receiving end of a packet in the first data flow; and determining the type of the first terminal device according to a terminal type judgment rule and the access behavior of the first terminal device, wherein the terminal type judgment rule is used to indicate the terminal.
  • The terminal type judgment rule is obtained by training on historical data traffic.
  • The above technical solution can use the pre-trained terminal type judgment rule to determine the type of each terminal device in the network, thereby laying a good foundation for the subsequent device inventory.
  • The terminal type judgment rule used in the above technical solution is determined from historical traffic data rather than from a static fingerprint database. Therefore, the above technical solution can be applied to terminal devices that do not support a static fingerprint database or protocol scanning. Its application range is thus wider, and it is a more effective solution for determining the type of a terminal device.
  • the sender of the historical data traffic includes multiple types of terminal devices, and the type of the first terminal device is one of the multiple types.
  • the sending end of the historical data traffic may not include the first terminal device.
  • The terminal type judgment rule is obtained by training according to the historical data traffic and terminal classification information, wherein the terminal classification information indicates the multiple types and multiple groups of terminal identification information, each group of terminal identification information includes identification information of at least one terminal device, and the terminal classification information also indicates the correspondence between the multiple types and the multiple groups of terminal identification information.
  • The multiple types are in one-to-one correspondence with the multiple groups of terminal identification information, each group of terminal identification information includes identification information of at least one terminal device, and the historical data traffic is determined according to the terminal classification information.
  • The historical data traffic includes multiple sets of reference traffic, the multiple sets of reference traffic are in one-to-one correspondence with the multiple types, and the multiple sets of reference traffic include first reference traffic, whose corresponding type is the type of the first terminal device; the terminal type judgment rule includes multiple sub-rules, the multiple sub-rules are in one-to-one correspondence with the multiple types, and the sub-rule corresponding to the type of the first terminal device is determined according to the first reference traffic and the reference traffic other than the first reference traffic among the multiple sets of reference traffic.
  • The first reference traffic is determined according to first candidate traffic, and the first candidate traffic is, among multiple sets of candidate traffic, the traffic corresponding to the type of the first terminal device; the number of times that the access behavior corresponding to each data flow in the first reference traffic occurs in the first candidate traffic is greater than the number of times that the access behavior corresponding to any data flow in the first candidate traffic that does not belong to the first reference traffic occurs in the first candidate traffic.
  • The terminal type judgment rule is determined according to a clustering result obtained by clustering P terminal devices according to P server sets; the P terminal devices are determined according to the historical data traffic, the P terminal devices are in one-to-one correspondence with the P server sets, each of the P server sets is the set of servers accessed by the corresponding terminal device, the P terminal devices include terminal devices of the multiple types, and P is a positive integer greater than or equal to the total number of terminal device types.
  • The historical data traffic consists of the upstream data flows of the P terminal devices, and the P terminal devices are the senders of the historical data traffic.
  • For each of the P terminal devices, the ratio of the number of times the terminal device is the sender of a synchronization message in the historical data traffic to the number of times the terminal device is the receiver of a synchronization message in the historical data traffic is greater than a second preset ratio.
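The ratio condition above can be illustrated with a minimal sketch. It assumes that a "synchronization message" is a TCP SYN packet summarized as a `(sender, receiver)` pair, and that a device that never receives a synchronization message passes the test; the function name, the sample addresses, and the threshold are hypothetical and do not come from the application.

```python
from collections import Counter

def select_terminals(syn_messages, min_ratio):
    """Return the addresses whose sent/received SYN ratio exceeds min_ratio.

    syn_messages: iterable of (sender, receiver) pairs, one per observed SYN.
    Assumption: an address that never receives a SYN is treated as passing.
    """
    syn_messages = list(syn_messages)
    sent = Counter(sender for sender, _ in syn_messages)
    received = Counter(receiver for _, receiver in syn_messages)
    terminals = set()
    for address, n_sent in sent.items():
        n_received = received[address]
        if n_received == 0 or n_sent / n_received > min_ratio:
            terminals.add(address)
    return terminals

# Hypothetical observations: t1 and t2 mostly initiate connections, s1 mostly accepts them.
observed = [("t1", "s1")] * 5 + [("s1", "t1")] + [("t2", "s1")] * 2
terminals = select_terminals(observed, min_ratio=2)
```

Devices that mostly initiate connections are kept as terminal devices, while devices that mostly accept connections (typically servers) are filtered out.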
  • the historical data flow includes P reference flows
  • the multiple reference flows correspond to the P terminal devices one-to-one
  • the P reference flows correspond to P
  • the number of times that the two candidate flows appear, and the second reference flow is any one of the P reference flows.
  • The terminal type judgment rule is a judgment matrix that includes multiple rows of elements, and the multiple rows are in one-to-one correspondence with the multiple types.
  • Determining the type of the first terminal device according to the terminal type judgment rule and the access behavior of the first terminal device includes: determining, from the judgment matrix according to the access behavior of the first terminal device, a target row that matches the access behavior of the first terminal device; and determining the type of the first terminal device to be the type corresponding to the target row.
  • Determining, from the judgment matrix according to the access behavior of the first terminal device, the target row that matches the access behavior of the first terminal device includes: determining a reference matrix according to the access behavior of the first terminal device, wherein the values of the multiple elements in the reference matrix match the access behavior of the first terminal device; multiplying the judgment matrix by the reference matrix to obtain a target matrix, wherein the multiple elements in the target matrix are in one-to-one correspondence with the multiple rows of the judgment matrix; and determining the row corresponding to the element with the largest value in the target matrix to be the target row.
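The matrix-based judgment can be sketched as follows. This is only an illustration under stated assumptions: each column of the judgment matrix is assumed to stand for one known access behavior (here, a server identifier), matrix entries are assumed to be 0 or 1, and the reference matrix is assumed to be a column vector with 1 for every behavior observed for the terminal device. The names `judge_type`, `TYPES`, `BEHAVIORS`, and `JUDGMENT` and all values are hypothetical.

```python
def judge_type(judgment_matrix, types, behaviors, observed):
    """Multiply the judgment matrix by a 0/1 reference vector and pick the best row."""
    # Reference vector: 1 where the terminal device showed the behavior, else 0.
    reference = [1 if behavior in observed else 0 for behavior in behaviors]
    # Target vector: one score per row (that is, per terminal device type).
    target = [sum(j * r for j, r in zip(row, reference)) for row in judgment_matrix]
    # Row with the largest element; ties resolve to the first such row.
    best_row = max(range(len(target)), key=lambda i: target[i])
    return types[best_row], target

TYPES = ["A", "B", "C"]
BEHAVIORS = ["server-1", "server-2", "server-3", "server-4"]
JUDGMENT = [
    [1, 1, 0, 0],  # type A
    [0, 0, 1, 0],  # type B
    [0, 0, 0, 1],  # type C
]

result, scores = judge_type(JUDGMENT, TYPES, BEHAVIORS, {"server-1", "server-2"})
```

In this sketch a terminal device that accessed `server-1` and `server-2` scores highest on the row for type A. How ties or all-zero scores should be handled is not stated in the text above, so the first-row tie-break is an arbitrary choice.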
  • An embodiment of the present application provides a computer device, where the computer device includes units for implementing the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present application provides a computer device, where the computer device includes a processor, and the processor is configured to be coupled with a memory and to read and execute instructions and/or program code in the memory, so as to execute the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present application provides a chip system, where the chip system includes a logic circuit, and the logic circuit is configured to be coupled with an input/output interface and to transmit data through the input/output interface, so as to execute the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium, where program code is stored in the computer-readable storage medium, and when the program code runs on a computer, the computer is caused to execute the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present application provides a computer program product, where the computer program product includes computer program code, and when the computer program code runs on a computer, the computer is caused to execute the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a possible application scenario provided according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a centralized deployment scheme.
  • FIG. 3 is a schematic diagram of a distributed deployment scheme.
  • FIG. 4 is a schematic flowchart of supervised learning to determine the terminal type judgment rule.
  • FIG. 5 is a schematic flowchart of unsupervised learning to determine the terminal type judgment rule.
  • FIG. 6 is a schematic flowchart of a method for judging the type of a terminal device according to an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a computer device provided according to an embodiment of the present application.
  • “At least one” means one or more, and “plurality” means two or more.
  • “And/or” describes an association between associated objects and indicates that three relationships are possible. For example, “A and/or B” can indicate that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B can be singular or plural.
  • The character “/” generally indicates an “or” relationship between the objects before and after it.
  • “At least one of the following” or a similar expression refers to any combination of these items, including any combination of single items or plural items.
  • For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
  • Words such as “first” and “second” do not limit quantity or execution order.
  • A data stream can also be referred to simply as a stream or a flow.
  • A stream contains several packets. Packets have upstream and downstream directions. In general, the direction from the terminal device to the server is the upstream direction, and the direction from the server to the terminal device is the downstream direction.
  • A stream is identified by a quintuple. From the establishment of the connection between the terminal device and the server until the connection is torn down, the source Internet Protocol (IP) address of every upstream packet transmitted during this period is the address of the terminal device and the destination IP address is the address of the server; the source IP address of every downstream packet is the address of the server and the destination IP address is the address of the terminal device. Therefore, all packets transmitted during this period can be considered packets of one flow.
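As an illustration of grouping packets into flows, the sketch below assumes the conventional quintuple of source IP address, source port, destination IP address, destination port, and protocol (the text above does not enumerate the five fields). The `Packet` type, the helper names, and the sample addresses are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

def flow_key(pkt):
    """Direction-independent key, so upstream and downstream packets share one flow."""
    endpoint_a = (pkt.src_ip, pkt.src_port)
    endpoint_b = (pkt.dst_ip, pkt.dst_port)
    return (min(endpoint_a, endpoint_b), max(endpoint_a, endpoint_b), pkt.protocol)

def group_into_flows(packets):
    """Collect packets into flows keyed by their normalized quintuple."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)

packets = [
    Packet("192.0.2.5", 40000, "198.51.100.1", 443, "tcp"),  # upstream
    Packet("198.51.100.1", 443, "192.0.2.5", 40000, "tcp"),  # downstream, same flow
    Packet("192.0.2.5", 40001, "198.51.100.1", 443, "tcp"),  # a second connection
]
flows = group_into_flows(packets)
```

Sorting the two endpoints makes the key identical for both directions of one connection, which matches the statement that upstream and downstream packets of a connection belong to one flow.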
  • The terminal device that is the sender of upstream packets and the receiver of downstream packets in a data stream can be called the terminal device in the data stream or the terminal device corresponding to the data stream; the server that is the receiver of upstream packets and the sender of downstream packets in a data stream can be called the server in the data stream or the server corresponding to the data stream.
  • For example, "terminal device A in data flow A" means that the sender of all upstream packets in data flow A is terminal device A, and "server A in data flow A" means that the sender of all downstream packets in data flow A is server A.
  • Traffic can also be referred to as data traffic. Traffic is a collection of all data flows counted over a period of time. The traffic may include multiple data streams, and the communicating parties of any two streams in the multiple data streams may be the same or different.
  • the terminal devices referred to in the embodiments of this application may include IoT terminals and production terminals.
  • IoT terminals are specialized computer equipment with specific uses, such as medical devices, oil sensors, etc.
  • a production terminal is a computer device running a general-purpose operating system (such as a Windows operating system, a Linux operating system, etc.) but performing special functions, such as a queuing machine, a pick-up/registration machine, and the like.
  • FIG. 1 is a schematic diagram of a possible application scenario provided according to an embodiment of the present application.
  • The system 100 includes a network control device 101, a network forwarding device 111, a network forwarding device 112, a terminal device 121, a terminal device 122, a terminal device 123, a terminal device 124, a terminal device 125, a server 131, and a server 132.
  • The terminal devices in the embodiments of the present application may be computer devices with one or more specific functions (for example, an ATM, an electronic receipt cabinet, a call/take machine, an X-ray printer, or a camera), or computer devices with general functions (such as mobile phones, tablet computers, desktop computers, or laptop computers).
  • The terminal device referred to in the embodiments of the present application can communicate with the server through a network forwarding device, read data stored in the server, and/or write data to the server.
  • For example, the terminal device 121 can access the server 131 through the network forwarding device 111 and read the data stored in the server 131; the terminal device 124 can access the server 132 through the network forwarding device 112 and write data to the server 132.
  • The network forwarding devices may be switches or routers.
  • The network forwarding device can monitor the traffic generated by the terminal devices. In some embodiments, the network forwarding device may also extract characteristics of the monitored traffic.
  • The network control device (e.g., the network control device 101 shown in FIG. 1) may be a network controller, a server, a computer, or the like.
  • the network control device may determine the type of the terminal device based on the terminal type determination rule, and inventory the terminal devices in the network.
  • the work of judging the type of the terminal device may be implemented by the network forwarding device, and the work of inventorying the terminal device may be implemented by the network control device.
  • FIG. 2 is a schematic diagram of a centralized deployment scheme.
  • the judgment of the terminal device type and the inventory of assets are realized by the network control device.
  • The network control device 200 includes a rule configuration module 201, a rule matching module 202, an asset information extraction module 203, an asset inventory module 204, and an asset library module 205.
  • The rule configuration module 201 acquires the terminal type determination rule and saves it.
  • The rule matching module 202 determines the type of the terminal device according to a mirrored copy of the data traffic and the terminal type determination rule saved by the rule configuration module 201.
  • The asset information extraction module 203 extracts the asset information of the terminal device (e.g., Internet Protocol (IP) address, port number, and/or media access control (MAC) address).
  • The asset inventory module 204 integrates (e.g., merges and deduplicates) the asset information extracted by the asset information extraction module 203 according to the judgment result of the rule matching module 202, and then enters the integration result into the asset library module 205.
  • the user can obtain the final asset inventory result through the asset library module 205 .
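The merge-and-deduplicate step of the asset inventory module can be pictured with the following sketch. The record layout, the choice of the MAC address as the deduplication key, and the function name `build_inventory` are assumptions made for illustration; the text above does not specify them.

```python
def build_inventory(records):
    """Merge asset records that share a MAC address into one inventory entry.

    records: iterable of dicts with keys "mac", "ip", "port", and "type",
    where "type" is the judgment result for the terminal device.
    """
    inventory = {}
    for record in records:
        entry = inventory.setdefault(
            record["mac"],
            {"type": record["type"], "ips": set(), "ports": set()},
        )
        entry["ips"].add(record["ip"])      # sets drop duplicate values
        entry["ports"].add(record["port"])
    return inventory

records = [
    {"mac": "00:00:5e:00:53:01", "ip": "192.0.2.5", "port": 40000, "type": "A"},
    {"mac": "00:00:5e:00:53:01", "ip": "192.0.2.5", "port": 40001, "type": "A"},
    {"mac": "00:00:5e:00:53:02", "ip": "192.0.2.12", "port": 50000, "type": "B"},
]
inventory = build_inventory(records)
```

A real deployment might key on a combination of identifiers instead, since the asset information may include any of the IP address, port number, or MAC address.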
  • FIG. 3 is a schematic diagram of a distributed deployment scheme.
  • the judgment of the terminal device type and the extraction of asset information can be implemented by the network forwarding device.
  • the network control device is responsible for the final inventory of assets.
  • The network control device 310 includes a rule configuration module 311, an asset inventory module 312, and an asset library module 313.
  • The network forwarding device 320 includes a rule matching module 321 and an asset information extraction module 322.
  • The rule configuration module 311 acquires the terminal type determination rule and sends it to the network forwarding device 320.
  • The rule matching module 321 obtains the terminal type determination rule from the network control device 310, determines the type of the terminal device according to the data flow and the terminal type determination rule, and reports the determination result to the network control device 310.
  • The asset information extraction module 322 extracts the asset information of the terminal device (e.g., IP address, port number, and/or MAC address) and reports the extracted asset information to the network control device 310.
  • The asset inventory module 312 integrates (e.g., merges and deduplicates) the asset information extracted by the asset information extraction module 322 according to the judgment result of the rule matching module 321, and then enters the integration result into the asset library module 313.
  • the user can obtain the final asset inventory result through the asset library module 313 .
  • the network control device 200 shown in FIG. 2 and the network control device 310 shown in FIG. 3 may be the network control device 101 shown in FIG. 1 .
  • the network forwarding device 320 shown in FIG. 3 may be the network forwarding device 111 or the network forwarding device 112 shown in FIG. 1 .
  • the type of the terminal device is determined according to the terminal type judgment rule.
  • the terminal type judgment rule can be obtained by training based on historical data traffic. There are two methods for training the terminal type judgment rules. The first method is supervised learning; the second method is unsupervised learning.
  • FIG. 4 is a schematic flowchart of supervised learning to determine the terminal type judgment rule.
  • The terminal classification information indicates multiple terminal device types and multiple groups of terminal identification information.
  • The terminal classification information may also indicate the correspondence between the multiple types and the multiple groups of terminal identification information.
  • The multiple types and the multiple groups of terminal identification information are in one-to-one correspondence.
  • Table 1 is an illustration of terminal classification information.
  • The IP address range corresponding to terminal devices of type A is 192.101.1.1 to 192.1.1.10; the IP address range corresponding to terminal devices of type B is 192.101.1.11 to 192.1.1.20; and the IP address range corresponding to terminal devices of type C is 192.101.1.21 to 192.1.1.30.
  • Table 1 is only a schematic representation of terminal classification information.
  • Table 1 uses an IP address as an example of terminal identification information.
  • the terminal identification information may include any one or more kinds of identification information capable of distinguishing different terminal devices.
  • the terminal identification information may include any one or more of the IP address, port number, or MAC address of the terminal device.
  • the terminal classification information is collected in advance. For example, it can be determined according to terminal devices that can support data fingerprinting and support protocol scanning. For another example, it may be obtained by manual statistics.
  • After the terminal classification information is acquired, the traffic can be monitored according to the terminal identification information in the terminal classification information, and the data flows that include that terminal identification information can be extracted.
  • A data flow extracted according to the terminal identification information in the terminal classification information may be referred to as a historical data flow.
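Extracting historical data flows by terminal identification information can be sketched as below, using IP address ranges as the identification information. The ranges are hypothetical documentation addresses chosen for this sketch rather than the values of Table 1, and the names `TERMINAL_CLASSES`, `terminal_type`, and `extract_historical_flows` are likewise assumptions.

```python
import ipaddress

# Hypothetical classification information: type -> inclusive IP address range.
TERMINAL_CLASSES = {
    "A": ("192.0.2.1", "192.0.2.10"),
    "B": ("192.0.2.11", "192.0.2.20"),
    "C": ("192.0.2.21", "192.0.2.30"),
}

def terminal_type(ip, classes=TERMINAL_CLASSES):
    """Return the type whose range contains ip, or None if no range matches."""
    address = ipaddress.ip_address(ip)
    for name, (low, high) in classes.items():
        if ipaddress.ip_address(low) <= address <= ipaddress.ip_address(high):
            return name
    return None

def extract_historical_flows(flows, classes=TERMINAL_CLASSES):
    """Keep only flows whose terminal (upstream source) address is classified.

    flows: iterable of (terminal_ip, server_ip) pairs, one per monitored flow.
    Returns a dict mapping each type to the list of its flows.
    """
    by_type = {name: [] for name in classes}
    for terminal_ip, server_ip in flows:
        name = terminal_type(terminal_ip, classes)
        if name is not None:
            by_type[name].append((terminal_ip, server_ip))
    return by_type

monitored = [
    ("192.0.2.5", "198.51.100.1"),
    ("192.0.2.12", "198.51.100.2"),
    ("203.0.113.9", "198.51.100.1"),  # unclassified terminal, ignored
]
historical = extract_historical_flows(monitored)
```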
  • K historical streams are acquired in total, and the value of K is greater than or equal to the total number of terminal device types.
  • the historical data flow in step 402 includes K historical flows.
  • Each of the multiple terminal device types indicated by the terminal classification information has at least one corresponding historical flow among the K historical flows.
  • In other words, for each type, data flows involving at least one terminal device of that type are extracted as historical data flows.
  • For example, if the terminal device types are A, B, and C, K can be any positive integer greater than or equal to 3.
  • The terminal device corresponding to at least one of the K historical flows is of type A, the terminal device corresponding to at least one of the K historical flows is of type B, and the terminal device corresponding to at least one of the K historical flows is of type C.
  • The K historical flows can be divided into multiple sets of reference traffic, which are in one-to-one correspondence with the terminal device types.
  • For example, the K historical flows include reference traffic A, reference traffic B, and reference traffic C, where reference traffic A includes at least one historical flow corresponding to a terminal device of type A (that is, the terminal device in each historical flow in reference traffic A is of type A), reference traffic B includes at least one historical flow corresponding to a terminal device of type B (that is, the terminal device in each historical flow in reference traffic B is of type B), and reference traffic C includes at least one historical flow corresponding to a terminal device of type C (that is, the terminal device in each historical flow in reference traffic C is of type C).
  • A historical flow in a set of reference traffic may also be referred to as a reference flow.
  • Each set of reference traffic is determined from the corresponding candidate traffic.
  • The candidate traffic is determined according to the terminal classification information.
  • Multiple sets of candidate traffic can be determined, in one-to-one correspondence with the terminal device types.
  • The historical data traffic includes multiple sets of reference traffic, which are in one-to-one correspondence with the terminal device types. Therefore, the sets of reference traffic also correspond one-to-one with the sets of candidate traffic.
  • For example, three sets of candidate traffic may be determined in total, referred to as candidate traffic A, candidate traffic B, and candidate traffic C.
  • Candidate traffic A includes multiple candidate flows, and the terminal device of each of these candidate flows is of type A; candidate traffic B also includes multiple candidate flows, and the terminal device of each of these candidate flows is of type B; candidate traffic C also includes multiple candidate flows, and the terminal device of each of these candidate flows is of type C.
  • A candidate flow can be used as a reference flow in the corresponding reference traffic.
  • The same access behavior may mean the same source IP address and the same destination IP address. Whether two flows have the same access behavior can be judged from the upstream packets or the downstream packets of the two flows. If the source IP addresses and destination IP addresses of the upstream packets of the two flows are the same, the two flows can be considered to have the same access behavior; otherwise, they are considered to have different access behaviors. The same test can be applied to the downstream packets of the two flows.
  • For example, suppose IP 1 to IP 3 are the IP addresses of three terminal devices, and IP A, IP B, and IP C are the IP addresses of three servers. If the upstream packets of candidate flow 1 have source IP address IP 1 and destination IP address IP A, the upstream packets of candidate flow 2 have source IP address IP 1 and destination IP address IP A, and the upstream packets of candidate flow 3 have source IP address IP 2 and destination IP address IP A, then candidate flow 1 and candidate flow 2 have the same access behavior, while candidate flow 1 and candidate flow 3 have different access behaviors.
  • The same access behavior may alternatively mean the same source IP address, the same destination IP address, the same source port, and the same destination port. Whether two flows have the same access behavior can again be judged from the upstream packets or the downstream packets of the two flows. If the source IP addresses, source port numbers, destination IP addresses, and destination port numbers of the upstream packets of the two flows are the same, the two flows can be considered to have the same access behavior; if any one of these four fields differs, the two flows can be considered to have different access behaviors.
  • Likewise, if the source IP addresses, source port numbers, destination IP addresses, and destination port numbers of the downstream packets of the two flows are the same, the two flows can be considered to have the same access behavior; if any one of these four fields differs, the two flows can be considered to have different access behaviors.
  • If the quintuples of the packets in the same direction (upstream or downstream) of the two flows are identical, the access behaviors of the two flows are considered the same.
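The three granularities of "same access behavior" described above can be sketched as one comparison function. The `UpstreamPacket` type, the granularity labels, and the port and protocol values are hypothetical; the address labels follow the IP 1 / IP A example in the text.

```python
from typing import NamedTuple

class UpstreamPacket(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

def behavior_key(pkt, granularity="ip"):
    """Reduce a representative upstream packet to the fields that define the behavior."""
    if granularity == "ip":
        return (pkt.src_ip, pkt.dst_ip)
    if granularity == "ip_port":
        return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
    if granularity == "quintuple":
        return tuple(pkt)
    raise ValueError(f"unknown granularity: {granularity}")

def same_access_behavior(pkt_a, pkt_b, granularity="ip"):
    return behavior_key(pkt_a, granularity) == behavior_key(pkt_b, granularity)

# Candidate flows 1 to 3 from the example; ports and protocol are made up.
flow_1 = UpstreamPacket("IP 1", 40000, "IP A", 443, "tcp")
flow_2 = UpstreamPacket("IP 1", 40001, "IP A", 443, "tcp")
flow_3 = UpstreamPacket("IP 2", 40000, "IP A", 443, "tcp")
```

With the IP-only definition, flows 1 and 2 match and flows 1 and 3 do not. Under the port-aware definition, flows 1 and 2 would differ in this sketch because their made-up source ports differ, which shows how the choice of granularity changes the grouping.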
  • The candidate flows belonging to the T access behaviors that are shared by the largest numbers of candidate flows may be selected as the reference flows in the reference traffic corresponding to the candidate traffic, where T is a preset positive integer.
  • For example, candidate traffic A includes candidate flows with five access behaviors, access behavior 1 to access behavior 5: 100 candidate flows with access behavior 1, 120 with access behavior 2, 80 with access behavior 3, 20 with access behavior 4, and 5 with access behavior 5.
  • T can be a preset value. Assuming that the value of T is 3, the candidate flows with access behavior 1, the candidate flows with access behavior 2, and the candidate flows with access behavior 3 can be selected as the reference flows in the reference traffic.
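The top-T selection can be sketched with a frequency count. Each candidate flow is represented here only by the label of its access behavior, which is a simplification made for this sketch; the function name `select_top_t` is hypothetical.

```python
from collections import Counter

def select_top_t(candidate_behaviors, t):
    """Keep the candidate flows whose access behavior is among the t most frequent."""
    counts = Counter(candidate_behaviors)
    top_behaviors = {behavior for behavior, _ in counts.most_common(t)}
    return [behavior for behavior in candidate_behaviors if behavior in top_behaviors]

# Candidate traffic A from the example: one list entry per candidate flow.
candidate_traffic_a = [1] * 100 + [2] * 120 + [3] * 80 + [4] * 20 + [5] * 5
reference_traffic_a = select_top_t(candidate_traffic_a, t=3)
```

With T = 3, the 300 flows showing access behaviors 1, 2, and 3 are kept and the 25 flows showing access behaviors 4 and 5 are dropped.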
  • T may also be calculated according to a preset ratio, where the ratio of the number of candidate flows selected as historical data traffic to the total number of candidate flows in one set of candidate traffic is a preset value. The value of T can then be determined from the preset value and the total number of candidate flows in the candidate traffic.
  • N_CAND = ceil(T_all × P_T%), where ceil(T_all × P_T%) denotes T_all × P_T% rounded up to the nearest integer.
  • The manner of selecting the historical data traffic from the candidate traffic may also be determined according to the total number of flows included in the candidate traffic and a preset ratio. For example, in candidate traffic A, the flows whose access behavior accounts for more than 25% of the total number of flows can be selected. Assume that candidate traffic A contains 100 candidate flows with access behavior 1, 120 with access behavior 2, 80 with access behavior 3, 20 with access behavior 4, and 5 with access behavior 5, that is, 325 candidate flows in total. The candidate flows with access behavior 1 then account for 30.8% of all candidate flows, those with access behavior 2 for 36.9%, those with access behavior 3 for 24.6%, those with access behavior 4 for 6.2%, and those with access behavior 5 for 1.5%. It can therefore be determined that the candidate flows with access behavior 1 and the candidate flows with access behavior 2 are used as the reference flows in reference traffic A.
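  • The selection options described above (a fixed T, a T derived from a preset ratio, and a minimum share) can be sketched as follows. This is a minimal illustration rather than the method of the application: the function names are hypothetical, and `t_from_ratio` simply applies the rounding-up step to whatever total the preset ratio refers to.

```python
from collections import Counter
from math import ceil

# Counts from the example: access behavior -> number of candidate flows.
behavior_counts = Counter({1: 100, 2: 120, 3: 80, 4: 20, 5: 5})

def top_t_behaviors(counts, t):
    """Return the t access behaviors that have the most candidate flows."""
    return [behavior for behavior, _ in counts.most_common(t)]

def t_from_ratio(total, percent):
    """Derive T from a preset percentage of a total, rounding up."""
    return ceil(total * percent / 100)

def behaviors_above_share(counts, min_share):
    """Return behaviors whose share of all candidate flows exceeds min_share."""
    total = sum(counts.values())
    return sorted(b for b, n in counts.items() if n / total > min_share)

print(top_t_behaviors(behavior_counts, 3))           # fixed T = 3
print(behaviors_above_share(behavior_counts, 0.25))  # 25% threshold
```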
  • the terminal type determination rule may include multiple sub-rules, and the multiple sub-rules correspond to the types of the multiple terminal devices one-to-one.
  • The historical data traffic includes multiple groups of reference traffic, which are in one-to-one correspondence with the types of the multiple terminal devices. Therefore, the multiple sub-rules are also in one-to-one correspondence with the multiple groups of reference traffic. Each sub-rule may be determined according to its corresponding reference traffic and the historical data traffic other than that reference traffic.
  • For example, the terminal type judgment rule may include sub-rule A, sub-rule B and sub-rule C, where sub-rule A corresponds to type A terminal devices, sub-rule B corresponds to type B terminal devices, and sub-rule C corresponds to type C terminal devices.
  • the sub-rule A may be determined according to the reference traffic A and historical data traffic other than the reference traffic A.
  • the sub-rule B may be determined according to the reference traffic B and historical data traffic other than the reference traffic B.
  • the sub-rule C may be determined according to the reference flow C and historical data flows other than the reference flow C.
  • the access behavior of the terminal device of type A can be obtained according to the reference traffic A, the access behavior of other types of terminal devices can be determined according to the historical data traffic except the reference traffic A, and then the sub-rule A is determined by the set difference method.
  • the access behavior of the terminal device may include identification information of the server accessed by the terminal device, and the like.
  • the identification information of the server may include any one or more of the IP address, port number and MAC address of the server.
  • the server accessed by the terminal device can be determined, and then the identification information of the server can be obtained. According to the identification information of the server, the access behavior is summarized, and each sub-rule is obtained.
  • the servers accessed by different types of terminal devices are different. Therefore, the IP address of the server can be used as the basis for judging the type of the terminal device.
  • For example, an ATM with deposit and withdrawal functions can access the server responsible for the deposit function (hereinafter referred to as the deposit server) and the server responsible for the withdrawal function (hereinafter referred to as the withdrawal server), whereas an ATM with only a withdrawal function can access only the withdrawal server. An electronic receipt cabinet accesses only the server that provides the receipt service (hereinafter referred to as the receipt server) and cannot access the deposit server or the withdrawal server.
  • Different servers have different identification information. In this way, different types of terminal devices can be distinguished according to the identification information of the server.
  • For example, an ATM with deposit and withdrawal functions is a type A terminal device, an ATM with only a withdrawal function is a type B terminal device, and an electronic receipt cabinet is a type C terminal device.
  • The IP addresses of the servers accessed by different types of terminal devices are different. According to the historical data traffic, it can be found that the IP addresses accessed by reference traffic A are IP W and IP D, the IP address accessed by reference traffic B is IP W, and the IP address accessed by reference traffic C is IP R, where IP W represents the IP address of the withdrawal server, IP D represents the IP address of the deposit server, and IP R represents the IP address of the receipt server.
  • Sub-rule A: IP W, IP D;
  • Sub-rule B: IP W;
  • Sub-rule C: IP R.
  • A judgment matrix can be used to represent the terminal type judgment rule. The judgment matrix can be expressed as:
  • $$M = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
  • where M represents the judgment matrix. M includes three rows of elements, which are in one-to-one correspondence with the three sub-rules. In each row, the first element corresponds to IP W, the second element corresponds to IP D, and the third element corresponds to IP R. If the value of an element is 1, the access behavior includes accessing the corresponding server; if the value of an element is 0, the access behavior does not include accessing the corresponding server.
  • For example, sub-rule A is IP W and IP D, so the values of the row of elements corresponding to sub-rule A in M (that is, the first row) are 1, 1, and 0 in sequence.
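  • The mapping from observed server sets to rows of a judgment matrix can be sketched as follows. This is only an illustration under the ATM example: the identifiers `IP_W`, `IP_D` and `IP_R` stand for the three server addresses, and the function name is hypothetical.

```python
# Servers observed for each group of reference traffic in the ATM example.
access = {
    "A": {"IP_W", "IP_D"},  # ATM with deposit and withdrawal functions
    "B": {"IP_W"},          # ATM with only a withdrawal function
    "C": {"IP_R"},          # electronic receipt cabinet
}
columns = ["IP_W", "IP_D", "IP_R"]  # column order used in the text

def judgment_matrix(access_sets, column_order):
    """One row per type: 1 if the type accesses the column's server, else 0."""
    return {
        type_name: [1 if col in servers else 0 for col in column_order]
        for type_name, servers in access_sets.items()
    }

M = judgment_matrix(access, columns)
for type_name, row in M.items():
    print(type_name, row)
```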
  • the servers accessed by different types of terminal devices may be the same, but the port numbers used by different functions to access the servers are different.
  • the IP address and port number of the server can be used as the basis for judging the type of the terminal device.
  • For example, there are three types of terminal devices: registration/acceptance machines, registration machines, and diagnostic result printers.
  • Server A provides both the registration function and the number retrieval function. The registration function is implemented through port A, and the number retrieval function is implemented through port B. Server B provides the diagnostic result function.
  • Assume that the access behavior of reference traffic A includes two types: access behavior 1 is IP A:Port A, and access behavior 2 is IP A:Port B. The access behavior of reference traffic B is IP A:Port A, and the access behavior of reference traffic C is IP B. Here, IP A represents the IP address of server A, IP B represents the IP address of server B, Port A represents the port number of port A, and Port B represents the port number of port B.
  • The difference set of the access behaviors of reference traffic A and reference traffic B is IP A:Port B.
  • The difference set of the access behaviors of reference traffic A and reference traffic C is IP A:Port A, IP A:Port B and IP B.
  • The difference set of the access behaviors of reference traffic B and reference traffic C is IP A:Port A and IP B.
  • Sub-rule A: IP A:Port A, IP A:Port B;
  • Sub-rule B: IP A:Port A;
  • Sub-rule C: IP B.
  • If a judgment matrix is used to represent the terminal type judgment rule, the judgment matrix can be expressed as:
  • $$M = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
  • where M represents the judgment matrix. M includes three rows of elements, which are in one-to-one correspondence with the three sub-rules. In each row, the first element corresponds to IP A:Port A, the second element corresponds to IP A:Port B, and the third element corresponds to IP B. If the value of an element is 1, the access behavior includes accessing the corresponding server (or server port); if the value of an element is 0, the access behavior does not include accessing it.
  • For example, sub-rule A is IP A:Port A and IP A:Port B, so the values of the row of elements corresponding to sub-rule A in M (that is, the first row) are 1, 1, and 0 in turn.
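  • The pairwise difference sets in the registration example can be checked with ordinary set operations. The sketch below assumes that "difference set" means the symmetric difference (elements present in exactly one of the two sets), which matches the results given for reference traffic A against B and A against C. The `(IP, port)` tuples and the use of `None` for "no specific port" are illustrative choices.

```python
from itertools import combinations

# Access behaviors as (server IP, port) pairs; None means no specific port.
behaviors = {
    "A": {("IP_A", "Port_A"), ("IP_A", "Port_B")},
    "B": {("IP_A", "Port_A")},
    "C": {("IP_B", None)},
}

def difference_sets(behavior_sets):
    """Symmetric difference for every pair of reference traffic groups."""
    return {
        (x, y): behavior_sets[x] ^ behavior_sets[y]
        for x, y in combinations(sorted(behavior_sets), 2)
    }

diffs = difference_sets(behaviors)
for pair, diff in diffs.items():
    print(pair, sorted(diff, key=str))
```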
  • FIG. 5 is a schematic flowchart of unsupervised learning to determine the terminal type judgment rule.
  • The historical flows included in the historical data traffic may be divided into multiple groups of reference traffic, which are in one-to-one correspondence with multiple IP addresses.
  • For example, the historical data traffic can include reference traffic 1, reference traffic 2 and reference traffic 3. Reference traffic 1 includes at least one historical flow whose corresponding IP address is IP 1 (that is, the IP address of the sender or receiver of the packets in each historical flow in reference traffic 1 is IP 1). Reference traffic 2 includes at least one historical flow whose corresponding IP address is IP 2 (that is, the IP address of the sender or receiver of the packets in each historical flow in reference traffic 2 is IP 2). Reference traffic 3 includes at least one historical flow whose corresponding IP address is IP 3 (that is, the IP address of the sender or receiver of the packets in each historical flow in reference traffic 3 is IP 3).
  • A historical flow in reference traffic may also be referred to as a reference flow.
  • Each group of reference traffic is determined from the corresponding candidate traffic.
  • The collected traffic can be divided into multiple groups of candidate traffic, which are in one-to-one correspondence with multiple IP addresses, and each group of candidate traffic includes multiple candidate flows.
  • For the candidate flows belonging to the same candidate traffic, the IP address of the sender or the receiver is the IP address corresponding to that candidate traffic.
  • For example, a total of 100 flows are collected. The sender IP address of flow 1 to flow 20 is IP1, the sender IP address of flow 21 to flow 40 is IP2, and the sender IP address of flow 41 to flow 100 is IP3, where IP1, IP2 and IP3 represent three different IP addresses.
  • In this case, the 100 flows can be divided into three groups of candidate traffic.
  • Candidate traffic 1 includes flow 1 to flow 20, candidate traffic 2 includes flow 21 to flow 40, and candidate traffic 3 includes flow 41 to flow 100.
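  • Dividing collected flows into candidate traffic by sender IP address can be sketched as follows. The flow records and field names (`id`, `src_ip`) are hypothetical; only the flow ranges and the three IP addresses come from the example.

```python
from collections import defaultdict

def sender_ip(flow_id):
    """Sender IP address of each flow in the 100-flow example."""
    if flow_id <= 20:
        return "IP1"
    if flow_id <= 40:
        return "IP2"
    return "IP3"

flows = [{"id": i, "src_ip": sender_ip(i)} for i in range(1, 101)]

def group_by_sender(flow_list):
    """Group flow ids into candidate traffic keyed by sender IP address."""
    groups = defaultdict(list)
    for flow in flow_list:
        groups[flow["src_ip"]].append(flow["id"])
    return dict(groups)

candidate_traffic = group_by_sender(flows)
# Show the first and last flow id of each group of candidate traffic.
print({ip: (ids[0], ids[-1]) for ip, ids in candidate_traffic.items()})
```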
  • The candidate flows in candidate traffic can be used as reference flows in the corresponding reference traffic.
  • the same access behavior may refer to the same source IP and destination IP. Whether the access behaviors of the two streams are the same can be judged by the upstream packets or downstream packets of the two streams. If the source IP addresses and destination IP addresses of the upstream packets of the two flows are the same, then the two flows can be considered to have the same access behavior; otherwise, the two flows are considered to have different access behaviors. If the source IP addresses and destination IP addresses of the downstream packets of the two flows are the same, then the two flows can be considered to have the same access behavior; otherwise, the two flows are considered to have different access behaviors.
  • For example, IP 1 to IP 3 are the IP addresses of three terminal devices, and IP A, IP B, and IP C are the IP addresses of three servers. Suppose the source IP address of the upstream packet of candidate flow 1 is IP 1 and its destination IP address is IP A; the source IP address of the upstream packet of candidate flow 2 is IP 1 and its destination IP address is IP A; and the source IP address of the upstream packet of candidate flow 3 is IP 2 and its destination IP address is IP A. In this case, candidate flow 1 and candidate flow 2 have the same access behavior, while candidate flow 1 and candidate flow 3 have different access behaviors.
  • In other embodiments, the same access behavior may mean the same source IP address, the same destination IP address, the same source port, and the same destination port. Whether the access behaviors of two flows are the same can be judged from the upstream packets or the downstream packets of the two flows. If the source IP addresses, source port numbers, destination IP addresses and destination port numbers of the upstream packets of the two flows are all the same, the two flows can be considered to have the same access behavior; if any one of these four items differs, the two flows can be considered to have different access behaviors.
  • Likewise, if the source IP addresses, source port numbers, destination IP addresses and destination port numbers of the downstream packets of the two flows are all the same, the two flows can be considered to have the same access behavior; if any one of these four items differs, the two flows can be considered to have different access behaviors.
  • In other words, if the 5-tuples of the packets in the same direction (upstream or downstream) of two flows are identical, the access behaviors of the two flows are considered to be the same.
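  • The two notions of "same access behavior" described above (same source and destination IP addresses, or identical 5-tuple) can be compared with a small sketch. The `Packet` type, the port numbers and the protocol value are hypothetical; only the IP addresses follow the candidate flow example.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    """5-tuple of one upstream packet of a flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

def same_behavior_by_ip(p, q):
    """Same access behavior if source and destination IP addresses match."""
    return (p.src_ip, p.dst_ip) == (q.src_ip, q.dst_ip)

def same_behavior_by_five_tuple(p, q):
    """Same access behavior only if the whole 5-tuple matches."""
    return p == q

flow1 = Packet("IP1", 50001, "IPA", 443, "TCP")
flow2 = Packet("IP1", 50002, "IPA", 443, "TCP")
flow3 = Packet("IP2", 50001, "IPA", 443, "TCP")

print(same_behavior_by_ip(flow1, flow2))          # IP-only comparison
print(same_behavior_by_ip(flow1, flow3))
print(same_behavior_by_five_tuple(flow1, flow2))  # stricter comparison
```

  • The stricter 5-tuple comparison separates flows that differ only in an assumed ephemeral source port, so the choice of key directly affects how many distinct access behaviors are counted.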
  • Among the candidate flows in candidate traffic, the candidate flows whose access behaviors are the T most frequent (that is, the T access behaviors shared by the largest numbers of candidate flows) may be selected as the reference flows in the reference traffic corresponding to that candidate traffic, where T is a preset positive integer.
  • For example, candidate traffic A contains candidate flows with five access behaviors, access behavior 1 through access behavior 5: 100 candidate flows with access behavior 1, 120 with access behavior 2, 80 with access behavior 3, 20 with access behavior 4, and 5 with access behavior 5.
  • T can be a preset value. Assuming that the value of T is 3, the candidate flows with access behavior 1, access behavior 2, and access behavior 3 can be selected as the reference flows in the reference traffic.
  • T may also be calculated according to a preset ratio, that is, the ratio of the number of candidate flows selected as historical data traffic to the total number of candidate flows in one candidate traffic is a preset value. The value of T can then be determined according to the preset value and the total number of candidate flows included in the candidate traffic.
  • $N_{CAND} = \mathrm{ceil}(T\_all \times P_T\%)$, where $\mathrm{ceil}(T\_all \times P_T\%)$ denotes rounding $T\_all \times P_T\%$ up to the nearest integer.
  • The manner of selecting the historical data traffic from the candidate traffic may also be determined according to the total number of flows included in the candidate traffic and a preset ratio. For example, in candidate traffic A, the flows whose access behavior accounts for more than 25% of the total number of flows can be selected. Assume that candidate traffic A contains 100 candidate flows with access behavior 1, 120 with access behavior 2, 80 with access behavior 3, 20 with access behavior 4, and 5 with access behavior 5, that is, 325 candidate flows in total. The candidate flows with access behavior 1 then account for 30.8% of all candidate flows, those with access behavior 2 for 36.9%, those with access behavior 3 for 24.6%, those with access behavior 4 for 6.2%, and those with access behavior 5 for 1.5%. It can therefore be determined that the candidate flows with access behavior 1 and the candidate flows with access behavior 2 are used as the reference flows in reference traffic A.
  • Step 502 determines the identity of the identification information in each historical flow in the historical data traffic, that is, whether an IP address, port number, MAC address, or the like belongs to a terminal device or to a server.
  • The identification information of the terminal device in a historical flow can be determined first; the other identification information in that flow can then be determined to belong to the server.
  • The identification information of the terminal device can be determined in the following three ways:
  • In a first manner, the traffic in the network collected in step 501 is upstream traffic collected from the network forwarding device or from the upstream port of the terminal device. In this case, the sender of the upstream traffic is the terminal device, and the receiver is the server.
  • In a second manner, the proportion of connections that each IP address actively establishes can be counted. Under normal circumstances, a terminal device actively establishes connections more often than a server does. If the proportion of connections actively established by an IP address is greater than a preset proportion threshold, it can be determined that the IP address is the IP address of a terminal device. This proportion can be obtained by counting the sending and receiving of synchronize sequence number (SYN) packets: if an IP address sends a SYN packet, that IP address is the one that actively established the connection.
  • After the identity of the IP address is determined, the identity of the port number and/or the MAC address can be determined.
  • For example, IP 1 sends 9 SYN packets to IP X, and IP X sends 1 SYN packet to IP 1, so the proportion of SYN packets sent by IP 1 is 90%. If the preset proportion threshold is 80%, it can be determined that IP 1 is the IP address of a terminal device and, correspondingly, that IP X is the IP address of a server.
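  • The SYN-based judgment can be sketched as follows. This is an illustration only: the packet list reproduces the 9-to-1 example, the function names are hypothetical, and labeling every IP address at or below the threshold as a server is a simplifying assumption.

```python
from collections import Counter

# (sender, receiver) of each SYN packet: IP 1 sends 9, IP X sends 1.
syn_packets = [("IP1", "IPX")] * 9 + [("IPX", "IP1")]

def syn_sent_share(packets):
    """For each IP, the share of the SYN packets it took part in that it sent."""
    sent, seen = Counter(), Counter()
    for src, dst in packets:
        sent[src] += 1
        seen[src] += 1
        seen[dst] += 1
    return {ip: sent[ip] / seen[ip] for ip in seen}

def classify(packets, threshold=0.8):
    """Label an IP as a terminal if its SYN-sending share exceeds threshold."""
    return {
        ip: "terminal" if share > threshold else "server"
        for ip, share in syn_sent_share(packets).items()
    }

print(syn_sent_share(syn_packets))
print(classify(syn_packets))
```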
  • In a third manner, the source IP address and the destination IP address of each data flow are counted, and the identity is determined according to the statistical result.
  • Generally, the number of servers accessed by one terminal device is smaller than the number of terminal devices that access one server.
  • For example, an ATM that supports deposit and withdrawal may access two servers (the deposit server and the withdrawal server), and an ATM that supports only withdrawal may access only the withdrawal server, whereas thousands of ATMs may access the withdrawal server. Therefore, a threshold for the number of IP addresses can be preset. For an IP address, count the number of different destination IP addresses that correspond to it when it is the source IP address in the historical data traffic. If that number is less than the preset threshold, the IP address is the IP address of a terminal device; if that number is greater than or equal to the preset threshold, the IP address is the IP address of a server.
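  • Counting distinct destination IP addresses per source IP address can be sketched as follows. The flow list, the host names and the threshold of 3 are made-up illustration values, not values from the application.

```python
from collections import defaultdict

def classify_by_fanout(flows, threshold):
    """flows: iterable of (src_ip, dst_ip) pairs.

    A source IP with fewer distinct destinations than threshold is labeled
    a terminal; otherwise it is labeled a server.
    """
    destinations = defaultdict(set)
    for src, dst in flows:
        destinations[src].add(dst)
    return {
        ip: "terminal" if len(dsts) < threshold else "server"
        for ip, dsts in destinations.items()
    }

# Hypothetical data: one ATM reaches two servers, while the withdrawal
# server appears as the source towards five ATMs.
flows = [("ATM1", "deposit"), ("ATM1", "withdrawal")]
flows += [("withdrawal", f"ATM{i}") for i in range(1, 6)]

print(classify_by_fanout(flows, threshold=3))
```

  • Only addresses that appear as a source are classified in this sketch; an address seen only as a destination would need one of the other two manners.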
  • the server set corresponding to each terminal device can be determined.
  • the servers corresponding to the three history streams of terminal device 1 are server 1 , server 2 and server 3 respectively, then the server set corresponding to terminal device 1 includes: server 1 , server 2 and server 3 .
  • the servers corresponding to the two history streams corresponding to terminal device 2 are server 3 and server 4 respectively, then the server set corresponding to terminal device 2 includes server 3 and server 4 .
  • the multiple server sets are in one-to-one correspondence with the multiple terminal devices. For example, suppose there are three server sets in total, namely server set 1, server set 2, and server set 3. Server set 1 is the server set corresponding to terminal device 1, server set 2 is the server set corresponding to terminal device 2, and server set 3 is the server set corresponding to the terminal device 3 . In this case, the terminal device 1 to the terminal device 3 can be clustered according to the server set 1 to the server set 3 to obtain a clustering result.
  • the clustering algorithm adopted in this embodiment of the present application may be a spectral clustering algorithm.
  • server 1 server 2 server 3 server 4
  • Terminal equipment 1: 1 1 1 0
  • Terminal equipment 2: 1 1 1 0
  • Terminal equipment 3: 0 1 1 1 1
  • the three rows of the access matrix shown in Table 2 correspond to terminal equipment 1 to terminal equipment 3, respectively.
  • In the row for each of terminal device 1 to terminal device 3, the element corresponding to a server contained in that terminal device's server set has the value 1; all other elements have the value 0.
  • the server set corresponding to terminal device 1 includes server 1 , server 2 and server 3 . Therefore, in the first row of elements in Table 2, the elements corresponding to server 1, server 2, and server 3 have a value of 1, and the elements corresponding to server 4 and server 5 have a value of 0.
  • the similarity matrix can be calculated.
  • the vector angle between IP1 and IP2 can be determined according to the following formula:
  • In the similarity matrix, the elements in the first row are the similarity between IP1 and IP1, the similarity between IP1 and IP2, and the similarity between IP1 and IP3; the elements in the second row are the similarity between IP2 and IP1, the similarity between IP2 and IP2, and the similarity between IP2 and IP3; and the elements in the third row are the similarity between IP3 and IP1, the similarity between IP3 and IP2, and the similarity between IP3 and IP3.
  • After the similarity matrix is obtained, the degree matrix can be calculated: the degree matrix is obtained by summing each row of the similarity matrix. The Laplacian matrix is then determined according to the degree matrix and the similarity matrix.
  • The Laplacian matrix can be determined by the following formula:
  • $$L = D - S$$
  • where L represents the Laplacian matrix, D represents the degree matrix, and S represents the similarity matrix.
  • After the Laplacian matrix is obtained, it can be normalized according to the following formula:
  • where L_normal represents the normalized Laplacian matrix, D represents the degree matrix, and L represents the Laplacian matrix.
  • The k smallest eigenvalues of the normalized Laplacian matrix can be taken to obtain the corresponding n × k eigenvector matrix.
  • The K-means algorithm can then be applied to the eigenvector matrix, which can be regarded as n samples (that is, n terminal devices), each of which is k-dimensional. The samples are clustered into m clusters (C1, C2, ..., Cm), that is, similar terminal devices are clustered together.
  • Other clustering algorithms, such as density-based spatial clustering of applications with noise (DBSCAN), may also be used.
  • From the perspective of a graph, each terminal device is a vertex in the graph, and the similarity matrix is the adjacency matrix between the vertices.
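  • The matrix steps of the clustering procedure can be sketched in plain Python. Several points are assumptions rather than statements of the application: similarity is taken as the cosine of the angle between two rows of the access matrix, normalization is taken as the symmetric form $D^{-1/2} L D^{-1/2}$, and the access rows are made-up values. Each device is assumed to access at least one server, otherwise the cosine is undefined. The eigenvector and K-means steps need a numerical library and are not shown.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two access vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def similarity_matrix(rows):
    return [[cosine_similarity(u, v) for v in rows] for u in rows]

def degree_vector(S):
    """Diagonal of the degree matrix: the sum of each row of S."""
    return [sum(row) for row in S]

def laplacian(S):
    """L = D - S."""
    d = degree_vector(S)
    n = len(S)
    return [[(d[i] if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]

def normalized_laplacian(S):
    """Assumed symmetric normalization: D^(-1/2) L D^(-1/2)."""
    d = degree_vector(S)
    L = laplacian(S)
    n = len(S)
    return [[L[i][j] / sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

# Hypothetical access matrix: devices 1 and 2 access the same servers.
access_rows = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
S = similarity_matrix(access_rows)
L = laplacian(S)
L_normal = normalized_laplacian(S)
print(S)
print(L)
```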
  • the clustering result may include multiple clusters, each cluster includes one or more terminal devices among the multiple terminal devices, and there is no intersection between any two clusters among the multiple clusters.
  • For example, the clustering result may include three clusters, called cluster A, cluster B and cluster C, where cluster A includes terminal device 1, cluster B includes terminal device 3, and cluster C includes terminal device 2.
  • Each cluster in the plurality of clusters corresponds to a type of terminal device.
  • the terminal device type corresponding to each cluster may be manually determined.
  • each cluster may include one or more terminal devices that can support data fingerprinting and support protocol scanning.
  • the type of terminal device corresponding to each cluster can be determined according to the terminal devices that support data fingerprints and support protocol scanning. Taking cluster A, cluster B and cluster C as examples, the type of terminal equipment corresponding to cluster A is type A, the type of terminal equipment corresponding to cluster B is type B, and the type of terminal equipment corresponding to cluster C is type C.
  • the terminal type judgment rule can be determined according to the access behavior of the terminal device of each cluster.
  • the terminal type determination rule may include multiple sub-rules, and the multiple sub-rules correspond to the types of the multiple terminal devices one-to-one.
  • the multiple terminal devices included in the historical data traffic are clustered into multiple clusters, and the multiple clusters are in one-to-one correspondence with the types of the multiple terminal devices. Therefore, the multiple sub-rules also correspond to multiple clusters one-to-one.
  • Each sub-rule may be determined according to a corresponding one cluster and clusters other than the corresponding one cluster.
  • For example, the terminal type judgment rule may include sub-rule A, sub-rule B and sub-rule C, where sub-rule A corresponds to type A terminal devices, sub-rule B corresponds to type B terminal devices, and sub-rule C corresponds to type C terminal devices.
  • the sub-rule A is determined according to the access behavior of terminal devices in cluster A and the access behaviors of terminal devices in other clusters except cluster A (ie, cluster B and cluster C) by adopting a set difference method.
  • the specific determination method of the terminal type determination rule is similar to the determination method of the terminal type determination rule in the method based on supervised learning, and is not repeated here for brevity.
  • the supervised learning process shown in FIG. 4 and the unsupervised learning process shown in FIG. 5 may be implemented by components (such as chips or circuits, etc.) in the network control device or the case control device.
  • the network control device may further include a rule learning module.
  • Alternatively, a computer device (such as a server or a workstation) or a cloud service capable of providing supervised learning or unsupervised learning can be used to determine the terminal type judgment rule. The determined terminal type judgment rule is then sent to the network control device.
  • the type of each terminal device in the network can be determined.
  • the terminal type judgment rule is the judgment matrix shown in Table 4.
  • server 1 server 2 server 3 server 4
  • Type A 1 1 0 0
  • Type B 1 1 1 0
  • Type C 0 0 0 1
  • Y represents the judgment matrix, and y' represents the transpose of the reference matrix y.
  • In the result, 3 is the largest value and its position is 2, which corresponds to the second device type, that is, type B.
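  • Matching a terminal device against the judgment matrix of Table 4 can be sketched as a product of Y with the transposed reference vector, followed by picking the position of the largest entry. The observed vector `y` below is a hypothetical example (the device accessed server 1, server 2 and server 3); it is chosen only because it yields a largest score of 3 at the second position.

```python
# Judgment matrix from Table 4: one row per terminal type.
Y = {
    "Type A": [1, 1, 0, 0],
    "Type B": [1, 1, 1, 0],
    "Type C": [0, 0, 0, 1],
}

def scores(judgment, observed):
    """Dot product of each judgment row with the observed access vector."""
    return {
        type_name: sum(a * b for a, b in zip(row, observed))
        for type_name, row in judgment.items()
    }

def best_type(judgment, observed):
    """Type whose row gives the largest score."""
    s = scores(judgment, observed)
    return max(s, key=s.get)

y = [1, 1, 1, 0]  # hypothetical reference vector of one terminal device
print(scores(Y, y))
print(best_type(Y, y))
```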
  • the statistics on the access behavior of a terminal device may be within an observation period.
  • the observation period can be set as required, for example, it can be granular in hours (eg, 12 hours, 24 hours), or in days or weeks.
  • the access behavior of the terminal device determined when determining the terminal type judgment rule may also be counted in the observation period.
  • Each element in the judgment matrix shown in Table 4 indicates whether a certain type of terminal device has accessed the server.
  • The elements in the judgment matrix may also represent the probability that a certain type of terminal device accesses a server. For example, a statistical period is divided into multiple time windows, and each element in the judgment matrix represents the proportion of those time windows in which a certain type of terminal device accesses the corresponding server. If the statistical period is one week and each time window is 30 minutes, there are 336 time windows in the entire statistical period.
  • If a type A terminal device accessed server 1 in all 336 time windows, the value of the element corresponding to type A and server 1 is 1; if a type A terminal device accessed server 2 in only 168 time windows, the value of the element corresponding to type A and server 2 is 0.5. It is assumed that Table 5 is a judgment matrix determined according to the access probability.
  • server 1 server 2 server 3 server 4
  • Type A 1 0.5 0 0
  • Type B 1 0.8 0.8 0
  • Type C 0 0 0 1
  • Y represents the judgment matrix, and y' represents the transpose of the reference matrix y.
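  • The probability-valued judgment matrix can be sketched in the same way. The window arithmetic follows the one-week, 30-minute example, and the matrix values come from Table 5; the observed vector `y` is again a hypothetical example, and the function names are illustrative.

```python
WINDOWS = 7 * 24 * 2  # one week of 30-minute time windows

def access_probability(windows_with_access, total_windows=WINDOWS):
    """Share of time windows in which the server was accessed."""
    return windows_with_access / total_windows

# Judgment matrix from Table 5: access probabilities per terminal type.
Y_prob = {
    "Type A": [1, 0.5, 0, 0],
    "Type B": [1, 0.8, 0.8, 0],
    "Type C": [0, 0, 0, 1],
}

def best_type(judgment, observed):
    """Type whose row has the largest dot product with the observed vector."""
    totals = {
        name: sum(a * b for a, b in zip(row, observed))
        for name, row in judgment.items()
    }
    return max(totals, key=totals.get), totals

y = [1, 1, 1, 0]  # hypothetical reference vector of one terminal device
print(WINDOWS, access_probability(168))
print(best_type(Y_prob, y))
```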
  • only some terminal devices in the network may determine their type according to the terminal type determination rule. In other words, some terminal devices may not be able to determine their type according to the terminal type determination rule.
  • an unsupervised learning method can be used to cluster these terminal devices to obtain multiple clusters. The multiple clusters are in one-to-one correspondence with multiple terminal types. The terminal type corresponding to each cluster can then be determined manually or by using some terminal devices that support data fingerprinting and support protocol scanning.
  • FIG. 6 is a schematic flowchart of a method for judging the type of a terminal device according to an embodiment of the present application. The method shown in FIG. 6 may be performed by a network forwarding device or a network control device.
  • The terminal type judgment rule is obtained by training based on historical data traffic.
  • the first data flow may include the data flow counted in the first time period.
  • Each of the at least one data flow included in the first data flow includes one or more uplink packets, and the sender of the one or more uplink packets is the first terminal device.
  • the historical data flow is the data flow obtained by statistics in the second time period, wherein the end time of the second time period is earlier than the start time of the first time period.
  • the historical data flow is the data flow obtained before the first data flow is obtained.
  • the sender of the historical data traffic includes multiple types of terminal devices, and the type of the first terminal device is one of the multiple types.
  • the historical data stream includes multiple historical streams, and each historical stream in the multiple historical streams includes one or more upstream packets.
  • the senders of the uplink messages in the multiple historical flows include multiple terminal devices. Each type of terminal device in the plurality of types has at least one corresponding history flow.
  • The terminal type judgment rule is obtained by training according to the historical data traffic and terminal classification information. The terminal classification information indicates the multiple types and multiple groups of terminal identification information, and each group of terminal identification information includes identification information of at least one terminal device. The terminal classification information also indicates the correspondence between the multiple types and the multiple groups of terminal identification information, which are in one-to-one correspondence. The historical data traffic is determined according to the terminal classification information.
  • the identification information may include any one or more of IP addresses, port numbers, or MAC addresses.
  • the terminal identification information may include one or more of the IP address of the terminal device, the port number of the terminal device, or the MAC address of the terminal device. If it is an uplink packet, the terminal identification information is one or more of the source IP address, source port number or source MAC address. If it is a downlink message, the terminal identification information is one or more of the destination IP address, destination port number or destination MAC address.
  • The historical data traffic includes multiple groups of reference traffic, which are in one-to-one correspondence with the multiple types. The multiple groups of reference traffic include first reference traffic, and the type corresponding to the first reference traffic is the type of the first terminal device. The terminal type judgment rule includes multiple sub-rules, which are in one-to-one correspondence with the multiple types. Among the multiple sub-rules, the sub-rule corresponding to the type of the first terminal device is determined based on the first reference traffic and the reference traffic other than the first reference traffic among the multiple groups of reference traffic.
  • the first reference flow is determined according to a first candidate flow
  • the first candidate flow is a flow corresponding to the type of the first terminal device among the plurality of candidate flows.
  • the number of times that the access behavior corresponding to each data flow appears in the first candidate flow is greater than the number of times that the access behavior corresponding to the data flow that does not belong to the first reference flow appears in the first candidate flow.
  • the terminal type determination rule is determined according to a clustering result obtained by clustering P terminal devices by a set of P servers, the P terminal devices are determined according to the historical data traffic, and the P terminal devices are determined according to the historical data traffic.
  • Terminal devices are in one-to-one correspondence with the P server sets, each server set in the P server sets is a set of servers accessed by the corresponding terminal device, the P terminal devices include the multiple types of terminal devices, and P is A positive integer greater than or equal to the total number of types of end devices.
  • the historical data traffic is upstream data streams of the P terminal devices, and the P terminal devices are senders of the historical data traffic.
  • the ratio of the number of times each of the P terminal devices acts as the sender of the synchronization message in the historical data traffic to the number of times the terminal device acts as the receiver of the synchronization message is greater than a second preset ratio.
  • the historical data flow includes P reference flows
  • the plurality of reference flows are in one-to-one correspondence with the P terminal devices
  • the P reference flows are in one-to-one correspondence with the P candidate flows
  • the second reference flow includes The number of times that the access behavior corresponding to each data flow of the The flow is any one of the P reference flows.
  • the terminal type judgment rule is a judgment matrix that includes multiple rows of elements, and the rows are in one-to-one correspondence with the multiple types. Determining the type of the first terminal device according to the terminal type judgment rule and the access behavior of the first terminal device includes: determining, from the judgment matrix and according to the access behavior of the first terminal device, the target row that matches that access behavior; and determining that the type of the first terminal device is the type corresponding to the target row.
  • determining the target row from the judgment matrix according to the access behavior of the first terminal device includes: determining a reference matrix (for example, the reference matrix y in the above embodiment) according to the access behavior of the first terminal device, where the values of the elements of the reference matrix match that access behavior; multiplying the judgment matrix by the reference matrix to obtain a target matrix, whose elements are in one-to-one correspondence with the rows of the judgment matrix; and determining the row corresponding to the largest element of the target matrix as the target row.
  • FIG. 7 is a structural block diagram of a computer device provided according to an embodiment of the present application.
  • the computer device 700 shown in FIG. 7 may be the network control device or the network forwarding device in the above embodiment.
  • the computer device 700 shown in FIG. 7 includes an acquisition unit 701 and a processing unit 702 .
  • the obtaining unit 701 is configured to obtain a first data flow, where the sender of the first data flow is a first terminal device.
  • the processing unit 702 is configured to determine the access behavior of the first terminal device according to the identification information of the receiving end of the packet in the first data flow.
  • the processing unit 702 is further configured to determine the type of the first terminal device according to the terminal type judgment rule and the access behavior of the first terminal device, where the terminal type judgment rule indicates the correspondence between the access behavior of a terminal device and the type of the terminal device, and is obtained by training on historical data traffic.
  • the acquiring unit 701 may be implemented by a transceiver circuit, and the processing unit 702 may be implemented by a processor.
  • FIG. 7 is only an example and not a limitation, and the above-mentioned computer device including the acquiring unit and the processing unit may not depend on the structure shown in FIG. 7 .
  • When the computer device 700 is a chip, the chip includes an acquisition unit and a processing unit.
  • the acquisition unit may be an input/output circuit or a communication interface;
  • the processing unit may be a processor or a microprocessor or an integrated circuit integrated on the chip.
  • Embodiments of the present application also provide a computer device, including a processor and a memory.
  • the processor is configured to be coupled with the memory to read and execute the instructions and/or program codes in the memory, so as to execute the steps executed by the network control device in the above method embodiments.
  • Embodiments of the present application also provide a computer device, including a processor and a memory.
  • the processor is configured to be coupled with the memory to read and execute the instructions and/or program codes in the memory, so as to execute the learning step of the terminal type judgment rule in the above method embodiment.
  • Embodiments of the present application also provide a computer device, including a processor and a memory.
  • the processor is configured to be coupled with the memory to read and execute the instructions and/or program codes in the memory, so as to execute the steps executed by the network forwarding device in the foregoing method embodiments.
  • the above-mentioned processor may be a chip.
  • the processor may be a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or another integrated chip.
  • each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware. To avoid repetition, detailed description is omitted here.
  • the processor in this embodiment of the present application may be an integrated circuit chip, which has a signal processing capability.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache. Forms of RAM include dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • the present application also provides a computer program product that includes computer program code; when the computer program code runs on a computer, the computer is caused to execute the steps performed by the network control device in the above embodiments.
  • the present application also provides a computer program product that includes computer program code; when the computer program code runs on a computer, the computer is caused to execute the steps of learning the terminal type judgment rule in the above embodiments.
  • the present application also provides a computer program product that includes computer program code; when the computer program code runs on a computer, the computer is caused to execute the steps performed by the network forwarding device in the above embodiments.
  • the present application further provides a computer-readable medium storing program code; when the program code runs on a computer, the computer is caused to execute the steps performed by the network control device in the above embodiments.
  • the present application further provides a computer-readable medium storing program code; when the program code runs on a computer, the computer is caused to execute the steps of learning the terminal type judgment rule in the above embodiments.
  • the present application further provides a computer-readable medium storing program code; when the program code runs on a computer, the computer is caused to execute the steps performed by the network forwarding device in the above embodiments.
  • the present application further provides a system, which includes the foregoing network forwarding device and network control device.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

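This application provides a method for determining the type of a terminal device, and a related device. The method includes: obtaining first data traffic; determining the access behavior of a first terminal device according to identification information of the receiving end of packets in the first data traffic; and determining the type of the first terminal device according to the access behavior of the first terminal device and a terminal type judgment rule trained on historical data traffic. The technical solution can use a pre-trained terminal type judgment rule to determine the type of each terminal device in a network, laying the groundwork for subsequent device inventory. In addition, the terminal type judgment rule is determined from historical traffic data rather than from a static fingerprint database. The solution therefore has a wider range of application and is a more effective way to determine terminal device types.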

Description

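Method for Determining the Type of a Terminal Device, and Related Device
This application claims priority to Chinese Patent Application No. 202110078112.9, filed with the China National Intellectual Property Administration on January 20, 2021 and entitled "An information management method, device and system", and to Chinese Patent Application No. 202110420570.6, filed with the China National Intellectual Property Administration on April 19, 2021 and entitled "Method for determining the type of a terminal device and related device", both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of information technology, and more specifically, to a method for determining the type of a terminal device and a related device.
Background
With the development of information technology, replacing manual work with terminal devices has become a trend, most visibly in service industries such as banks and hospitals. In a bank, for example, deposits, withdrawals, and transfers can all be done at an automated teller machine (ATM), and customer receipts can be deposited and collected through an electronic receipt cabinet. In a hospital, registration, queue-number collection, and printing of diagnostic results can likewise be handled by terminal devices.
Only by comprehensively and effectively identifying the terminal devices in a network can an operator take stock of its assets, recognize risks, find vulnerabilities, and thereby carry out network security checks.
Currently, terminal device types are identified by commercial fingerprint-database scanning and by manual static maintenance. However, fingerprint databases generally depend on manual entry, and many industry-specific terminal devices have no complete static fingerprint database. In addition, data collection relies on scanning terminal devices with specific protocols, so an asset can be discovered only if the terminal device supports protocol scanning or has a client with inventory support installed. Many terminal devices exchange few packets and cannot send the information a fingerprint database requires, or lack the hardware or other environment needed to support protocol scanning or an inventory client.
Therefore, how to effectively determine the type of a terminal device is a problem to be solved in this field.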
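Summary
This application provides a method for determining the type of a terminal device, and a related device, which can improve the effectiveness of determining terminal device types.
According to a first aspect, an embodiment of this application provides a method for determining the type of a terminal device, including: obtaining first data traffic, where the sender of the first data traffic is a first terminal device; determining the access behavior of the first terminal device according to identification information of the receiving end of packets in the first data traffic; and determining the type of the first terminal device according to a terminal type judgment rule and the access behavior of the first terminal device, where the terminal type judgment rule indicates the correspondence between the access behavior of a terminal device and the type of the terminal device, and is obtained by training on historical data traffic.
The above technical solution can use a pre-trained terminal type judgment rule to determine the type of each terminal device in a network, laying the groundwork for subsequent device inventory. In addition, the terminal type judgment rule is determined from historical traffic data rather than from a static fingerprint database, so the solution can be applied to terminal devices that do not support static fingerprint databases or protocol scanning. The solution therefore has a wider range of application and is a more effective way to determine terminal device types.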
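With reference to the first aspect, in a possible implementation of the first aspect, the senders of the historical data traffic include terminal devices of multiple types, and the type of the first terminal device is one of the multiple types.
Optionally, in some embodiments, the senders of the historical data traffic may not include the first terminal device.
With reference to the first aspect, in a possible implementation of the first aspect, the terminal type judgment rule is obtained by training on the historical data traffic and terminal classification information. The terminal classification information indicates the multiple types and multiple groups of terminal identification information, each group including identification information of at least one terminal. The terminal classification information also indicates the correspondence between the multiple types and the multiple groups of terminal identification information, which are in one-to-one correspondence. The historical data traffic is determined according to the terminal classification information.
With reference to the first aspect, in a possible implementation of the first aspect, the historical data traffic includes multiple pieces of reference traffic in one-to-one correspondence with the multiple types. The multiple pieces of reference traffic include first reference traffic, whose corresponding type is the type of the first terminal device. The terminal type judgment rule includes multiple sub-rules in one-to-one correspondence with the multiple types, and the sub-rule corresponding to the type of the first terminal device is determined according to the first reference traffic and the reference traffic other than the first reference traffic.
With reference to the first aspect, in a possible implementation of the first aspect, the first reference traffic is determined according to first candidate traffic, which is the candidate traffic, among multiple pieces of candidate traffic, that corresponds to the type of the first terminal device. The number of times that the access behavior corresponding to each data flow in the first reference traffic appears in the first candidate traffic is greater than the number of times that the access behavior corresponding to a data flow not belonging to the first reference traffic appears in the first candidate traffic.
With reference to the first aspect, in a possible implementation of the first aspect, the terminal type judgment rule is determined according to a clustering result obtained by clustering P terminal devices according to P server sets. The P terminal devices are determined according to the historical data traffic and are in one-to-one correspondence with the P server sets. Each server set is the set of servers accessed by the corresponding terminal device. The P terminal devices include terminal devices of the multiple types, and P is a positive integer greater than or equal to the total number of terminal device types.
With reference to the first aspect, in a possible implementation of the first aspect, the historical data traffic is the uplink data flows of the P terminal devices, and the P terminal devices are the senders of the historical data traffic.
With reference to the first aspect, in a possible implementation of the first aspect, for each of the P terminal devices, the ratio of the number of times the terminal device is the sender of a synchronization packet in the historical data traffic to the number of times it is the receiver of a synchronization packet is greater than a second preset ratio.
With reference to the first aspect, in a possible implementation of the first aspect, the historical data traffic includes P pieces of reference traffic, which are in one-to-one correspondence with the P terminal devices and with P pieces of candidate traffic. The number of times that the access behavior corresponding to each data flow in second reference traffic appears in the corresponding second candidate traffic is greater than the number of times that the access behavior corresponding to a data flow not belonging to the second reference traffic appears in the second candidate traffic, where the second reference traffic is any one of the P pieces of reference traffic.
With reference to the first aspect, in a possible implementation of the first aspect, the terminal type judgment rule is a judgment matrix that includes multiple rows of elements in one-to-one correspondence with the multiple types. Determining the type of the first terminal device according to the terminal type judgment rule and the access behavior of the first terminal device includes: determining, from the judgment matrix and according to the access behavior of the first terminal device, the target row that matches that access behavior; and determining that the type of the first terminal device is the type corresponding to the target row.
With reference to the first aspect, in a possible implementation of the first aspect, determining the target row from the judgment matrix according to the access behavior of the first terminal device includes: determining a reference matrix according to the access behavior of the first terminal device, where the values of the elements of the reference matrix match that access behavior; multiplying the judgment matrix by the reference matrix to obtain a target matrix, whose elements are in one-to-one correspondence with the rows of the judgment matrix; and determining the row corresponding to the largest element of the target matrix as the target row.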
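According to a second aspect, an embodiment of this application provides a computer device that includes units for implementing the first aspect or any possible implementation of the first aspect.
According to a third aspect, an embodiment of this application provides a computer device that includes a processor. The processor is configured to be coupled with a memory and to read and execute instructions and/or program code in the memory, so as to perform the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, an embodiment of this application provides a chip system that includes a logic circuit. The logic circuit is configured to be coupled with an input/output interface and to transmit data through the input/output interface, so as to perform the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium storing program code. When the program code runs on a computer, the computer is caused to perform the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, an embodiment of this application provides a computer program product that includes computer program code. When the computer program code runs on a computer, the computer is caused to perform the first aspect or any possible implementation of the first aspect.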
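Brief Description of the Drawings
FIG. 1 is a schematic diagram of a possible application scenario according to an embodiment of this application.
FIG. 2 is a schematic diagram of a centralized deployment scheme.
FIG. 3 is a schematic diagram of a distributed deployment scheme.
FIG. 4 is a schematic flowchart of determining the terminal type judgment rule through supervised learning.
FIG. 5 is a schematic flowchart of determining the terminal type judgment rule through unsupervised learning.
FIG. 6 is a schematic flowchart of a method for determining the type of a terminal device according to an embodiment of this application.
FIG. 7 is a structural block diagram of a computer device according to an embodiment of this application.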
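Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. In addition, in the embodiments of this application, words such as "first" and "second" do not limit quantity or order of execution.
It should be noted that, in this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present the related concept in a concrete manner.
To help a person skilled in the art better understand the technical solutions of this application, some concepts involved in this application are first briefly introduced.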
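1. Data flow
A data flow may also be called a flow for short. A flow contains a number of packets, and packets have an uplink direction and a downlink direction. In general, the direction from a terminal device to a server is taken as the uplink direction, and the direction from the server to the terminal device as the downlink direction. A flow is identified by a five-tuple. From the time a terminal device and a server establish a connection until the connection is torn down, the source Internet Protocol (IP) address of every uplink packet is that of the terminal device and the destination IP address is that of the server; the source IP address of every downlink packet is that of the server and the destination IP address is that of the terminal device. All packets transmitted during this period can therefore be regarded as packets of one flow.
The terminal device that is the sender of the uplink packets and the receiver of the downlink packets of a data flow may be called the terminal device in the data flow, or the terminal device corresponding to the data flow. The server that is the receiver of the uplink packets and the sender of the downlink packets may be called the server in the data flow, or the server corresponding to the data flow. For example, "terminal device A in data flow A" means that the sender of all uplink packets in data flow A is terminal device A, and "server A in data flow A" means that the sender of all downlink packets in data flow A is server A.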
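The flow definition above can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the `Packet` type, its field names, and the direction-independent key are assumptions introduced here so that uplink and downlink packets of one connection land in the same flow.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record holding the five-tuple fields.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def flow_key(pkt: Packet) -> tuple:
    """Return a key that is identical for both directions of one connection."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.proto)


def group_into_flows(packets):
    """Group packets into flows keyed by the direction-independent five-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

Traffic, in the sense used in this document, would then be the collection of all flows gathered over an observation period.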
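2. Traffic
Traffic may also be called data traffic. Traffic is the set of all data flows counted over a period of time. Traffic may include multiple data flows, and the two communicating parties of any two of these flows may be the same or different.
3. Terminal device
The terminal devices referred to in the embodiments of this application may include Internet of Things terminals and production terminals. An Internet of Things terminal is a dedicated computer device with a specific purpose, such as a medical instrument or an oil sensor. A production terminal is a computer device that runs a general-purpose operating system (such as Windows or Linux) but performs a dedicated function, such as a queuing machine or a registration/number-collection machine.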
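FIG. 1 is a schematic diagram of a possible application scenario according to an embodiment of this application. As shown in FIG. 1, system 100 includes network control device 101, network forwarding device 111, network forwarding device 112, terminal devices 121 to 125, server 131, and server 132.
A terminal device in the embodiments of this application (for example, terminal devices 121 to 125 in FIG. 1) may be a computer device with one or more specific functions (for example, an ATM, an electronic receipt cabinet, a registration/number-collection machine, an X-ray film printer, or a camera), or a computer device with general functions (for example, a mobile phone, tablet, desktop computer, or laptop). A terminal device can communicate with a server through a network forwarding device, read data stored on the server, and/or write data to the server.
Taking system 100 in FIG. 1 as an example, terminal device 121 can access server 131 through network forwarding device 111 and read data stored on server 131; terminal device 124 can access server 132 through network forwarding device 112 and write data to server 132.
A network forwarding device (for example, network forwarding devices 111 and 112 in FIG. 1) may be a switch or a router. A network forwarding device can monitor the traffic generated by terminal devices. In some embodiments, it can also extract features of the monitored traffic.
A network control device (for example, network control device 101 in FIG. 1) may be a network controller, a server, a computer, or the like.
In some embodiments, the network control device may determine the types of terminal devices based on the terminal type judgment rule and take inventory of the terminal devices in the network.
In other embodiments, determining the types of terminal devices may be done by the network forwarding device, and taking inventory of terminal devices may be done by the network control device.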
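FIG. 2 is a schematic diagram of a centralized deployment scheme. In the centralized scheme, both terminal device type determination and asset inventory are performed by the network control device.
As shown in FIG. 2, network control device 200 includes rule configuration module 201, rule matching module 202, asset information extraction module 203, asset inventory module 204, and asset library module 205.
Rule configuration module 201 obtains the terminal type judgment rule and stores it.
Rule matching module 202 determines the type of a terminal device according to a mirror of the data traffic and the terminal type judgment rule stored by rule configuration module 201.
Asset information extraction module 203 extracts asset information of terminal devices (for example, the Internet Protocol (IP) address, port number, and/or media access control (MAC) address).
Asset inventory module 204 consolidates (for example, merges and deduplicates) the asset information extracted by asset information extraction module 203 according to the determination results of rule matching module 202, and then enters the consolidated result into asset library module 205.
A user can obtain the final asset inventory result through asset library module 205.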
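FIG. 3 is a schematic diagram of a distributed deployment scheme. In the distributed scheme, terminal device type determination and asset information extraction may be performed by the network forwarding device, while the network control device is responsible for the final asset inventory.
As shown in FIG. 3, network control device 310 includes rule configuration module 311, asset inventory module 312, and asset library module 313. Network forwarding device 320 includes rule matching module 321 and asset information extraction module 322.
Rule configuration module 311 obtains the terminal type judgment rule and sends it to network forwarding device 320.
Rule matching module 321 obtains the terminal type judgment rule from network control device 310, determines the type of a terminal device according to the data traffic and the terminal type judgment rule, and reports the determination result to network control device 310.
Asset information extraction module 322 extracts asset information of terminal devices (for example, the IP address, port number, and/or MAC address) and reports it to network control device 310.
Asset inventory module 312 consolidates (for example, merges and deduplicates) the asset information extracted by asset information extraction module 322 according to the determination results of rule matching module 321, and then enters the consolidated result into asset library module 313.
A user can obtain the final asset inventory result through asset library module 313.
Network control device 200 in FIG. 2 and network control device 310 in FIG. 3 may each be network control device 101 in FIG. 1. Network forwarding device 320 in FIG. 3 may be network forwarding device 111 or network forwarding device 112 in FIG. 1.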
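As described above, the type of a terminal device is determined according to the terminal type judgment rule, which can be obtained by training on historical data traffic. There are two ways to train the rule: the first is supervised learning, and the second is unsupervised learning.
FIG. 4 is a schematic flowchart of determining the terminal type judgment rule through supervised learning.
401. Obtain terminal classification information.
The terminal classification information indicates multiple terminal device types and multiple pieces of terminal identification information. It may also indicate the correspondence between the types and the pieces of terminal identification information, which are in one-to-one correspondence.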
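For example, Table 1 shows an example of terminal classification information.
Table 1
Type    IP address
A       192.101.1.1~192.101.1.10
B       192.101.1.11~192.101.1.20
C       192.101.1.21~192.101.1.30
As shown in Table 1, the IP address range for terminal devices of type A is 192.101.1.1~192.101.1.10, the range for type B is 192.101.1.11~192.101.1.20, and the range for type C is 192.101.1.21~192.101.1.30.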
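It can be understood that Table 1 is only an example of terminal classification information. Table 1 uses IP addresses as the terminal identification information, but in other embodiments the terminal identification information may include any one or more kinds of identification information that can distinguish different terminal devices, such as the IP address, port number, or MAC address of the terminal device.
The terminal classification information is collected in advance. For example, it may be determined from terminal devices that support data fingerprints and protocol scanning, or it may be compiled manually.
402. Obtain historical data traffic according to the terminal classification information.
After the terminal classification information is obtained, traffic can be monitored according to the terminal identification information it contains, and the data flows that contain that terminal identification information can be extracted.
Taking the terminal classification information in Table 1 as an example again, all data flows whose source or destination IP address falls within the IP address ranges in Table 1 can be extracted.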
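Step 402 can be sketched as a range lookup followed by a filter. This is a hedged illustration only: the `CLASSIFICATION` table is modeled on Table 1 under the assumption that each range lies within 192.101.1.x, and the function names and the `(src_ip, dst_ip)` flow representation are hypothetical.

```python
import ipaddress

# Hypothetical classification table modeled on Table 1: type -> inclusive IP range.
CLASSIFICATION = {
    "A": ("192.101.1.1", "192.101.1.10"),
    "B": ("192.101.1.11", "192.101.1.20"),
    "C": ("192.101.1.21", "192.101.1.30"),
}


def type_of(ip, table=CLASSIFICATION):
    """Return the terminal type whose range contains ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for terminal_type, (low, high) in table.items():
        if ipaddress.ip_address(low) <= addr <= ipaddress.ip_address(high):
            return terminal_type
    return None


def extract_historical_flows(flows, table=CLASSIFICATION):
    """Keep flows whose source or destination IP is a classified terminal.

    flows: iterable of (src_ip, dst_ip) pairs.
    Returns a list of (terminal_type, src_ip, dst_ip).
    """
    selected = []
    for src, dst in flows:
        terminal_type = type_of(src, table) or type_of(dst, table)
        if terminal_type is not None:
            selected.append((terminal_type, src, dst))
    return selected
```

Grouping the returned tuples by `terminal_type` would give one piece of candidate traffic per type.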
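The data flows extracted according to the terminal identification information in the terminal classification information may be called historical data flows. For ease of description, assume that K historical flows are obtained in total, where K is greater than or equal to the total number of terminal device types. In other words, the historical data traffic in step 402 includes K historical flows.
Each type indicated by the terminal classification information has at least one corresponding historical flow among the K historical flows. In other words, for each type, the data flow of at least one terminal device is extracted as historical data traffic.
Assume again that there are three terminal device types, A, B, and C; K may then be any positive integer greater than or equal to 3. Among the K historical flows, the terminal device of at least one historical flow is of type A, the terminal device of at least one historical flow is of type B, and the terminal device of at least one historical flow is of type C.
According to the type of the terminal device corresponding to each historical flow, the K historical flows can be divided into multiple pieces of reference traffic, which are in one-to-one correspondence with the terminal device types.
Taking types A, B, and C as an example again, the K historical flows include reference traffic A, reference traffic B, and reference traffic C. Reference traffic A includes at least one historical flow corresponding to a terminal device of type A (that is, the terminal device in every historical flow of reference traffic A is of type A), reference traffic B includes at least one historical flow corresponding to a terminal device of type B (that is, the terminal device in every historical flow of reference traffic B is of type B), and reference traffic C includes at least one historical flow corresponding to a terminal device of type C (that is, the terminal device in every historical flow of reference traffic C is of type C). For ease of description, a historical flow in reference traffic may also be called a reference flow.
Each piece of reference traffic is determined from the corresponding candidate traffic, and candidate traffic is determined according to the terminal classification information. Multiple pieces of candidate traffic can be determined according to the terminal classification information, in one-to-one correspondence with the terminal device types. As described above, the historical data traffic includes multiple pieces of reference traffic in one-to-one correspondence with the terminal device types, so the pieces of reference traffic are also in one-to-one correspondence with the pieces of candidate traffic.
Taking types A, B, and C as an example again, three pieces of candidate traffic can be determined: candidate traffic A, candidate traffic B, and candidate traffic C. Candidate traffic A includes multiple candidate flows, and the terminal device of each of them is of type A. Similarly, the terminal device of each candidate flow in candidate traffic B is of type B, and the terminal device of each candidate flow in candidate traffic C is of type C.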
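If the access behavior of a candidate flow appears many times in the candidate traffic that contains it, that candidate flow can be used as a reference flow in the corresponding reference traffic.
Optionally, in some embodiments, two flows have the same access behavior if their source IP addresses are the same and their destination IP addresses are the same. Whether two flows have the same access behavior can be judged from their uplink packets or from their downlink packets. If the uplink packets of two flows have the same source IP address and the same destination IP address, the two flows are considered to have the same access behavior; otherwise their access behaviors are considered different. The same test applies to downlink packets.
For example, assume that IP 1 to IP 3 are the IP addresses of three terminal devices, and IP A, IP B, and IP C are the IP addresses of three servers. Assume that the uplink packets of candidate flow 1 have source IP address IP 1 and destination IP address IP A, the uplink packets of candidate flow 2 have source IP address IP 1 and destination IP address IP A, and the uplink packets of candidate flow 3 have source IP address IP 2 and destination IP address IP A. Then candidate flow 1 and candidate flow 2 have the same access behavior, while candidate flow 1 and candidate flow 3 have different access behaviors.
Optionally, in other embodiments, having the same access behavior may require the same source IP address, destination IP address, source port, and destination port. Again, this can be judged from the uplink packets or from the downlink packets of the two flows. If the packets of two flows in the same direction have the same source IP address, source port number, destination IP address, and destination port number, the two flows are considered to have the same access behavior; if any one of these four fields differs, the two flows are considered to have different access behaviors.
Optionally, in other embodiments, if the five-tuples of the packets of two flows in the same direction (uplink or downlink) are identical, the two flows are considered to have the same access behavior.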
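In some embodiments, the candidate flows whose access behaviors rank in the top T by number of candidate flows sharing that behavior may be selected as the reference flows in the reference traffic corresponding to that candidate traffic, where T is a preset positive integer.
For example, assume that candidate traffic A includes candidate flows with five access behaviors, access behavior 1 to access behavior 5: 100 candidate flows with access behavior 1, 120 with access behavior 2, 80 with access behavior 3, 20 with access behavior 4, and 5 with access behavior 5.
T may be a preset value. If T is 3, the candidate flows with access behavior 1, access behavior 2, and access behavior 3 can be selected as the reference flows in the reference traffic.
T may also be calculated from a preset ratio: the ratio of the number of candidate flows selected as historical data traffic to the total number of candidate flows in a piece of candidate traffic is a preset value, and T can be determined from that preset value and the total number of candidate flows. For example, if the total number of candidate flows in the candidate traffic is T_all and the preset ratio is P_T%, then N_CAND = ceil(T_all × P_T%), where N_CAND gives the value of T and ceil(T_all × P_T%) denotes rounding the result of T_all × P_T% to an integer. The rounding may be up, down, or to the nearest integer, which is not limited in the embodiments of this application.
The way of selecting historical data traffic from candidate traffic may also be determined from the total number of flows in the candidate traffic and a preset ratio. For example, the candidate flows in candidate traffic A whose access behavior accounts for more than 25% of the total number of flows may be selected. With the counts above (100, 120, 80, 20, and 5 candidate flows for access behaviors 1 to 5, 325 in total), the shares of the total are 30.8% for access behavior 1, 36.9% for access behavior 2, 24.6% for access behavior 3, 6.2% for access behavior 4, and 1.5% for access behavior 5. The candidate flows with access behavior 1 and access behavior 2 are therefore selected as the reference flows in reference traffic A.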
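The two selection strategies described above (top-T access behaviors, and access behaviors above a share threshold) can be sketched as follows. This is a minimal sketch; the function names are hypothetical, and each candidate flow is represented only by a label for its access behavior.

```python
from collections import Counter


def top_t_behaviors(behaviors, t):
    """Return the t most frequent access behaviors, most frequent first."""
    counts = Counter(behaviors)
    return [behavior for behavior, _ in counts.most_common(t)]


def behaviors_above_share(behaviors, min_share):
    """Return the access behaviors whose share of all candidate flows exceeds min_share."""
    counts = Counter(behaviors)
    total = sum(counts.values())
    return sorted(b for b, n in counts.items() if n / total > min_share)


# Candidate traffic A from the example: one label per candidate flow.
candidate_a = [1] * 100 + [2] * 120 + [3] * 80 + [4] * 20 + [5] * 5
print(top_t_behaviors(candidate_a, 3))          # [2, 1, 3]
print(behaviors_above_share(candidate_a, 0.25))  # [1, 2]
```

With a 25% threshold, access behavior 3 (80 of 325 flows, about 24.6%) falls just short, which matches the example in the text.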
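403. Determine the terminal type judgment rule according to the historical data traffic.
The terminal type judgment rule may include multiple sub-rules in one-to-one correspondence with the terminal device types. As described above, the historical data traffic includes multiple pieces of reference traffic in one-to-one correspondence with the terminal device types, so the sub-rules are also in one-to-one correspondence with the pieces of reference traffic. Each sub-rule can be determined from its corresponding reference traffic and from the historical data traffic other than that reference traffic.
Taking terminal types A, B, and C as an example again, the terminal type judgment rule may include sub-rule A, sub-rule B, and sub-rule C, corresponding to terminal devices of type A, type B, and type C respectively.
Sub-rule A can be determined from reference traffic A and the historical data traffic other than reference traffic A.
Sub-rule B can be determined from reference traffic B and the historical data traffic other than reference traffic B.
Sub-rule C can be determined from reference traffic C and the historical data traffic other than reference traffic C.
The following uses sub-rule A as an example to describe how a sub-rule is determined.
The access behavior of type A terminal devices can be obtained from reference traffic A, and the access behavior of the other types of terminal devices from the historical data traffic other than reference traffic A. Sub-rule A is then determined by taking set differences.
The access behavior of a terminal device may include identification information of the servers it accesses. The identification information of a server may include any one or more of its IP address, port number, and MAC address.
The servers accessed by a terminal device can be determined by extracting uplink packets, which yields the identification information of those servers. The access behaviors are then summarized by server identification information to obtain the sub-rules.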
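In some embodiments, different types of terminal devices access different servers, so the server IP address can serve as the basis for determining the terminal device type. For example, an ATM with deposit and withdrawal functions can access the server responsible for deposits (the deposit server) and the server responsible for withdrawals (the withdrawal server); an ATM with only the withdrawal function can access only the withdrawal server; an electronic receipt cabinet accesses only the server that provides the receipt service (the receipt server) and cannot access the deposit server or the withdrawal server. Different servers have different identification information, so different types of terminal devices can be distinguished by server identification information. For example, let the ATM with deposit and withdrawal functions be a type A terminal device, the withdrawal-only ATM a type B terminal device, and the electronic receipt cabinet a type C terminal device. The historical data traffic then shows that reference traffic A accesses IP W and IP D, reference traffic B accesses IP W, and reference traffic C accesses IP R, where IP W is the IP address of the withdrawal server, IP D is the IP address of the deposit server, and IP R is the IP address of the receipt server.
From reference traffic A, reference traffic B, and reference traffic C, the following sub-rules can be determined:
Sub-rule A: IP W, IP D
Sub-rule B: IP W
Sub-rule C: IP R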
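The terminal type judgment rule can be represented by a judgment matrix, which can be written as:
$$M = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
M denotes the judgment matrix. It has three rows of elements, in one-to-one correspondence with the three sub-rules. In each row, the first element corresponds to IP W, the second to IP D, and the third to IP R. An element value of 1 indicates that the access behavior includes accessing the corresponding server; a value of 0 indicates that it does not.
As described above, sub-rule A is IP W and IP D, so the row of judgment matrix M that corresponds to sub-rule A (the first row) has the values 1, 1, 0.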
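Building the judgment matrix from the sub-rules can be sketched as follows. This is an illustration under stated assumptions: the identifiers `IP_W`, `IP_D`, and `IP_R` are placeholder strings for the three server addresses, and `build_judgment_matrix` is a hypothetical helper, not a function defined by the application.

```python
def build_judgment_matrix(sub_rules, columns):
    """Return one 0/1 row per terminal type, in the insertion order of sub_rules.

    sub_rules: dict mapping terminal type -> set of server identifiers it accesses.
    columns:   ordered list of server identifiers, one per matrix column.
    """
    return [
        [1 if column in servers else 0 for column in columns]
        for servers in sub_rules.values()
    ]


sub_rules = {
    "A": {"IP_W", "IP_D"},  # deposit-and-withdrawal ATM
    "B": {"IP_W"},          # withdrawal-only ATM
    "C": {"IP_R"},          # electronic receipt cabinet
}
columns = ["IP_W", "IP_D", "IP_R"]
print(build_judgment_matrix(sub_rules, columns))
# [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The same helper would work when the columns are IP address and port pairs, as in the port-based variant of the rule.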
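In other embodiments, different types of terminal devices may access the same server, but different functions use different port numbers on that server. In this case, the server IP address together with the port number can serve as the basis for determining the terminal device type. For example, a hospital scenario has three kinds of terminal devices: registration/number-collection machines, registration machines, and diagnostic result printers. Server A provides both the registration and number-collection functions, with registration served on port A and number collection on port B. Server B provides the diagnostic result function. Assume that reference traffic A has two access behaviors, access behavior 1 being IP A:Port A and access behavior 2 being IP A:Port B; the access behavior of reference traffic B is IP A:Port A; and the access behavior of reference traffic C is IP B, where IP A is the IP address of server A, IP B is the IP address of server B, Port A is the port number of port A, and Port B is the port number of port B. Combining these four access behaviors shows that the difference between the access behaviors of reference traffic A and reference traffic B is IP A:Port B; the difference between reference traffic A and reference traffic C is IP A:Port A, IP A:Port B, and IP B; and the difference between reference traffic B and reference traffic C is IP A:Port A and IP B. Three sub-rules can thus be determined:
Sub-rule A: IP A:Port A, IP A:Port B;
Sub-rule B: IP A:Port A;
Sub-rule C: IP B.
If the terminal type judgment rule is represented by a judgment matrix, the judgment matrix can be written as
$$M = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
M denotes the judgment matrix. It has three rows of elements, in one-to-one correspondence with the three sub-rules. In each row, the first element corresponds to IP A:Port A, the second to IP A:Port B, and the third to IP B. An element value of 1 indicates that the access behavior includes accessing the corresponding server (or server port); a value of 0 indicates that it does not.
As described above, sub-rule A is IP A:Port A and IP A:Port B, so the row of judgment matrix M that corresponds to sub-rule A (the first row) has the values 1, 1, 0.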
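FIG. 5 is a schematic flowchart of determining the terminal type judgment rule through unsupervised learning.
501. Collect traffic in the network to obtain historical data traffic.
Optionally, in some embodiments, the historical flows in the historical data traffic can be divided into multiple pieces of reference traffic, which are in one-to-one correspondence with multiple IP addresses.
Assume there are three IP addresses: IP 1, IP 2, and IP 3. The historical data traffic may then include reference traffic 1, reference traffic 2, and reference traffic 3. Reference traffic 1 includes at least one historical flow whose corresponding IP address is IP 1 (that is, the IP address of the sender or receiver of the packets in every historical flow of reference traffic 1 is IP 1), reference traffic 2 includes at least one historical flow whose corresponding IP address is IP 2 (that is, the IP address of the sender or receiver of the packets in every historical flow of reference traffic 2 is IP 2), and reference traffic 3 includes at least one historical flow whose corresponding IP address is IP 3 (that is, the IP address of the sender or receiver of the packets in every historical flow of reference traffic 3 is IP 3). For ease of description, a historical flow in reference traffic may also be called a reference flow.
Each piece of reference traffic is determined from the corresponding candidate traffic. The collected traffic can be divided into multiple pieces of candidate traffic in one-to-one correspondence with multiple IP addresses, and each piece of candidate traffic includes multiple candidate flows. The sender or receiver IP address of the candidate flows that belong to the same candidate traffic is the IP address corresponding to that candidate traffic.
For example, assume 100 flows are collected in total: the sender IP address of flows 1 to 20 is IP1, that of flows 21 to 40 is IP2, and that of flows 41 to 100 is IP3, where IP1, IP2, and IP3 are three different IP addresses. The 100 flows can then be divided into three pieces of candidate traffic: candidate traffic 1 includes flows 1 to 20, candidate traffic 2 includes flows 21 to 40, and candidate traffic 3 includes flows 41 to 100.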
Reference flows are selected from each piece of candidate traffic in the same way as in step 402 of the supervised method. A candidate flow whose access behavior appears many times in the candidate traffic that contains it can be used as a reference flow in the corresponding reference traffic. Two flows may be considered to have the same access behavior if the packets of the two flows in the same direction (uplink or downlink) have the same source and destination IP addresses; or, in other embodiments, the same source IP address, source port number, destination IP address, and destination port number; or, in still other embodiments, identical five-tuples. The reference flows may be the candidate flows whose access behaviors rank in the top T (where T is a preset positive integer or is calculated from a preset ratio and the total number of candidate flows), or the candidate flows whose access behavior accounts for more than a preset share (for example, 25%) of the total number of flows in the candidate traffic.
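502. Determine the identification information of the terminal devices and of the servers in the historical data traffic.
In other words, the purpose of step 502 is to determine the role of each piece of identification information in each historical flow of the historical data traffic, that is, whether an IP address, port number, or MAC address belongs to a terminal device or to a server.
The identification information of the terminal device in a data flow can be determined first; the other identification information in the data flow then belongs to the server.
The identification information of a terminal device can be determined in the following three ways.
Way 1: The traffic collected in step 501 is uplink traffic collected from the uplink port of a network forwarding device or a terminal device. In this case, the sender of the uplink traffic is a terminal device and the receiver is a server.
Way 2: The proportion of connections actively initiated by each IP address is counted. Normally, a terminal device's IP address actively initiates connections more often than a server's does. If the proportion of actively initiated connections for an IP address is greater than a preset ratio threshold, the IP address can be determined to belong to a terminal device. This proportion can be judged by counting the sending and receiving of synchronization (synchronize sequence number, SYN) packets: an IP address that sends a SYN packet is the one actively initiating the connection. If the ratio of the number of SYN packets an IP address sends to the number of SYN packets it receives is greater than the preset ratio threshold, the IP address can be determined to be that of a terminal device. After the role of the IP address is determined, the roles of the port number and/or MAC address can be determined as well.
For example, IP 1 sends 9 SYN packets to IP X, and IP X sends 1 SYN packet to IP 1. Of the SYN packets exchanged, 90% are sent by IP 1. Assuming the preset ratio threshold is 80%, IP 1 can be determined to be the IP address of a terminal device. Correspondingly, IP X is the IP address of a server.
Way 3: The source and destination IP addresses of every data flow are counted, and the determination is made from the statistics. Normally, the number of servers accessed by one terminal device is smaller than the number of terminal devices that access one server. For example, an ATM that supports deposits and withdrawals may access two servers (the deposit server and the withdrawal server), and a withdrawal-only ATM may access only the withdrawal server, whereas thousands of ATMs may access the withdrawal server. A threshold on the number of IP addresses can therefore be preset. For each IP address, count the number of distinct destination IP addresses that appear when it is the source IP address in the historical data flows. If that number is less than the preset threshold, the IP address belongs to a terminal device; if it is greater than or equal to the preset threshold, the IP address belongs to a server.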
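Ways 2 and 3 can be sketched as two small heuristics. These are hedged illustrations: the function names and input shapes are assumptions, and the SYN heuristic compares the share of SYN packets an address sends (as in the 90% example) rather than a sent-to-received ratio.

```python
from collections import Counter, defaultdict


def terminals_by_syn_share(syn_events, threshold=0.8):
    """Way 2 sketch: an IP is a terminal if it sends most of the SYN packets it is involved in.

    syn_events: iterable of (sender_ip, receiver_ip), one pair per SYN packet.
    """
    sent, received = Counter(), Counter()
    for sender, receiver in syn_events:
        sent[sender] += 1
        received[receiver] += 1
    terminals = set()
    for ip in set(sent) | set(received):
        share = sent[ip] / (sent[ip] + received[ip])
        if share > threshold:
            terminals.add(ip)
    return terminals


def terminals_by_fanout(flows, max_destinations):
    """Way 3 sketch: an IP is a terminal if, as a source, it reaches few distinct destinations.

    flows: iterable of (src_ip, dst_ip), one pair per data flow.
    """
    destinations = defaultdict(set)
    for src, dst in flows:
        destinations[src].add(dst)
    return {ip for ip, dsts in destinations.items() if len(dsts) < max_destinations}
```

In practice the two heuristics could be combined, but how to reconcile disagreements between them is not specified in the text and is left open here.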
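503. Determine the server set corresponding to each terminal device.
After the roles of the identification information are determined, the server set corresponding to each terminal device can be determined.
For example, if the servers of the three historical flows corresponding to terminal device 1 are server 1, server 2, and server 3, the server set corresponding to terminal device 1 includes server 1, server 2, and server 3. If the servers of the two historical flows corresponding to terminal device 2 are server 3 and server 4, the server set corresponding to terminal device 2 includes server 3 and server 4.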
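504. Cluster the terminal devices according to the server sets to obtain a clustering result.
The server sets are in one-to-one correspondence with the terminal devices. For example, assume there are three server sets, server set 1, server set 2, and server set 3, which correspond to terminal device 1, terminal device 2, and terminal device 3 respectively. Terminal devices 1 to 3 can then be clustered according to server sets 1 to 3 to obtain a clustering result.
The clustering algorithm used in the embodiments of this application may be spectral clustering.
Assume the access matrix of terminal devices 1 to 3 is as shown in Table 2.
Table 2
                    Server 1  Server 2  Server 3  Server 4
Terminal device 1   1         1         1         0
Terminal device 2   1         1         1         0
Terminal device 3   0         1         1         1
The three rows of the access matrix in Table 2 correspond to terminal devices 1 to 3. For each terminal device, the matrix element is 1 for every server in its server set and 0 otherwise. For example, the server set of terminal device 1 includes server 1, server 2, and server 3, so in the first row of Table 2 the elements for server 1, server 2, and server 3 are 1 and the element for server 4 is 0.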
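Steps 503 and 504 up to the access matrix can be sketched as follows. This is a minimal sketch assuming flows are already reduced to hypothetical `(terminal, server)` pairs; it reproduces Table 2 from such pairs.

```python
def server_sets(flows):
    """Collect, for each terminal, the set of servers it accesses."""
    sets = {}
    for terminal, server in flows:
        sets.setdefault(terminal, set()).add(server)
    return sets


def access_matrix(sets, terminals, servers):
    """Return a 0/1 matrix with one row per terminal and one column per server."""
    return [
        [1 if server in sets.get(terminal, set()) else 0 for server in servers]
        for terminal in terminals
    ]


flows = [
    ("T1", "S1"), ("T1", "S2"), ("T1", "S3"),
    ("T2", "S1"), ("T2", "S2"), ("T2", "S3"),
    ("T3", "S2"), ("T3", "S3"), ("T3", "S4"),
]
matrix = access_matrix(server_sets(flows), ["T1", "T2", "T3"], ["S1", "S2", "S3", "S4"])
print(matrix)  # [[1, 1, 1, 0], [1, 1, 1, 0], [0, 1, 1, 1]]
```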
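A similarity matrix can be calculated from the access matrix in Table 2. The similarity between two terminal devices can be calculated from the angle between their vectors. Let IP1 denote terminal device 1, IP2 terminal device 2, and IP3 terminal device 3. From the access matrix in Table 2: IP1 = (1,1,1,0); IP2 = (1,1,1,0); IP3 = (0,1,1,1). The cosine of the angle between IP1 and IP2 can be determined by the following formula:
$$\cos\theta = \frac{IP1 \cdot IP2}{|IP1| \times |IP2|} \quad \text{(Formula 1)}$$
where cos θ is the cosine of the angle between IP1 and IP2 (that is, the similarity between terminal device 1 and terminal device 2), and |IP| denotes the modulus of a vector.
From the access matrix and Formula 1, the similarity matrix shown in Table 3 can be obtained.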
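Table 3
        IP1   IP2   IP3
IP1     1     1     2/3
IP2     1     1     2/3
IP3     2/3   2/3   1
In Table 3, the elements of the first row are the similarities of IP1 with IP1, IP2, and IP3; the elements of the second row are the similarities of IP2 with IP1, IP2, and IP3; and the elements of the third row are the similarities of IP3 with IP1, IP2, and IP3.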
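The similarity matrix in Table 3 can be reproduced with a short cosine-similarity sketch. The function names are hypothetical; the rows are the access vectors from Table 2.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(rows):
    """Pairwise cosine similarity between all rows of an access matrix."""
    return [[cosine(u, v) for v in rows] for u in rows]


access = [
    [1, 1, 1, 0],  # IP1 (terminal device 1)
    [1, 1, 1, 0],  # IP2 (terminal device 2)
    [0, 1, 1, 1],  # IP3 (terminal device 3)
]
S = similarity_matrix(access)
```

Here `S[0][1]` is 1 up to floating-point rounding and `S[0][2]` is approximately 2/3, matching Table 3.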
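A degree matrix may be calculated from the similarity matrix: the degree matrix is the diagonal matrix whose entries are the row sums of the similarity matrix. A Laplacian matrix is then determined from the degree matrix and the similarity matrix according to the following formula:
L = D - S, (Formula 2)
where L denotes the Laplacian matrix, D denotes the degree matrix, and S denotes the similarity matrix.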
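After the Laplacian matrix is obtained, it may be normalized according to the following formula:
$$L_{\mathrm{normal}} = D^{-1/2} \times L \times D^{-1/2} \quad \text{(Formula 3)}$$
where L_normal denotes the normalized Laplacian matrix, D denotes the degree matrix, and L denotes the Laplacian matrix.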
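Formulas 2 and 3 can be checked numerically with the sketch below, which uses plain Python lists and the similarity values from Table 3. The function name `normalized_laplacian` is an assumption; because D is diagonal, the normalization reduces to dividing element (i, j) of L by the square root of d_i × d_j.

```python
import math


def normalized_laplacian(similarity):
    """Return (degree, L, L_normal) for a symmetric similarity matrix."""
    n = len(similarity)
    # Diagonal of the degree matrix D: row sums of the similarity matrix.
    degree = [sum(row) for row in similarity]
    # Formula 2: L = D - S.
    laplacian = [
        [(degree[i] if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]
    # Formula 3: L_normal = D^(-1/2) * L * D^(-1/2).
    normalized = [
        [laplacian[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    return degree, laplacian, normalized


S = [[1, 1, 2 / 3], [1, 1, 2 / 3], [2 / 3, 2 / 3, 1]]
degree, laplacian, normalized = normalized_laplacian(S)
for row in normalized:
    print([round(value, 4) for value in row])
```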
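After the normalized Laplacian matrix is obtained, its k smallest eigenvalues may be taken, and the corresponding eigenvectors form an n×k eigenvector matrix. The rows of this matrix are treated as n samples (that is, n terminal devices) of k dimensions each, and the K-means algorithm clusters them into m clusters (C1, C2, …, Cm), so that similar terminal devices are grouped together. Besides K-means, other clustering algorithms (for example, DBSCAN) may also be used to cluster the terminal devices.
Each terminal device can be regarded as a vertex of a graph, and the similarity matrix as the adjacency matrix between the vertices. Following this graph view, the connected vertices that are found form the clusters of similar devices.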
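The graph view can be illustrated with a simple connected-components search. This is a simplified stand-in for the intuition only, not the spectral clustering procedure itself: the similarity values, the cut-off `threshold`, and the function name are all hypothetical.

```python
def similar_clusters(similarity, threshold):
    """Group vertices joined by edges whose similarity is >= threshold."""
    n = len(similarity)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search over sufficiently similar neighbours
            node = stack.pop()
            component.append(node)
            for other in range(n):
                if other not in seen and similarity[node][other] >= threshold:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


# Hypothetical similarity matrix for four terminal devices (indices 0 to 3).
S = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
clusters = similar_clusters(S, threshold=0.5)
print(clusters)
```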
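The clustering result may include multiple clusters. Each cluster includes one or more of the multiple terminal devices, and no two clusters intersect. Still using terminal devices 1 to 3 as an example, the clustering result may include three clusters, referred to as cluster A, cluster B, and cluster C, where cluster A includes terminal device 1, cluster B includes terminal device 3, and cluster C includes terminal device 2.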
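505. Determine the terminal type determination rule based on the clustering result.
Each of the multiple clusters corresponds to one terminal device type.
In some embodiments, after the clustering result is obtained, the terminal device type corresponding to each cluster may be determined manually. In other embodiments, each cluster may include one or more terminal devices that support data fingerprinting and protocol scanning. In this case, the terminal device type corresponding to each cluster may be determined from these devices. Still using cluster A, cluster B, and cluster C as an example, the terminal device type corresponding to cluster A is type A, the type corresponding to cluster B is type B, and the type corresponding to cluster C is type C.
After the terminal device type corresponding to each cluster is determined, the terminal type determination rule may be determined from the access behavior of the terminal devices in each cluster.
The terminal type determination rule may include multiple sub-rules, which are in one-to-one correspondence with the multiple terminal device types. As described above, the terminal devices contained in the historical data traffic are clustered into multiple clusters, and these clusters are in one-to-one correspondence with the terminal device types. The sub-rules are therefore also in one-to-one correspondence with the clusters. Each sub-rule may be determined from its corresponding cluster and the clusters other than that cluster.
Still using terminals of types A, B, and C as an example, the terminal type determination rule may include sub-rule A, sub-rule B, and sub-rule C, where sub-rule A corresponds to terminal devices of type A, sub-rule B corresponds to terminal devices of type B, and sub-rule C corresponds to terminal devices of type C.
The following uses sub-rule A as an example to describe how a sub-rule is determined.
Sub-rule A is determined by set difference from the access behavior of the terminal devices in cluster A and the access behavior of the terminal devices in the other clusters (that is, cluster B and cluster C). The specific manner of determining the terminal type determination rule is similar to that in the supervised-learning-based method and, for brevity, is not repeated here.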
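One possible reading of the set-difference step is sketched below: a sub-rule keeps the access behaviors seen in its own cluster and in no other cluster. The function name, the representation of an access behavior as a server name, and the sample data are assumptions; this passage does not spell out the exact procedure.

```python
def derive_sub_rules(cluster_behaviors):
    """For each cluster, keep the behaviors that no other cluster exhibits."""
    sub_rules = {}
    for name, own in cluster_behaviors.items():
        others = set().union(
            *(b for other, b in cluster_behaviors.items() if other != name)
        )
        sub_rules[name] = own - others  # set difference
    return sub_rules


# Hypothetical access behaviors (accessed servers) per cluster.
cluster_behaviors = {
    "A": {"server1", "server2"},
    "B": {"server2", "server3"},
    "C": {"server4"},
}
sub_rules = derive_sub_rules(cluster_behaviors)
print(sub_rules)
```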
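In some embodiments, the supervised learning process shown in FIG. 4 and the unsupervised learning process shown in FIG. 5 may be implemented by the network control device or by a component (for example, a chip or a circuit) in the network control device. In this case, the network control device may further include a rule learning module.
These processes may also be implemented by one or more other computer devices. For example, after the historical data traffic is collected, a computer device (for example, a server or a workstation) or a cloud service that provides supervised or unsupervised learning may be used to determine the terminal type determination rule. The determined rule is then sent to the network control device.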
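With the determined terminal type determination rule, the type of each terminal device in the network can be determined. For example, assume that the terminal type determination rule is the determination matrix shown in Table 4.
Table 4
          Server 1  Server 2  Server 3  Server 4
Type A    1         1         0         0
Type B    1         1         1         0
Type C    0         0         0         1
If a terminal device is observed to access server 1, server 2, and server 3, a reference matrix y = [1,1,1,0] may be generated, and the matrix multiplication Y×y' yields [2,3,0]', where Y denotes the determination matrix and y' denotes the transpose of the reference matrix y. The position of the largest value in [2,3,0] indicates the device type. Here 3 is the largest value and it is at position 2, which is the second device type, that is, type B.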
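The matrix-based decision can be sketched in a few lines. The function name `classify` is an assumption; ties are resolved in favour of the first row with the maximum score, which the source does not specify.

```python
def classify(determination_matrix, type_names, reference):
    """Multiply each rule row by the reference vector and pick the best row."""
    scores = [
        sum(weight * seen for weight, seen in zip(row, reference))
        for row in determination_matrix
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return type_names[best], scores


# Determination matrix Y from Table 4 (rows: types A, B, C).
Y = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
y = [1, 1, 1, 0]  # the device accessed servers 1, 2 and 3
device_type, scores = classify(Y, ["Type A", "Type B", "Type C"], y)
print(device_type, scores)
```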
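Optionally, the access behavior of a terminal device may be collected within an observation period. The observation period may be set as required, for example at a granularity of hours (such as 12 hours or 24 hours) or at a granularity of days or weeks. Similarly, the access behavior of terminal devices that is used when determining the terminal type determination rule may also be collected within an observation period.
Each element of the determination matrix shown in Table 4 indicates whether terminal devices of a given type access a given server. In other embodiments, an element of the determination matrix may instead indicate the probability that terminal devices of a given type access the server. For example, a statistics period is divided into multiple time windows, and each element of the determination matrix is the fraction of those time windows in which terminal devices of the given type access the server. For example, if the statistics period is one week and each time window is 30 minutes, the whole statistics period contains 336 time windows. If terminal devices of type A access server 1 in all 336 time windows, the element for type A and server 1 is 1; if terminal devices of type A access server 2 in only 168 time windows, the element for type A and server 2 is 0.5. Assume that Table 5 is the determination matrix determined from access probabilities.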
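The window-based probability can be computed as sketched below. The function name and the representation of accesses as minute offsets from the start of the statistics period are assumptions for illustration.

```python
def access_probability(access_minutes, period_minutes, window_minutes):
    """Fraction of time windows that contain at least one access.

    access_minutes: offsets (in minutes) of observed accesses to one server,
    measured from the start of the statistics period.
    """
    total_windows = period_minutes // window_minutes
    active = {
        minute // window_minutes
        for minute in access_minutes
        if 0 <= minute < period_minutes
    }
    return len(active) / total_windows


WEEK = 7 * 24 * 60  # statistics period of one week, in minutes
# One access every 30 minutes touches every window; one per hour, every other.
p_server1 = access_probability(range(0, WEEK, 30), WEEK, 30)
p_server2 = access_probability(range(0, WEEK, 60), WEEK, 30)
print(WEEK // 30, p_server1, p_server2)
```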
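Table 5
          Server 1  Server 2  Server 3  Server 4
Type A    1         0.5       0         0
Type B    1         0.8       0.8       0
Type C    0         0         0         1
If a terminal device is observed to access server 1, server 2, and server 3 within one time window, a reference matrix y = [1,1,1,0] may be generated, and the matrix multiplication Y×y' yields [1.5,2.6,0]', where Y denotes the determination matrix and y' denotes the transpose of the reference matrix y. The position of the largest value in [1.5,2.6,0] indicates the device type. Here 2.6 is the largest value and it is at position 2, which is the second device type, that is, type B.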
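Written out term by term with the values of Table 5, the multiplication is:

```latex
Y\,y^{\mathsf T} =
\begin{pmatrix}
1 & 0.5 & 0   & 0 \\
1 & 0.8 & 0.8 & 0 \\
0 & 0   & 0   & 1
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}
=
\begin{pmatrix}
1 + 0.5 + 0 + 0 \\
1 + 0.8 + 0.8 + 0 \\
0 + 0 + 0 + 0
\end{pmatrix}
=
\begin{pmatrix} 1.5 \\ 2.6 \\ 0 \end{pmatrix}
```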
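In some cases, the terminal type determination rule may be able to determine the type of only some of the terminal devices in the network. In other words, the type of some terminal devices may not be determinable by the rule. Such terminal devices may be clustered by unsupervised learning to obtain multiple clusters, which are in one-to-one correspondence with multiple terminal types. The terminal type corresponding to each cluster may then be determined manually or by using those terminal devices that support data fingerprinting and protocol scanning.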
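FIG. 6 is a schematic flowchart of a method for determining a terminal device type according to an embodiment of this application. The method shown in FIG. 6 may be performed by a network forwarding device or a network control device.
601. Obtain first data traffic, where the sending end of the first data traffic is a first terminal device.
602. Determine the access behavior of the first terminal device based on the identification information of the receiving end of the packets in the first data traffic.
603. Determine the type of the first terminal device based on a terminal type determination rule and the access behavior of the first terminal device, where the terminal type determination rule is used to indicate the correspondence between the access behavior of terminal devices and the types of terminal devices, and the terminal type determination rule is obtained through training based on historical data traffic.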
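Steps 601 to 603 can be sketched as a lookup from observed access behavior to type. Everything named here is hypothetical: the packet dictionaries with `src` and `dst` keys, the rule stored as a mapping from a set of servers to a type, and the exact-match lookup, which is only one way a rule could express the correspondence.

```python
def access_behavior(packets, terminal_id):
    """Step 602: collect the receivers of the packets sent by the terminal."""
    return frozenset(p["dst"] for p in packets if p["src"] == terminal_id)


def determine_type(rule, behavior):
    """Step 603: look up the type; None means the rule cannot decide."""
    return rule.get(behavior)


# Hypothetical rule: access behavior (set of servers) -> terminal type.
rule = {
    frozenset({"server1", "server2"}): "Type A",
    frozenset({"server1", "server2", "server3"}): "Type B",
    frozenset({"server4"}): "Type C",
}

# Step 601: first data traffic; the last packet is downlink and is ignored.
packets = [
    {"src": "terminal1", "dst": "server1"},
    {"src": "terminal1", "dst": "server2"},
    {"src": "terminal1", "dst": "server3"},
    {"src": "server1", "dst": "terminal1"},
]
behavior = access_behavior(packets, "terminal1")
device_type = determine_type(rule, behavior)
print(sorted(behavior), device_type)
```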
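The first data traffic may include the data flows collected within a first time period. Each of the at least one data flow in the first data traffic includes one or more uplink packets, and the sending end of the one or more uplink packets is the first terminal device.
The historical data traffic consists of the data flows collected within a second time period, where the end of the second time period is earlier than the start of the first time period. In other words, the historical data traffic is data traffic obtained before the first data traffic is obtained.
The sending ends of the historical data traffic include terminal devices of multiple types, and the type of the first terminal device is one of these types. The historical data traffic includes multiple historical flows, each of which includes one or more uplink packets. The sending ends of the uplink packets in the historical flows include multiple terminal devices. Terminal devices of each of the multiple types have at least one corresponding historical flow.
In some embodiments, the terminal type determination rule is obtained through training based on the historical data traffic and terminal classification information. The terminal classification information is used to indicate the multiple types and multiple groups of terminal identification information, where each group of terminal identification information includes the identification information of at least one terminal. The terminal classification information is further used to indicate the correspondence between the multiple types and the multiple groups of terminal identification information, which are in one-to-one correspondence. The historical data traffic is determined based on the terminal classification information.
The identification information may include any one or more of an IP address, a port number, and a MAC address. The terminal identification information may include one or more of the IP address, the port number, and the MAC address of the terminal device. For an uplink packet, the terminal identification information is one or more of the source IP address, the source port number, and the source MAC address. For a downlink packet, the terminal identification information is one or more of the destination IP address, the destination port number, and the destination MAC address.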
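The direction-dependent choice of terminal identification information can be sketched as follows. The packet field names (`src_ip`, `dst_mac`, and so on) and the function name are hypothetical.

```python
def terminal_identifier(packet, direction):
    """Pick the terminal-side fields: source for uplink, destination for downlink."""
    side = {"uplink": "src", "downlink": "dst"}[direction]
    return {
        "ip": packet[f"{side}_ip"],
        "port": packet[f"{side}_port"],
        "mac": packet[f"{side}_mac"],
    }


packet = {
    "src_ip": "10.0.0.5", "src_port": 50000, "src_mac": "aa:aa:aa:aa:aa:01",
    "dst_ip": "10.0.1.9", "dst_port": 443, "dst_mac": "bb:bb:bb:bb:bb:02",
}
uplink_id = terminal_identifier(packet, "uplink")
downlink_id = terminal_identifier(packet, "downlink")
print(uplink_id)
print(downlink_id)
```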
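In some embodiments, the historical data traffic includes multiple pieces of reference traffic, which are in one-to-one correspondence with the multiple types. The multiple pieces of reference traffic include first reference traffic, and the type corresponding to the first reference traffic is the type of the first terminal device. The terminal type determination rule includes multiple sub-rules, which are in one-to-one correspondence with the multiple types. The sub-rule corresponding to the type of the first terminal device is determined based on the first reference traffic and the reference traffic other than the first reference traffic.
In some embodiments, the first reference traffic is determined from first candidate traffic, which is the traffic among multiple pieces of candidate traffic that corresponds to the type of the first terminal device. The access behavior corresponding to each data flow in the first reference traffic occurs in the first candidate traffic more times than the access behavior corresponding to any data flow that does not belong to the first reference traffic.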
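The frequency property of the reference traffic can be met, for instance, by keeping only behaviors that reach a minimum count. The `min_count` cut-off and the function name are assumptions; the source states only that retained behaviors occur more often than discarded ones.

```python
from collections import Counter


def select_reference_behaviors(candidate_behaviors, min_count):
    """Keep the access behaviors that occur at least min_count times."""
    counts = Counter(candidate_behaviors)
    return {behavior for behavior, count in counts.items() if count >= min_count}


# Hypothetical candidate traffic for one type, one entry per data flow.
candidate = ["server1"] * 5 + ["server2"] * 4 + ["server9"]
reference = select_reference_behaviors(candidate, min_count=2)
print(sorted(reference))
```

Because every retained behavior has a count of at least `min_count` and every discarded one has a smaller count, each retained behavior occurs strictly more often than each discarded one.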
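In some embodiments, the terminal type determination rule is determined from the clustering result obtained by clustering P terminal devices based on P server sets. The P terminal devices are determined based on the historical data traffic and are in one-to-one correspondence with the P server sets. Each of the P server sets is the set of servers accessed by the corresponding terminal device. The P terminal devices include terminal devices of the multiple types, and P is a positive integer greater than or equal to the total number of terminal device types.
In some embodiments, the historical data traffic is the uplink data flows of the P terminal devices, and the P terminal devices are the sending ends of the historical data traffic.
In some embodiments, for each of the P terminal devices, the ratio of the number of times it is the sending end of a synchronize packet in the historical data traffic to the number of times it is the receiving end of a synchronize packet is greater than a second preset proportion.
In some embodiments, the historical data traffic includes P pieces of reference traffic, which are in one-to-one correspondence with the P terminal devices and with P pieces of candidate traffic. The access behavior corresponding to each data flow in second reference traffic occurs in the corresponding second candidate traffic more times than the access behavior corresponding to any data flow that does not belong to the second reference traffic, where the second reference traffic is any one of the P pieces of reference traffic.
In some embodiments, the terminal type determination rule is a determination matrix that includes multiple rows of elements in one-to-one correspondence with the multiple types. Determining the type of the first terminal device based on the terminal type determination rule and the access behavior of the first terminal device includes: determining, from the determination matrix and based on the access behavior of the first terminal device, a target row that matches the access behavior of the first terminal device; and determining that the type of the first terminal device is the type corresponding to the target row.
In some embodiments, determining the target row from the determination matrix based on the access behavior of the first terminal device includes: determining a reference matrix (for example, the reference matrix y in the foregoing embodiments) based on the access behavior of the first terminal device, where the values of the elements of the reference matrix match the access behavior of the first terminal device; multiplying the determination matrix by the reference matrix to obtain a target matrix, whose elements are in one-to-one correspondence with the rows of the determination matrix; and determining that the row corresponding to the largest element of the target matrix is the target row.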
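FIG. 7 is a structural block diagram of a computer device according to an embodiment of this application. The computer device 700 shown in FIG. 7 may be the network control device or the network forwarding device in the foregoing embodiments. The computer device 700 includes an obtaining unit 701 and a processing unit 702.
The obtaining unit 701 is configured to obtain first data traffic, where the sending end of the first data traffic is a first terminal device.
The processing unit 702 is configured to determine the access behavior of the first terminal device based on the identification information of the receiving end of the packets in the first data traffic.
The processing unit 702 is further configured to determine the type of the first terminal device based on a terminal type determination rule and the access behavior of the first terminal device, where the terminal type determination rule is used to indicate the correspondence between the access behavior of terminal devices and the types of terminal devices, and the terminal type determination rule is obtained through training based on historical data traffic.
The obtaining unit 701 may be implemented by a transceiver circuit, and the processing unit 702 may be implemented by a processor. For the specific functions and beneficial effects of the obtaining unit 701 and the processing unit 702, refer to the foregoing embodiments; for brevity, details are not repeated here.
It should be understood that FIG. 7 is merely an example and not a limitation. The computer device including the obtaining unit and the processing unit need not depend on the structure shown in FIG. 7.
When the computer device 700 is a chip, the chip includes the obtaining unit and the processing unit. The obtaining unit may be an input/output circuit or a communication interface, and the processing unit is a processor, microprocessor, or integrated circuit integrated on the chip.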
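An embodiment of this application further provides a computer device including a processor and a memory. The processor is configured to be coupled to the memory, and to read and execute instructions and/or program code in the memory to perform the steps performed by the network control device in the foregoing method embodiments.
An embodiment of this application further provides a computer device including a processor and a memory. The processor is configured to be coupled to the memory, and to read and execute instructions and/or program code in the memory to perform the steps of learning the terminal type determination rule in the foregoing method embodiments.
An embodiment of this application further provides a computer device including a processor and a memory. The processor is configured to be coupled to the memory, and to read and execute instructions and/or program code in the memory to perform the steps performed by the network forwarding device in the foregoing method embodiments.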
It should be understood that the processor may be a chip. For example, the processor may be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or another integrated chip.
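In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of this application may be directly performed by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here.
It should be noted that the processor in the embodiments of this application may be an integrated circuit chip with signal processing capability. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed with reference to the embodiments of this application may also be directly performed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.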
It can be understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically EPROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
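According to the method provided in the embodiments of this application, this application further provides a computer program product including computer program code that, when run on a computer, causes the computer to perform the steps performed by the network control device in the foregoing embodiments.
According to the method provided in the embodiments of this application, this application further provides a computer program product including computer program code that, when run on a computer, causes the computer to perform the steps of learning the terminal type determination rule in the foregoing embodiments.
According to the method provided in the embodiments of this application, this application further provides a computer program product including computer program code that, when run on a computer, causes the computer to perform the steps performed by the network forwarding device in the foregoing embodiments.
According to the method provided in the embodiments of this application, this application further provides a computer-readable medium storing program code that, when run on a computer, causes the computer to perform the steps performed by the network control device in the foregoing embodiments.
According to the method provided in the embodiments of this application, this application further provides a computer-readable medium storing program code that, when run on a computer, causes the computer to perform the steps of learning the terminal type determination rule in the foregoing embodiments.
According to the method provided in the embodiments of this application, this application further provides a computer-readable medium storing program code that, when run on a computer, causes the computer to perform the steps performed by the network forwarding device in the foregoing embodiments.
According to the method provided in the embodiments of this application, this application further provides a system including the foregoing network forwarding device and network control device.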
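A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.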

Claims (25)

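  1. A method for determining a terminal device type, comprising:
    obtaining first data traffic, wherein a sending end of the first data traffic is a first terminal device;
    determining an access behavior of the first terminal device based on identification information of a receiving end of packets in the first data traffic; and
    determining a type of the first terminal device based on a terminal type determination rule and the access behavior of the first terminal device, wherein the terminal type determination rule is used to indicate a correspondence between access behaviors of terminal devices and types of terminal devices, and the terminal type determination rule is obtained through training based on historical data traffic.
  2. The method according to claim 1, wherein sending ends of the historical data traffic comprise terminal devices of multiple types, and the type of the first terminal device is one of the multiple types.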
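  3. The method according to claim 2, wherein the terminal type determination rule is obtained through training based on the historical data traffic and terminal classification information, wherein
    the terminal classification information is used to indicate the multiple types and multiple groups of terminal identification information, and each of the multiple groups of terminal identification information comprises identification information of at least one terminal,
    the terminal classification information is further used to indicate a correspondence between the multiple types and the multiple groups of terminal identification information, and the multiple types are in one-to-one correspondence with the multiple groups of terminal identification information,
    each piece of the multiple pieces of terminal identification information comprises identification information of at least one terminal device, and
    the historical data traffic is determined based on the terminal classification information.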
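  4. The method according to claim 3, wherein the historical data traffic comprises multiple pieces of reference traffic, the multiple pieces of reference traffic are in one-to-one correspondence with the multiple types, the multiple pieces of reference traffic comprise first reference traffic, and the type corresponding to the first reference traffic is the type of the first terminal device; and
    the terminal type determination rule comprises multiple sub-rules, the multiple sub-rules are in one-to-one correspondence with the multiple types, and the sub-rule corresponding to the type of the first terminal device is determined based on the first reference traffic and the reference traffic other than the first reference traffic among the multiple pieces of reference traffic.
  5. The method according to claim 4, wherein the first reference traffic is determined based on first candidate traffic, the first candidate traffic is the traffic, among multiple pieces of candidate traffic, that corresponds to the type of the first terminal device, and the access behavior corresponding to each data flow in the first reference traffic occurs in the first candidate traffic more times than the access behavior corresponding to a data flow that does not belong to the first reference traffic.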
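  6. The method according to claim 2, wherein the terminal type determination rule is determined based on a clustering result obtained by clustering P terminal devices based on P server sets, the P terminal devices are determined based on the historical data traffic, the P terminal devices are in one-to-one correspondence with the P server sets, each of the P server sets is a set of servers accessed by the corresponding terminal device, the P terminal devices comprise the terminal devices of the multiple types, and P is a positive integer greater than or equal to the total number of types of terminal devices.
  7. The method according to claim 6, wherein the historical data traffic is uplink data flows of the P terminal devices, and the P terminal devices are the sending ends of the historical data traffic.
  8. The method according to claim 6, wherein, for each of the P terminal devices, a ratio of the number of times the terminal device is a sending end of a synchronize packet in the historical data traffic to the number of times the terminal device is a receiving end of a synchronize packet is greater than a second preset proportion.
  9. The method according to any one of claims 6 to 8, wherein the historical data traffic comprises P pieces of reference traffic, the P pieces of reference traffic are in one-to-one correspondence with the P terminal devices, the P pieces of reference traffic are in one-to-one correspondence with P pieces of candidate traffic, the access behavior corresponding to each data flow in second reference traffic occurs in corresponding second candidate traffic more times than the access behavior corresponding to a data flow that does not belong to the second reference traffic, and the second reference traffic is any one of the P pieces of reference traffic.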
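  10. The method according to any one of claims 1 to 9, wherein the terminal type determination rule is a determination matrix, the determination matrix comprises multiple rows of elements, and the multiple rows of elements are in one-to-one correspondence with the multiple types; and
    the determining a type of the first terminal device based on a terminal type determination rule and the access behavior of the first terminal device comprises:
    determining, from the determination matrix and based on the access behavior of the first terminal device, a target row that matches the access behavior of the first terminal device; and
    determining that the type of the first terminal device is the type corresponding to the target row.
  11. The method according to claim 10, wherein the determining, from the determination matrix and based on the access behavior of the first terminal device, a target row comprises:
    determining a reference matrix based on the access behavior of the first terminal device, wherein values of multiple elements of the reference matrix match the access behavior of the first terminal device;
    multiplying the determination matrix by the reference matrix to obtain a target matrix, wherein multiple elements of the target matrix are in one-to-one correspondence with the multiple rows of elements of the determination matrix; and
    determining that the row of elements corresponding to the largest element of the target matrix is the target row.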
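  12. A computer device, comprising:
    an obtaining unit, configured to obtain first data traffic, wherein a sending end of the first data traffic is a first terminal device; and
    a processing unit, configured to determine an access behavior of the first terminal device based on identification information of a receiving end of packets in the first data traffic, wherein
    the processing unit is further configured to determine a type of the first terminal device based on a terminal type determination rule and the access behavior of the first terminal device, wherein the terminal type determination rule is used to indicate a correspondence between access behaviors of terminal devices and types of terminal devices, and the terminal type determination rule is obtained through training based on historical data traffic.
  13. The computer device according to claim 12, wherein sending ends of the historical data traffic comprise terminal devices of multiple types, and the type of the first terminal device is one of the multiple types.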
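  14. The computer device according to claim 13, wherein the terminal type determination rule is obtained through training based on the historical data traffic and terminal classification information, wherein
    the terminal classification information is used to indicate the multiple types and multiple groups of terminal identification information, and each of the multiple groups of terminal identification information comprises identification information of at least one terminal,
    the terminal classification information is further used to indicate a correspondence between the multiple types and the multiple groups of terminal identification information, and the multiple types are in one-to-one correspondence with the multiple groups of terminal identification information,
    each piece of the multiple pieces of terminal identification information comprises identification information of at least one terminal device, and
    the historical data traffic is determined based on the terminal classification information.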
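  15. The computer device according to claim 14, wherein the historical data traffic comprises multiple pieces of reference traffic, the multiple pieces of reference traffic are in one-to-one correspondence with the multiple types, the multiple pieces of reference traffic comprise first reference traffic, and the type corresponding to the first reference traffic is the type of the first terminal device; and
    the terminal type determination rule comprises multiple sub-rules, the multiple sub-rules are in one-to-one correspondence with the multiple types, and the sub-rule corresponding to the type of the first terminal device is determined based on the first reference traffic and the reference traffic other than the first reference traffic among the multiple pieces of reference traffic.
  16. The computer device according to claim 15, wherein the first reference traffic is determined based on first candidate traffic, the first candidate traffic is the traffic, among multiple pieces of candidate traffic, that corresponds to the type of the first terminal device, and the access behavior corresponding to each data flow in the first reference traffic occurs in the first candidate traffic more times than the access behavior corresponding to a data flow that does not belong to the first reference traffic.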
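  17. The computer device according to claim 13, wherein the terminal type determination rule is determined based on a clustering result obtained by clustering P terminal devices based on P server sets, the P terminal devices are determined based on the historical data traffic, the P terminal devices are in one-to-one correspondence with the P server sets, each of the P server sets is a set of servers accessed by the corresponding terminal device, the P terminal devices comprise the terminal devices of the multiple types, and P is a positive integer greater than or equal to the total number of types of terminal devices.
  18. The computer device according to claim 17, wherein the historical data traffic is uplink data flows of the P terminal devices, and the P terminal devices are the sending ends of the historical data traffic.
  19. The computer device according to claim 17, wherein, for each of the P terminal devices, a ratio of the number of times the terminal device is a sending end of a synchronize packet in the historical data traffic to the number of times the terminal device is a receiving end of a synchronize packet is greater than a second preset proportion.
  20. The computer device according to any one of claims 17 to 19, wherein the historical data traffic comprises P pieces of reference traffic, the P pieces of reference traffic are in one-to-one correspondence with the P terminal devices, the P pieces of reference traffic are in one-to-one correspondence with P pieces of candidate traffic, the access behavior corresponding to each data flow in second reference traffic occurs in corresponding second candidate traffic more times than the access behavior corresponding to a data flow that does not belong to the second reference traffic, and the second reference traffic is any one of the P pieces of reference traffic.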
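  21. The computer device according to any one of claims 12 to 20, wherein the terminal type determination rule is a determination matrix, the determination matrix comprises multiple rows of elements, and the multiple rows of elements are in one-to-one correspondence with the multiple types; and
    the processing unit is specifically configured to: determine, from the determination matrix and based on the access behavior of the first terminal device, a target row that matches the access behavior of the first terminal device; and determine that the type of the first terminal device is the type corresponding to the target row.
  22. The computer device according to claim 21, wherein the processing unit is specifically configured to:
    determine a reference matrix based on the access behavior of the first terminal device, wherein values of multiple elements of the reference matrix match the access behavior of the first terminal device;
    multiply the determination matrix by the reference matrix to obtain a target matrix, wherein multiple elements of the target matrix are in one-to-one correspondence with the multiple rows of elements of the determination matrix; and
    determine that the row of elements corresponding to the largest element of the target matrix is the target row.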
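  23. A computer device, comprising a processor, wherein the processor is configured to be coupled to a memory, and to read and execute instructions and/or program code in the memory to perform the method according to any one of claims 1 to 11.
  24. A chip system, comprising a logic circuit, wherein the logic circuit is configured to be coupled to an input/output interface and to transmit data through the input/output interface to perform the method according to any one of claims 1 to 11.
  25. A computer-readable medium, wherein the computer-readable medium stores program code, and when the program code is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 11.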
PCT/CN2021/141759 2021-01-20 2021-12-27 Method for determining terminal device type and related device WO2022156492A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110078112.9 2021-01-20
CN202110078112 2021-01-20
CN202110420570.6 2021-04-19
CN202110420570.6A CN114785708A (zh) 2021-01-20 2021-04-19 Method for determining terminal device type and related device

Publications (1)

Publication Number Publication Date
WO2022156492A1 (zh) 2022-07-28

Family

ID=82407725

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141759 WO2022156492A1 (zh) 2021-01-20 2021-12-27 Method for determining terminal device type and related device

Country Status (2)

Country Link
CN (1) CN114785708A (zh)
WO (1) WO2022156492A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104883278A (zh) * 2014-09-28 2015-09-02 北京匡恩网络科技有限责任公司 一种利用机器学习对网络设备进行分类的方法
CN105704400A (zh) * 2016-04-26 2016-06-22 山东大学 一种基于多平台终端和云服务的学习系统及其运行方法
US20160210645A1 (en) * 2015-01-16 2016-07-21 Linkedin Corporation Dynamically generating feedback based on contextual information
CN106714225A (zh) * 2016-12-29 2017-05-24 北京酷云互动科技有限公司 网络设备的识别方法及其系统、智能终端
CN109063745A (zh) * 2018-07-11 2018-12-21 南京邮电大学 一种基于决策树的网络设备类型识别方法及系统
CN110011973A (zh) * 2019-03-06 2019-07-12 浙江国利网安科技有限公司 工业控制网络访问规则构建方法及训练系统
CN110519106A (zh) * 2019-09-18 2019-11-29 南京中孚信息技术有限公司 目标网络中设备类型的确定方法、装置及电子设备

Also Published As

Publication number Publication date
CN114785708A (zh) 2022-07-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920864

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920864

Country of ref document: EP

Kind code of ref document: A1