CN115865387A - Active and passive network terminal discovery and identification method based on K-means clustering - Google Patents

Active and passive network terminal discovery and identification method based on K-means clustering

Info

Publication number
CN115865387A
Authority
CN
China
Prior art keywords
terminal equipment
type
active
terminal
discovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111464064.3A
Other languages
Chinese (zh)
Inventor
陈伟
侯倩
陈红
徐开军
温佳伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing College of Information Technology
Original Assignee
Nanjing College of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-02
Filing date
2021-12-02
Publication date
2023-03-28
Application filed by Nanjing College of Information Technology
Priority to CN202111464064.3A
Publication of CN115865387A
Current legal status: Withdrawn

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an active and passive network terminal discovery and identification method based on K-means clustering. The method comprises: obtaining a feature sequence sample set for each type of terminal device in a network; running the K-means clustering algorithm on that sample set to obtain an optimal cluster center for each terminal device type; acquiring the feature sequence of a terminal device newly connected to the network, where the device's feature information is collected by active and passive means and then processed into a feature sequence; and calculating the distance between the new device's feature sequence and the optimal cluster center of each terminal device type, then assigning the new device the type of the cluster with the shortest distance. The method achieves automatic discovery and classification of network devices, makes tampering with a device's feature information harder to hide, and thereby strengthens the security, management, and control of network devices.

Description

Active and passive network terminal discovery and identification method based on K-means clustering
Technical Field
The invention relates to the technical field of computer and network information processing, in particular to an active and passive network terminal discovery and identification method based on K-means clustering.
Background
In recent years, network security has become increasingly serious. Network attacks launched through end terminals occur from time to time, their destructive power has grown markedly, and their impact tends to spread wider. Marketing site terminals come in many types and structures and serve complex business scenarios; some of them cannot have a client installed or be modified to enforce access control, so unified security control is impossible. Automatic discovery of terminal devices has therefore become more important, and the traditional approach, which relies only on devices actively reporting themselves, no longer meets current needs. A technique for automatically discovering the various marketing site terminals that connect to the network and identifying their types can support further device legitimacy verification and fine-grained control, and lay a foundation for improving the security protection of marketing site terminals.
Disclosure of Invention
The invention aims to provide an active and passive network terminal discovery and identification method based on K-means clustering that achieves automatic discovery and classification of network devices, addresses the problem that tampering with a device's feature information is hard to detect, and effectively strengthens the security, management, and control of network devices.
The invention adopts the following technical scheme for realizing the aim of the invention:
the invention provides an active and passive network terminal discovery and identification method based on K-means clustering, which comprises the following steps:
acquiring a feature sequence sample set for each type of terminal device in a network, wherein feature information of each type of terminal device is collected by active and passive means, the collected feature information is processed into feature sequences, and the feature sequences of each type of terminal device are then pooled to form the sample set;
running the K-means clustering algorithm on the obtained feature sequence sample set to obtain an optimal cluster center for each type of terminal device;
acquiring the feature sequence of a terminal device newly connected to the network, wherein the device's feature information is collected by active and passive means and then processed into a feature sequence;
and calculating the distance between the feature sequence of the newly connected terminal device and the optimal cluster center of each terminal device type, and assigning the newly connected device the type of the cluster with the shortest distance.
Further, acquiring the feature information of each type of terminal device by active and passive means comprises:
active discovery of the terminal device: confirming that the device is online with a ping message, sending protocol query messages to the device using UDP and TCP scanning, and sending an HTTP GET request to the device to obtain its reply;
receiving the HTTP response returned by the terminal device, parsing the sequence of header fields, and extracting the device's feature information from those fields;
and passive discovery of the terminal device: capturing the traffic each terminal device sends and receives, analyzing the messages layer by layer, and extracting the network terminal's feature information from them.
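The active-discovery probe above can be sketched as follows. This is a minimal illustration, not the patent's implementation: ICMP ping normally requires raw-socket privileges, so the sketch substitutes a TCP connect() probe as a liveness and port check, omits protocol-specific UDP queries, and only builds the GET request bytes. The function names `tcp_connect_scan` and `http_get_request` are hypothetical.

```python
import socket
from typing import Iterable, List


def tcp_connect_scan(host: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Return the ports on `host` that complete a full TCP handshake.

    A connect() probe needs only ordinary socket permissions, unlike an
    ICMP ping or a SYN scan, which usually require raw sockets.
    """
    open_ports = []
    for port in ports:
        try:
            # create_connection either completes the handshake or raises OSError
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # refused, unreachable, or timed out: treat as closed or filtered
            pass
    return open_ports


def http_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request intended to elicit a header reply."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
```

For example, `tcp_connect_scan("192.168.1.20", [80, 443, 554])` would return the subset of those ports that accept a connection; in this sketch a device that answers on any port is treated as online.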
Further, the feature information of the target device extracted from the header fields includes one or a combination of the following:
server, port, authorization, MAC address, version, IP address, operating system type, server version, vendor, and model.
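Extracting features from the HTTP header field sequence might look like the sketch below. The patent names the features but not which header carries each one, so the mapping `HEADER_TO_FEATURE` (for example `Server` to server, `WWW-Authenticate` to authorization) and the parsing of a `Server` value such as `Apache/2.4.41 (Ubuntu)` into a server version and an operating system hint are assumptions. Missing features are recorded as "0", following step (3) of the detailed description.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping from lower-cased HTTP header names to feature names.
HEADER_TO_FEATURE = {
    "server": "server",
    "www-authenticate": "authorization",
    "x-powered-by": "version",
}
FEATURES = ["server", "authorization", "version", "server_version", "os_type"]


def parse_header_sequence(raw: bytes) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a raw HTTP response into its status line and ordered header fields."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    status_line, fields = lines[0], []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip malformed lines that have no colon
            fields.append((name.strip(), value.strip()))
    return status_line, fields


def extract_http_features(raw: bytes) -> Dict[str, str]:
    """Extract string features from the headers; anything not found stays "0"."""
    _, fields = parse_header_sequence(raw)
    features = {name: "0" for name in FEATURES}
    for name, value in fields:
        key = HEADER_TO_FEATURE.get(name.lower())
        if key:
            features[key] = value
    # e.g. "Apache/2.4.41 (Ubuntu)" -> server_version "2.4.41", os_type "Ubuntu"
    m = re.match(r"[^/\s]+/(\S+)(?:\s+\(([^)]+)\))?", features["server"])
    if m:
        features["server_version"] = m.group(1)
        if m.group(2):
            features["os_type"] = m.group(2)
    return features
```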
Further, the feature information of the target device obtained from the message includes one or a combination of the following information:
IP, MAC, UDP port number, TCP port number, traffic size, packet length, payload, access time, application protocol and protocol type.
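For passive discovery, a layer-by-layer parse of one captured frame could look like this. It assumes the frame bytes have already been captured from the mirrored switch port (for example with a libpcap-based tool, not shown), handles only Ethernet II carrying IPv4, and records the sender's MAC, IP, and source port as the device features; that choice, the helper names, and the feature keys are illustrative assumptions.

```python
import struct
from typing import Dict


def mac_str(b: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{x:02x}" for x in b)


def parse_frame(frame: bytes) -> Dict[str, str]:
    """Parse one Ethernet II / IPv4 frame into string features ("0" if absent)."""
    f = {"mac": "0", "ip": "0", "protocol": "0", "tcp_port": "0",
         "udp_port": "0", "packet_length": str(len(frame)), "payload_length": "0"}
    if len(frame) < 14:
        return f
    # Layer 2: destination MAC, source MAC, EtherType
    _dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    f["mac"] = mac_str(src)
    if ethertype != 0x0800 or len(frame) < 34:  # only IPv4 is handled here
        return f
    # Layer 3: IPv4 header
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                    # header length in bytes
    total_len = struct.unpack("!H", ip[2:4])[0]
    proto = ip[9]
    f["ip"] = ".".join(str(x) for x in ip[12:16])  # source address
    # Layer 4: TCP or UDP segment
    l4 = ip[ihl:total_len]
    if proto == 6 and len(l4) >= 20:            # TCP
        sport, _ = struct.unpack("!HH", l4[:4])
        data_off = (l4[12] >> 4) * 4
        f.update(protocol="TCP", tcp_port=str(sport),
                 payload_length=str(len(l4) - data_off))
    elif proto == 17 and len(l4) >= 8:          # UDP
        sport, _ = struct.unpack("!HH", l4[:4])
        f.update(protocol="UDP", udp_port=str(sport),
                 payload_length=str(len(l4) - 8))
    return f
```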
Further, processing the collected feature information into a feature sequence comprises:
combining the feature information obtained by active discovery with that obtained by passive discovery to form the feature sequence of the terminal device, with each feature value in the sequence stored as a character string.
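The combination step can be sketched as follows, using the ten-field order given in step (5) of the detailed description. Two points are assumptions not stated in the patent: an active-discovery value takes precedence over a passive one for the same field, and, because Euclidean distance needs numbers while the features are stored as strings, the hypothetical `encode` maps each string to [0, 1) with a stable CRC32 hash (a missing "0" maps to 0.0).

```python
import zlib
from typing import Dict, List

# Field order as listed in step (5) of the detailed description.
SEQUENCE_FIELDS = ["MAC", "IP", "UDPport", "TCPport", "protocol",
                   "server", "authorization", "version", "model", "Brand"]


def build_sequence(active: Dict[str, str], passive: Dict[str, str]) -> List[str]:
    """Merge active and passive features into one ordered list of strings.

    Assumption: the passive value is used only when active discovery did not
    provide the field; fields found by neither method are recorded as "0".
    """
    seq = []
    for field in SEQUENCE_FIELDS:
        value = active.get(field, "0")
        if value == "0":
            value = passive.get(field, "0")
        seq.append(str(value))
    return seq


def encode(seq: List[str]) -> List[float]:
    """Map each string feature to a number in [0, 1) with a stable CRC32 hash."""
    return [0.0 if v == "0" else zlib.crc32(v.encode("utf-8")) / 2**32 for v in seq]
```

With a hash encoding, a field contributes zero distance only when two devices' strings match exactly, and the size of a mismatch is arbitrary; a real deployment might prefer a different encoding.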
Further, running the K-means clustering algorithm on the obtained feature sequence sample set to obtain the optimal cluster center for each type of terminal device comprises:
determining the K value and the initial cluster centers of the K-means clustering;
after the K value and initial cluster centers are obtained, using the feature sequence sample set as the input data set, starting K-means clustering, and obtaining the optimal cluster center for each type of terminal device through K-means distance calculations.
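The clustering itself could be implemented as standard Lloyd-style K-means over already-encoded feature vectors, as in the sketch below. It uses the Euclidean distance d(x, c) = sqrt(sum_i (x_i - c_i)^2) named in step (8), starts from caller-supplied initial centers, and stops when no center moves more than `tol`; the stopping rule and the handling of empty clusters are assumptions, since the patent only says iteration continues until the optimal centers are reached.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """d(x, c) = sqrt(sum_i (x_i - c_i)^2)."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def kmeans(data: List[List[float]], centers: List[List[float]],
           max_iter: int = 100, tol: float = 1e-9) -> Tuple[List[List[float]], List[int]]:
    """Lloyd-style K-means starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        labels = [min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))
                  for x in data]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels
```

Seeding with one typical device per type, as the method prescribes, also fixes the correspondence between cluster index and device type, which is what allows a cluster center to be read as "the" center of a type.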
Further, determining the K value and the initial cluster centers of the K-means clustering comprises:
counting the number K of terminal device types in the network and setting the K value of the K-means clustering to K;
and selecting the feature sequence of a typical terminal device of each type as the initial cluster center for that type.
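Choosing K and the initial centers can be sketched as below, assuming each known on-site terminal has been labelled with its device type and that an operator has nominated one typical device per type. The `typical` mapping and the function name are hypothetical; the patent only says that a typical device's feature sequence is used.

```python
from typing import Dict, List, Sequence, Tuple


def choose_k_and_centers(
    samples: Sequence[Tuple[str, List[float]]],
    typical: Dict[str, int],
) -> Tuple[int, List[str], List[List[float]]]:
    """Return K, the type order, and one initial center per device type.

    samples: (device_type, encoded_feature_vector) pairs for known terminals.
    typical: device_type -> index into `samples` of the chosen typical device.
    """
    types = sorted({t for t, _ in samples})  # distinct terminal types
    k = len(types)                           # K = number of types
    centers = []
    for t in types:
        idx = typical[t]
        if samples[idx][0] != t:
            raise ValueError(f"typical device {idx} is not of type {t!r}")
        centers.append(list(samples[idx][1]))  # copy so clustering cannot mutate it
    return k, types, centers
```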
The invention has the following beneficial effects:
According to the invention, active and passive discovery together collect comprehensive feature information about terminal devices, and K-means clustering quickly yields an optimal cluster center for each terminal device type, which provides a basis for identifying device types. The method achieves automatic discovery and classification of network devices, makes tampering with a device's feature information harder to hide, and effectively strengthens the security and control of network devices.
Drawings
Fig. 1 is a schematic diagram of a terminal network topology in a K-means cluster-based active and passive network terminal discovery and identification method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating active discovery information acquisition in a K-means cluster-based active and passive network terminal discovery and identification method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating passive discovery information acquisition in a K-means cluster-based active and passive network terminal discovery and identification method according to an embodiment of the present invention;
Fig. 4 is a flowchart of an active and passive network terminal discovery and identification method based on K-means clustering according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention provides an active and passive network terminal discovery and identification method based on K-means clustering, which comprises the following steps:
(1) First, the on-site terminal environment is prepared for terminal discovery. The terminal discovery and identification device is connected into the terminal network topology and obtains the network terminals' service traffic through port mirroring configured on the switch, as shown in Fig. 1.
(2) The terminal discovery and identification device first actively discovers the known terminals on site. It confirms that a device is online with a ping message, then sends protocol query messages to the target device using UDP and TCP scanning. Because the devices run different web services, it also sends an HTTP GET request to the target device to obtain its reply, as shown in Fig. 2.
(3) The terminal discovery and identification device receives the HTTP response returned by the target terminal, parses the sequence of header fields, and extracts the target device's feature information from those fields, including server, port, authorization, MAC address, version, IP address, operating system type, server version, vendor, and model. Any feature that cannot be retrieved is recorded as 0.
(4) The terminal discovery and identification device also passively discovers the known terminals on site: it captures the traffic each terminal sends and receives, analyzes the messages layer by layer, and extracts the terminal's feature information, including IP, MAC, UDP port number, TCP port number, traffic size, packet length, payload, access time, application protocol, and protocol type, as shown in Fig. 3. Any feature that cannot be retrieved is recorded as 0.
(5) On the terminal discovery and identification device, the actively and passively discovered feature information is combined into a sample sequence for each terminal device. The feature information collected during discovery is systematically combined into the feature sequence {MAC, IP, UDPport, TCPport, protocol, server, authorization, version, model, Brand}, with each feature value stored as a character string. Finally, the feature sequences of all terminals on site are pooled to form the complete sample set.
(6) The number K of terminal types on site is counted, and the K value of the K-means clustering is set accordingly.
(7) The initial clusters of the K-means clustering are set according to the on-site terminals: to improve iteration efficiency, the feature sequence of a typical terminal device of each type is selected as that type's initial cluster center.
(8) Once K and the initial cluster centers are determined, K-means clustering starts. The input data are the feature sequence sample sets of all terminal device types, and the feature sequence of each terminal device has 10 dimensions. Similarity during clustering is measured by the Euclidean distance between a terminal device and the cluster centers (starting from the initial cluster centers), computed as:
$$d(x, c) = \sqrt{\sum_{i=1}^{10} (x_i - c_i)^2}$$
where x_i is the i-th feature value of the terminal device and c_i is the i-th component of the cluster center. The optimal cluster centers are obtained by repeated iteration, as shown in Fig. 4.
(9) Device discovery and terminal type determination are then performed for new terminal devices joining the network.
(10) When a new terminal device connects to the on-site network, the terminal discovery and identification device starts active and passive discovery, detects the new device's feature information, and constructs its feature sequence.
(11) The distance between this feature sequence and the optimal cluster center of each terminal device type is calculated, and the newly connected device is assigned the type of the cluster with the shortest distance.
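The final nearest-center decision in step (11) reduces to a few lines. The sketch assumes the new device's feature sequence has already been encoded into the same numeric space as the cluster centers and that `types[j]` names the device type of `centers[j]`; both are illustrative assumptions, and `classify` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def classify(x: Sequence[float], centers: Sequence[Sequence[float]],
             types: Sequence[str]) -> Tuple[str, float]:
    """Return the type whose optimal center is nearest to x, and that distance."""
    distances = [math.dist(x, c) for c in centers]  # Euclidean distance (Python 3.8+)
    best = min(range(len(distances)), key=distances.__getitem__)
    return types[best], distances[best]
```

Returning the distance as well as the type leaves room for later checks on devices whose nearest center is still far away; the patent itself specifies only the shortest-distance rule.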
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (7)

1. A K-means cluster-based active and passive network terminal discovery and identification method is characterized by comprising the following steps:
acquiring a feature sequence sample set for each type of terminal device in a network, wherein feature information of each type of terminal device is collected by active and passive means, the collected feature information is processed into feature sequences, and the feature sequences of each type of terminal device are then pooled to form the sample set;
running the K-means clustering algorithm on the obtained feature sequence sample set to obtain an optimal cluster center for each type of terminal device;
acquiring the feature sequence of a terminal device newly connected to the network, wherein the device's feature information is collected by active and passive means and then processed into a feature sequence;
and calculating the distance between the feature sequence of the newly connected terminal device and the optimal cluster center of each terminal device type, and assigning the newly connected device the type of the cluster with the shortest distance.
2. The active and passive network terminal discovery and identification method based on K-means clustering according to claim 1, wherein the method for acquiring the characteristic information of each type of terminal equipment in an active and passive mode comprises:
actively discovering the terminal device by confirming with a ping message that the device is online, sending protocol query messages to the device using UDP and TCP scanning, and sending an HTTP GET request to the device to obtain its reply; receiving the HTTP response returned by the terminal device, parsing the sequence of header fields, and extracting the device's feature information from those fields; and passively discovering the terminal device by capturing the traffic each terminal device sends and receives, analyzing the messages layer by layer, and extracting the network terminal's feature information from them.
3. The active and passive network terminal discovery and identification method based on K-means clustering according to claim 2, wherein the feature information of the target device extracted from the header fields comprises one or a combination of the following: server, port, authorization, MAC address, version, IP address, operating system type, server version, vendor, and model.
4. The active and passive network terminal discovery and identification method based on K-means clustering according to claim 2 or 3, wherein the feature information of the target device obtained from the message includes one or a combination of the following information:
IP, MAC, UDP port number, TCP port number, traffic size, packet length, payload, access time, application protocol and protocol type.
5. The active and passive network terminal discovery and identification method based on K-means clustering according to claim 1, wherein processing the collected feature information into the feature sequence comprises:
combining the feature information obtained by active discovery with that obtained by passive discovery to form the feature sequence of the terminal device, with each feature value in the sequence stored as a character string.
6. The active and passive network terminal discovery and identification method based on K-means clustering according to claim 1, wherein the method for obtaining the optimal cluster center corresponding to each type of terminal device by performing the K-means clustering algorithm according to the obtained feature sequence sample set comprises:
determining the K value and the initial cluster centers of the K-means clustering;
after obtaining the K value and the initial cluster center, taking the obtained characteristic sequence sample set as an input data set for K-means clustering calculation, then starting the K-means clustering, and calculating through a K-means algorithm to obtain the optimal cluster center corresponding to each type of terminal equipment.
7. The active and passive network terminal discovery and identification method based on K-means clustering according to claim 6, wherein determining the K value and the initial cluster centers of the K-means clustering comprises:
counting the number K of terminal device types in the network and setting the K value of the K-means clustering to K;
and selecting the feature sequence of a typical terminal device of each type as the initial cluster center for that type.
CN202111464064.3A 2021-12-02 2021-12-02 Active and passive network terminal discovery and identification method based on K-means clustering Withdrawn CN115865387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111464064.3A CN115865387A (en) 2021-12-02 2021-12-02 Active and passive network terminal discovery and identification method based on K-means clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111464064.3A CN115865387A (en) 2021-12-02 2021-12-02 Active and passive network terminal discovery and identification method based on K-means clustering

Publications (1)

Publication Number Publication Date
CN115865387A true CN115865387A (en) 2023-03-28

Family

ID=85653254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111464064.3A Withdrawn CN115865387A (en) 2021-12-02 2021-12-02 Active and passive network terminal discovery and identification method based on K-means clustering

Country Status (1)

Country Link
CN (1) CN115865387A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116682128A (en) * 2023-06-02 2023-09-01 中央民族大学 Method, device, equipment and medium for constructing and identifying data set of water book single word

Similar Documents

Publication Publication Date Title
CN110113345B (en) Automatic asset discovery method based on flow of Internet of things
CN112714045B (en) Rapid protocol identification method based on device fingerprint and port
WO2022083353A1 (en) Abnormal network data detection method and apparatus, computer device, and storage medium
CN101282331B (en) Method for recognizing P2P network flow based on transport layer characteristics
US20220174008A1 (en) System and method for identifying devices behind network address translators
CN111277570A (en) Data security monitoring method and device, electronic equipment and readable medium
CN112270346B (en) Internet of things equipment identification method and device based on semi-supervised learning
CN111464485A (en) Encrypted proxy flow detection method and device
CN112019449B (en) Traffic identification packet capturing method and device
WO2020022953A1 (en) System and method for identifying an internet of things (iot) device based on a distributed fingerprinting solution
CN115865387A (en) Active and passive network terminal discovery and identification method based on K-means clustering
CN113872962B (en) Low-speed port scanning detection method for high-speed network sampling data acquisition scene
CN111224891B (en) Flow application identification system and method based on dynamic learning triples
CN111478925B (en) Port scanning detection method and system applied to industrial control environment
CN112788039A (en) DDoS attack identification method, device and storage medium
CN115065519B (en) Distributed side-end cooperative DDoS attack real-time monitoring method
CN111200543A (en) Encryption protocol identification method based on active service detection engine technology
CN116346434A (en) Method and system for improving monitoring accuracy of network attack behavior of power system
CN113726809B (en) Internet of things equipment identification method based on flow data
CN110381038B (en) Information verification method and system based on video network
CN113765891A (en) Equipment fingerprint identification method and device
CN112469034A (en) Internet of things gateway device capable of safely authenticating physical sensing equipment and access method thereof
CN115348188B (en) DNS tunnel traffic detection method and device, storage medium and terminal
CN111144504B (en) Software mirror image flow identification and classification method based on PCA algorithm
CN115412465B (en) Method and system for generating distributed real network flow data set based on client

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230328
