CN112068926A - Method for identifying virtual machine in local area network - Google Patents

Method for identifying virtual machine in local area network

Publication number
CN112068926A
Authority
CN
China
Prior art keywords
equipment
protocol
identified
characteristic information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010759077.2A
Other languages
Chinese (zh)
Other versions
CN112068926B (en)
Inventor
喻灵婧
周钊宇
刘庆云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202010759077.2A priority Critical patent/CN112068926B/en
Publication of CN112068926A publication Critical patent/CN112068926A/en
Application granted granted Critical
Publication of CN112068926B publication Critical patent/CN112068926B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00Network arrangements, protocols or services for addressing or naming
    • H04L61/09Mapping addresses
    • H04L61/10Mapping addresses of different types
    • H04L61/103Mapping addresses of different types across network layers, e.g. resolution of network layer into physical layer addresses or address resolution protocol [ARP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00Indexing scheme associated with group H04L61/00
    • H04L2101/60Types of network addresses
    • H04L2101/618Details of network addresses
    • H04L2101/622Layer-2 addresses, e.g. medium access control [MAC] addresses
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00Indexing scheme associated with group H04L61/00
    • H04L2101/60Types of network addresses
    • H04L2101/618Details of network addresses
    • H04L2101/659Internet protocol version 6 [IPv6] addresses

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method for identifying a virtual machine in a local area network, which comprises the steps of acquiring broadcast and multicast flow of equipment to be identified in the local area network, and taking the flow as the flow to be identified; extracting characteristic information of a data transmission protocol of traffic to be identified, and classifying the characteristic information into each characteristic type according to the corresponding relation between a preset characteristic type and each data transmission protocol; vectorizing the characteristic information of each characteristic type, and splicing to obtain fingerprint information; inputting fingerprint information into a pre-trained equipment identification model, outputting prediction results of an mDNS view and an LBN view, and judging whether equipment to be identified is abnormal equipment; if the equipment to be identified is abnormal equipment, comparing the prediction results of the mDNS view and the LBN view, and if the two prediction results are different, judging that the equipment to be identified is a virtual machine. The method can identify the virtual machine without increasing the load in the network and without interacting with other equipment.

Description

Method for identifying virtual machine in local area network
Technical Field
The invention relates to the technical field of network information, in particular to a method for identifying a virtual machine in a local area network.
Background
A virtual machine (VM) is a complete computer system with full hardware system functionality that is emulated by software and runs in a completely isolated environment. Existing methods for identifying virtual machines fall mainly into the following categories:
The first category detects the current running environment from within an application program and judges whether that environment is provided by a virtual machine. Such methods include identifying memory characteristics of the virtual machine, exploiting performance differences, and exploiting abnormal behavior of the virtual processor. The most typical memory-feature method is detection using the IDT (Interrupt Descriptor Table) or the LDT (Local Descriptor Table). This approach judges whether the current execution environment is a virtual machine from the different locations at which a virtual machine and a real host place key operating system data structures. For example, the Red Pill tool determines whether a virtual machine is present by checking whether the base address of the IDT exceeds a certain value. Danny Quist et al. (Quist, D., & Smith, V. (2006). Detecting the presence of virtual machines using the local data table. Offensive Computing, 25(04)) detect virtual machines through the LDT. To exploit the performance difference introduced by virtualization, Jason Franklin et al. (Franklin, Jason, et al. "Remote detection of virtual machine monitors with fuzzy benchmarking." ACM SIGOPS Operating Systems Review 42.3 (2008): 83-92.) design code and compare its running time on a real host with its running time on the device under test to judge whether that device is a virtual machine. Detection based on abnormal behavior of the virtual processor mainly checks whether the current environment supports machine-language instructions specific to virtual machines and whether a communication channel exists between the virtual machine and its host. For example, the VMDetect tool determines whether the current environment is a Virtual PC environment by observing whether a non-standard IA32 instruction associated with Virtual PC returns an invalid-opcode error.
The second category judges whether a device is a virtual machine by extracting features from network data packets. For example, Chinese patent CN102025535B discloses that a network administrator counts and records the MAC addresses of all virtual machines in a network in advance and compares the MAC address information in traffic with the recorded MAC addresses, thereby identifying and tracking virtual machines. Chinese patent CN107741872A discloses obtaining a user name, a machine media access control (MAC) address, a user network address, a hard disk serial number, and a BIOS serial number, and determining whether the device is a virtual machine according to whether these characteristics match the parameter specification of a virtual machine.
Detection by an application program requires the program to be deployed on the device under test, and its results must be compared with the characteristics of a real host. In addition, detection using the IDT has a high false alarm rate on devices with multiple CPUs, because each CPU in such a device has its own IDT and the IDT base address therefore takes different values. Detection based on performance differences requires knowledge of most of the hardware configuration of the device under test, and such information is not always easy to obtain. Detection based on abnormal behavior of the virtual processor depends on the specific implementation of each virtual machine product and is therefore not general.
Among the methods using network characteristics, the method based on recorded MAC addresses requires all virtual machines in the network to be known and monitored in advance, which increases the workload and cannot discover virtual machines that conceal themselves. The current method of judging directly from the MAC prefix cannot identify a virtual machine that connects to the network through NAT or through a bridging mode in which it shares the host's MAC address. The method of comparing against a virtual machine parameter specification suffers from two problems: characteristics such as the user name can be modified by the user, and prior knowledge of the parameter specification is required, so identification efficiency is low.
Disclosure of Invention
The invention aims to provide a method for identifying virtual machines in a local area network. With this method, a user can identify a virtual machine without increasing the load on the network and without interacting with other equipment.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for identifying a virtual machine in a local area network comprises the following steps:
acquiring broadcast and multicast traffic of equipment to be identified in a local area network, and taking the traffic as the traffic to be identified;
extracting characteristic information of a data transmission protocol of traffic to be identified, and classifying the characteristic information into each characteristic type according to the corresponding relation between a preset characteristic type and each data transmission protocol;
vectorizing the characteristic information of each characteristic type, and splicing to obtain fingerprint information;
inputting the fingerprint information into a pre-trained equipment identification model, outputting prediction results of an mDNS view and an LBN view, and judging whether the equipment to be identified is abnormal equipment;
if the equipment to be identified is abnormal equipment, comparing the prediction results of the mDNS view and the LBN view, and if the two prediction results are different, judging that the equipment to be identified is a virtual machine.
Furthermore, a device identification device provided with a monitoring tool is deployed in the local area network, and the device is used for monitoring the traffic and collecting the traffic to be identified.
Further, the equipment identification device formats the acquired data packet of the traffic to be identified, and converts the data packet into a preset Json format; combining all Json elements with the same source MAC address into one Json element taking a source MAC address as a key, wherein the source MAC address corresponds to the equipment to be identified; and for the Json elements with the same content, only retaining the content of one element through duplicate removal, wherein the content in each Json element is the effective load content of the traffic to be identified.
Further, the data transmission protocol includes ARP protocol, ICMPv6 protocol, mDNS protocol, DHCP protocol, IGMP protocol, SSDP protocol, LLC protocol, LLMNR protocol, UDP protocol, ethernet protocol.
Further, the feature types include a DHCP class, an mDNS class, an SSDP class, an LBN class, a UDP class, and a protseq class; the characteristic information of the DHCP class comprises characteristic information of a DHCP protocol and a DHCPv6 protocol, the characteristic information of the mDNS class comprises characteristic information of the mDNS protocol, the characteristic information of the SSDP class comprises characteristic information of the SSDP protocol, the characteristic information of the LBN class comprises characteristic information of an LLMNR protocol, a BROWSER protocol and an NBNS protocol, the characteristic information of the UDP class comprises characteristic information of a UDP protocol, and the characteristic information of the protseq class comprises a protocol sequence and a source MAC address prefix of a preset data transmission protocol.
Further, vectorizing the feature information of each feature type according to the data features, including: the characteristic information of DHCP, DHCPv6, SSDP, LLMNR, BROWSER and NBNS protocols has key value pair type, and vectorization is carried out according to onehot coding form; the characteristic information of the mDNS protocol has a pseudo-natural language type, and vectorization is carried out by using word2vec and LDA.
Further, when extracting the feature information, feature information that has no distinguishing characteristics is replaced with a uniform preset character string to simplify subsequent data processing.
Further, before inputting the fingerprint information into the device identification model, it is first determined whether a device MAC address prefix contained in the traffic to be identified represents a virtual machine manufacturer, and if so, it is directly determined that the device to be identified is a virtual machine.
Further, when the device identification model is trained, the characteristic information training samples of the known devices are vectorized to obtain fingerprint information of each known device, and then the fingerprint information is used for training.
Further, the equipment identification model is a multi-view breadth and deep learning model MvWDL, dense embedding representation of feature views of various feature types is fused to a neural network based on deep fusion and a neural network based on breadth fusion, a multi-view neural network model based on a hybrid fusion mode is constructed based on the two neural networks, and the model MvWDL is obtained; the output end of the model comprises an equipment category identification classifier and an abnormal equipment monitor, the equipment category identification classifier is used for obtaining equipment information according to output final condition probability, the final condition probability is the sum of the classification judgment probability obtained by the neural network with breadth fusion and the classification judgment probability obtained by the neural network with depth fusion, and the abnormal equipment monitor is used for outputting a judgment result of whether the equipment is abnormal equipment.
Further, if the output of the equipment identification model meets a preset inconsistency judgment condition, the equipment to be identified is judged to be abnormal equipment. The inconsistency judgment condition is as follows: an inconsistency quantization value is calculated by an inconsistency judgment algorithm from the identification results that the breadth-fused neural network obtains from the characteristic information input by each feature view, and if the calculated value exceeds a preset threshold, the equipment to be identified is judged to be abnormal equipment.
The method provided by the invention can identify virtual machines in the local area network, and has the following advantages:
1. The network load is not increased during device identification, and control authority over other network equipment is not required.
2. High-accuracy identification of virtual machines in a local area network is achieved through MAC address prefixes and through the inconsistency between the predictions of the mDNS view and the LBN view in the MvWDL model. On a sample set containing dahua-echange-tc7c, qihoo360-d302, edimax, hikvision-cs-c2c-1a1wfr, edenet, hikvisio, lenov-snowman, xiaomi-ipc009, tp-link-tl-ipc40a-4, xiaomi-dafang-df3, dlink-dcs-930lb, and dlink-dcs-935l, the recognition accuracy was 100%.
3. When the virtual machine is different from the host machine operating system, the virtual machine which is accessed to the local area network in the form of NAT and in the form of bridging which shares MAC address with the host machine can be identified.
Drawings
Fig. 1 is a flowchart of an identification method for a virtual machine in a local area network according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of an apparatus identification model according to an embodiment of the disclosure.
Detailed Description
In order to make the technical solution of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The virtual machine identification method provided by the invention is implemented by connecting to a local area network and acquiring a broadcast/multicast data packet in the local area network, and comprises the following steps.
Step S01, acquiring traffic to be identified sent by equipment to be identified in a wireless network; the traffic to be identified is broadcast and multicast traffic sent by the equipment to be identified.
An equipment identification device is deployed in a wireless network, and the equipment identification device can be any intelligent equipment capable of receiving flow in the wireless network, such as a personal computer, a mobile phone, a gateway server and the like. Monitoring traffic received by a wireless network card of the device identification apparatus through a monitoring tool, such as tcpdump, wireshark, etc., pre-installed on the device identification apparatus, and collecting broadcast and multicast traffic therein as traffic to be identified.
The collected traffic to be identified corresponds to different devices to be identified respectively according to the source MAC address contained in the traffic.
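As a minimal sketch of step S01, the following Python fragment keeps only broadcast and multicast frames and groups them by source MAC address. It relies on the fact that a group (broadcast or multicast) Ethernet destination address has the least-significant bit of its first octet set. The frame dictionaries and the field names `src` and `dst` are assumptions made for illustration; the patent itself only names tcpdump and wireshark as monitoring tools.

```python
def is_broadcast_or_multicast(dst_mac: str) -> bool:
    """Return True if the destination MAC has the group bit set.

    The group bit is the least-significant bit of the first octet; it is
    set for multicast addresses and for the broadcast address ff:ff:ff:ff:ff:ff.
    """
    first_octet = int(dst_mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def group_by_source(frames):
    """Keep broadcast/multicast frames and group them by source MAC.

    `frames` is a hypothetical list of dicts with "src" and "dst" keys.
    """
    per_device = {}
    for frame in frames:
        if is_broadcast_or_multicast(frame["dst"]):
            per_device.setdefault(frame["src"].lower(), []).append(frame)
    return per_device
```

Each key of the returned dictionary then stands for one device to be identified, and unicast frames are ignored.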
Step S02, obtaining characteristic information of each characteristic type according to the data transmission protocol of each flow to be identified and the corresponding relation between the preset characteristic type and each data transmission protocol, and inputting the characteristic information into a preset equipment identification model; the equipment identification model is obtained after training of the characteristic information training sample marked with the corresponding equipment information.
When the device to be identified sends broadcast and multicast traffic, it uses different data transmission protocols according to actual needs, for example: ARP protocol, ICMPv6 protocol, mDNS protocol, DHCP protocol, IGMP protocol, SSDP protocol, LLC protocol, LLMNR protocol, UDP protocol, ETHERTYPE protocol, and the like. Different data transmission protocols have different data formats and carry different data contents, so a subset of the data transmission protocols can be selected in advance, and the characteristic information of each selected protocol is extracted from the corresponding traffic to be identified. For example, for the DHCP protocol, the data content of each data packet of the corresponding traffic to be identified comprises a plurality of options, and the data content of the preset options is extracted as the characteristic information of the DHCP protocol.
When extracting the feature information, subsequent data processing can be simplified by applying a simple replacement to feature information that has no distinguishing characteristics. For example, if the feature information is an IPv4 or IPv6 address, it can be replaced by the character string "IPv4" or "IPv6".
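The replacement described above can be sketched with Python's standard `ipaddress` module. The function name `normalize_value` is hypothetical, and treating every other value as distinguishing (and leaving it unchanged) is an assumption of this sketch.

```python
import ipaddress


def normalize_value(value: str) -> str:
    """Replace a literal IP address with the placeholder "IPv4" or "IPv6".

    Values that do not parse as an IP address are returned unchanged.
    """
    try:
        address = ipaddress.ip_address(value.strip())
    except ValueError:
        return value
    return "IPv4" if address.version == 4 else "IPv6"
```

For instance, `normalize_value("192.168.1.10")` yields `"IPv4"`, while a vendor string is passed through untouched.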
According to the data characteristics of each data transmission protocol, dividing the characteristic information of each data transmission protocol into a plurality of preset characteristic types so as to obtain the characteristic information of each characteristic type, and inputting the obtained characteristic information of each characteristic type into a pre-constructed and trained equipment identification model.
The equipment recognition model is a pre-constructed neural network model, and is trained by pre-collecting a training sample set, wherein the training sample set comprises a large number of characteristic information training samples which are labeled in advance. The characteristic information training sample is derived from broadcast and multicast flows of identified equipment in various wireless networks, and characteristic information of each characteristic type of the identified equipment is obtained through characteristic extraction.
In step S03, if the MAC address prefix indicates a virtual machine manufacturer, the device is determined to be a virtual machine. Otherwise, if the equipment identification model judges that the equipment to be identified is abnormal equipment, the mDNS view and the LBN view are further compared. If the prediction results of the mDNS view and the LBN view (including the predicted manufacturer/type of the equipment and the confidence of the prediction) differ strongly, the equipment is judged to be a virtual machine.
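The decision flow of step S03 can be sketched as follows. Everything concrete here is an assumption for illustration: the prefix table lists a few commonly cited hypervisor MAC prefixes and is not taken from the patent, the `ViewPrediction` type is hypothetical, and "differ strongly" is interpreted as "both views are confident and their labels differ", with an arbitrary `min_confidence` threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative prefixes only (assumption, not from the patent).
VM_MAC_PREFIXES = {
    "00:50:56", "00:0c:29",  # VMware
    "08:00:27",              # VirtualBox
    "00:15:5d",              # Microsoft Hyper-V
    "00:16:3e",              # Xen
}


@dataclass
class ViewPrediction:
    label: str         # predicted manufacturer/type for one view
    confidence: float  # confidence of that prediction, in [0, 1]


def is_virtual_machine(src_mac: str,
                       is_abnormal: bool,
                       mdns: Optional[ViewPrediction],
                       lbn: Optional[ViewPrediction],
                       min_confidence: float = 0.5) -> bool:
    """Apply the MAC-prefix check first, then the mDNS/LBN comparison."""
    # Expects a colon-separated MAC such as "00:0c:29:12:34:56".
    if src_mac.lower()[:8] in VM_MAC_PREFIXES:
        return True
    if not is_abnormal or mdns is None or lbn is None:
        return False
    both_confident = (mdns.confidence >= min_confidence
                      and lbn.confidence >= min_confidence)
    return both_confident and mdns.label != lbn.label
```

The second branch is what lets the check cover a NAT or shared-MAC bridged guest whose operating system differs from the host's, since such a guest shows the host's MAC prefix but may produce conflicting per-view predictions.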
Further, the step S02 specifically includes:
Step S021, respectively extracting characteristic information of each data transmission protocol according to the data transmission protocol of each flow to be identified.
The monitoring tool installed on the device identification apparatus formats the collected packets of the traffic to be identified, and converts the packets into a preset Json format, for example, the wireshark tool may convert the collected packets of the broadcast and multicast traffic into preset Json elements by using the supported network protocol analyzer tshark. And combining all Json elements with the same source MAC address into one Json element taking the source MAC address as a key, wherein the source MAC address corresponds to the equipment to be identified. For the Json elements with the same content, the content of only one element is reserved after the duplication is removed. And the content in each Json element is the effective load content of the traffic to be identified, which is sent by the equipment to be identified.
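The merge-and-deduplicate step can be sketched as below. The input is assumed to be a list of already parsed packets, each a dict with a `src_mac` string and a JSON-serializable `payload`; these field names are hypothetical and do not reflect the actual tshark JSON layout.

```python
import json


def merge_by_source_mac(packets):
    """Merge packets into one entry per source MAC, dropping duplicate payloads.

    Two payloads count as identical when their canonical JSON encodings
    (keys sorted) are equal, so key order does not matter.
    """
    merged = {}
    seen = {}
    for packet in packets:
        mac = packet["src_mac"].lower()
        payload = packet["payload"]
        fingerprint = json.dumps(payload, sort_keys=True)
        if fingerprint not in seen.setdefault(mac, set()):
            seen[mac].add(fingerprint)
            merged.setdefault(mac, []).append(payload)
    return merged
```

The result is a dictionary keyed by source MAC address, which matches the "one Json element per device" structure described above.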
And then according to the data transmission protocol of each flow to be identified, performing feature extraction on the Json format data, and extracting feature information of each data transmission protocol of the equipment to be identified from Json elements corresponding to each data transmission protocol.
Step S022, classifying the characteristic information of each data transmission protocol into each characteristic type according to the corresponding relation between the preset characteristic type and each data transmission protocol.
The preset feature types specifically include: DHCP class, mDNS class, SSDP class, LBN class, UDP class, and protseq class; correspondingly, the feature information of the DHCP class includes feature information of a DHCP protocol and a DHCPv6 protocol, the feature information of the mDNS class includes feature information of the mDNS protocol, the feature information of the SSDP class includes feature information of the SSDP protocol, the feature information of the LBN class includes feature information of an LLMNR protocol, a BROWSER protocol and an NBNS protocol, the feature information of the UDP class includes feature information of a UDP protocol, and the feature information of the protseq class includes a protocol sequence and a source MAC address prefix of a preset data transmission protocol.
The characteristic information corresponding to each data transmission protocol is classified according to the data characteristics of each data transmission protocol, such as data structure and data content. The embodiment of the invention provides an example, which is specifically divided into the following six characteristic types: DHCP class, mDNS class, SSDP class, LBN class, UDP class, and protseq class.
The feature information extracted from the Json elements corresponding to the DHCP protocol and the DHCPv6 protocol is classified as feature information of the DHCP class.
The feature information extracted from the Json elements corresponding to the mDNS protocol is taken as feature information of the mDNS class.
The feature information extracted from the Json elements corresponding to the SSDP protocol is taken as feature information of the SSDP class.
The feature information extracted from the Json elements corresponding to the LLMNR protocol, the BROWSER protocol, and the NBNS protocol is taken as feature information of the LBN class.
The feature information extracted from the Json elements corresponding to part of the UDP protocol traffic is taken as feature information of the UDP class.
In addition, the protocol sequence of all data transmission protocols of the equipment to be identified and the source MAC address prefix of the equipment to be identified are used as feature information of the protseq class.
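The correspondence between protocols and the six feature types can be sketched as a lookup table. The lowercase protocol keys, the function name, and the choice of the first three octets as the "source MAC address prefix" are assumptions for illustration.

```python
# Protocol -> feature type, following the six classes listed in the text.
PROTOCOL_TO_FEATURE_TYPE = {
    "dhcp": "DHCP", "dhcpv6": "DHCP",
    "mdns": "mDNS",
    "ssdp": "SSDP",
    "llmnr": "LBN", "browser": "LBN", "nbns": "LBN",
    "udp": "UDP",
}

FEATURE_TYPES = ("DHCP", "mDNS", "SSDP", "LBN", "UDP", "protseq")


def classify_features(per_protocol_features, protocol_sequence, src_mac):
    """Distribute per-protocol features into the six feature views.

    `per_protocol_features` maps a protocol name to a list of features.
    Protocols without a mapping (e.g. ARP) contribute only through the
    protocol sequence stored in the protseq view.
    """
    views = {name: [] for name in FEATURE_TYPES}
    for protocol, features in per_protocol_features.items():
        feature_type = PROTOCOL_TO_FEATURE_TYPE.get(protocol.lower())
        if feature_type is not None:
            views[feature_type].extend(features)
    # Assumption: the MAC prefix is the first three octets (8 characters).
    views["protseq"] = list(protocol_sequence) + [src_mac.lower()[:8]]
    return views
```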
After classification, the characteristic information of each characteristic type serves as one feature view of the equipment identification model. Following the idea of multi-view learning, the different views complement one another and are used together for equipment identification.
Step S023, vectorizing the feature information of each feature type according to a preset data processing method corresponding to each data transmission protocol to obtain vector information of each feature type.
Before the feature information of each feature type is input into the device identification model, it also needs to be vectorized. Because the data structures of the data transmission protocols differ, the extracted feature information has different data characteristics: for example, the feature information of the DHCP, DHCPv6, SSDP, LLMNR, BROWSER, and NBNS protocols is of key-value pair type, whereas the feature information of the mDNS protocol is of pseudo-natural-language type. Therefore, a preset data processing method corresponding to each data transmission protocol is used for vectorization. For example, feature information of key-value pair type is vectorized by one-hot encoding, and feature information of pseudo-natural-language type is vectorized using word2vec and LDA.
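For the key-value pair case, a minimal dependency-free sketch is shown below. Each distinct `key=value` token gets one position in a binary vector; a sample with several pairs therefore sets several positions. The vocabulary construction and the handling of unseen tokens (ignored) are assumptions, and the word2vec/LDA path for mDNS is not covered here.

```python
def build_vocabulary(samples):
    """Assign one index to every distinct key=value token in the samples."""
    tokens = sorted({f"{key}={value}"
                     for sample in samples
                     for key, value in sample.items()})
    return {token: index for index, token in enumerate(tokens)}


def encode(sample, vocabulary):
    """Encode one key-value sample as a binary vector over the vocabulary.

    Tokens that were not seen when the vocabulary was built are ignored.
    """
    vector = [0] * len(vocabulary)
    for key, value in sample.items():
        index = vocabulary.get(f"{key}={value}")
        if index is not None:
            vector[index] = 1
    return vector
```

Building the vocabulary once from the training samples and reusing it at identification time keeps the vector length fixed.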
Step S024, splicing the vector information of each characteristic type according to a preset splicing rule to be used as fingerprint information of the equipment to be identified, and inputting the fingerprint information into the preset equipment identification model.
According to the preset splicing rule, the vector information obtained by vectorizing the characteristic information is concatenated horizontally to form the fingerprint information of the equipment to be identified.
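Horizontal splicing amounts to concatenating the per-view vectors in a fixed order. In the sketch below the order, the per-view dimensions, and the zero filling of a view for which the device sent no traffic are all assumptions; the patent only states that a preset splicing rule is used.

```python
VIEW_ORDER = ("DHCP", "mDNS", "SSDP", "LBN", "UDP", "protseq")


def splice_fingerprint(view_vectors, view_dims):
    """Concatenate per-view vectors into one fingerprint vector.

    `view_vectors` maps a view name to its vector; `view_dims` gives the
    expected length of every view. A missing view is zero-filled so that
    all fingerprints have the same total length.
    """
    fingerprint = []
    for view in VIEW_ORDER:
        vector = view_vectors.get(view)
        if vector is None:
            vector = [0.0] * view_dims[view]
        if len(vector) != view_dims[view]:
            raise ValueError(f"unexpected length for view {view}")
        fingerprint.extend(vector)
    return fingerprint
```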
The fingerprint information of the equipment to be identified is then input into the trained equipment identification model to obtain the equipment information of the equipment to be identified.
Similarly, when the equipment identification model is trained, the feature information training samples of the known devices need to be vectorized to obtain the fingerprint information of the known devices, which is then used for training.
Since the device identification apparatus continuously collects the traffic to be identified, the fingerprint information of the device to be identified may be continuously updated according to the increase of the collected broadcast and multicast traffic. Specifically, the fingerprint information of the device to be identified may be obtained according to a preset interval period, for example, 30 seconds or 1 minute.
The embodiment of the invention formats the acquired traffic to be identified, and then performs feature extraction, vectorization and splicing according to each data transmission protocol to obtain the fingerprint information of each device to be identified as the input of the device identification model, thereby being capable of more accurately identifying the device to be identified in the wireless network.
Fig. 2 is a schematic structural diagram of an equipment identification model according to an embodiment of the present invention, where the equipment identification model specifically includes a neural network with depth fusion and breadth fusion.
The equipment identification model is a multi-view neural network based on a hybrid fusion mode and is named the multi-view wide and deep learning model (MvWDL). The dense embedding representations of the six independent, complementary feature views $F = \{v_1, v_2, v_3, v_4, v_5, v_6\}$ classified in the above embodiment are fused into the following two structures: (a) a deep-fusion neural network for early fusion, which maximizes the generalization performance of the equipment identification model; and (b) a breadth-fusion (wide) neural network for late fusion, which improves the memorization of interactions between the information of each device and each feature view, namely how each feature view responds to the manufacturer, the device type, and the device model.
The output of the device identification model has two parts: the device information obtained by the device category identification classifier from the final conditional probability, and the judgment result output by the abnormal device monitor (Abnormal Device Detection) on whether the device is an abnormal device.
The final conditional probability is obtained by:
Figure BDA0002612549010000071
where
Figure BDA0002612549010000072
is the classification probability obtained by the breadth-fused neural network,
Figure BDA0002612549010000073
is the classification probability obtained by the deep-fused neural network, and t_c is a particular item of device information, i.e., the manufacturer, the device type, or the device model.
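The formula itself survives only as an image, but claim 9 describes the final conditional probability as the sum of the breadth-fusion and deep-fusion classification probabilities. A minimal sketch of that combination follows; the labels and probability values are hypothetical.

```python
from typing import Dict


def final_scores(p_wide: Dict[str, float], p_deep: Dict[str, float]) -> Dict[str, float]:
    """Sum the wide-fusion and deep-fusion probabilities for each label t_c."""
    return {label: p_wide[label] + p_deep[label] for label in p_wide}


def predict(p_wide: Dict[str, float], p_deep: Dict[str, float]) -> str:
    """Return the label with the highest combined score."""
    scores = final_scores(p_wide, p_deep)
    return max(scores, key=scores.get)


# Hypothetical outputs of the two branches for one device.
p_wide = {"phone": 0.25, "laptop": 0.5, "camera": 0.25}
p_deep = {"phone": 0.375, "laptop": 0.5, "camera": 0.125}

print(final_scores(p_wide, p_deep))  # {'phone': 0.625, 'laptop': 1.0, 'camera': 0.375}
print(predict(p_wide, p_deep))       # laptop
```

The summed scores are not renormalized here, which does not affect which label ranks highest.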
In the training stage of the MvWDL model, in order to balance the feature information training samples of the different known devices in the training sample set, the feature information training samples of known devices that have relatively few samples may be copied multiple times and added to the training sample set.
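Balancing by duplication can be sketched as follows. The patent only states that samples of under-represented devices are copied multiple times; copying up to the size of the largest class is an assumption made for this illustration, as are the sample values.

```python
from collections import Counter
from typing import List, Tuple

Sample = Tuple[List[int], str]  # (fingerprint vector, device label)


def oversample_by_copy(samples: List[Sample]) -> List[Sample]:
    """Duplicate samples of rarer labels until each label matches the largest class."""
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced: List[Sample] = []
    for label, count in counts.items():
        members = [s for s in samples if s[1] == label]
        copies, remainder = divmod(target, count)
        # Whole copies first, then a partial copy to reach the target exactly.
        balanced.extend(members * copies + members[:remainder])
    return balanced


data: List[Sample] = [
    ([1, 0], "printer"),
    ([0, 1], "phone"),
    ([1, 1], "phone"),
    ([0, 0], "phone"),
]
balanced = oversample_by_copy(data)
print(Counter(label for _, label in balanced))
```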
According to the embodiment of the invention, the equipment identification model is constructed based on the multi-view breadth and deep learning model, so that the counterfeit DHCP server or gateway in the wireless network can be identified more accurately.
Further, the step S03 specifically includes:
step S031, if the output of the device identification model satisfies a preset inconsistency determination condition, determine that the device to be identified is an abnormal device.
From the above embodiment, it can be seen that if the device to be identified is determined, according to the output of the device identification model, to be a certain known device, the device information of the device to be identified is given.
If the device to be identified is judged, according to the output of the device identification model, not to be a known device, it is considered to be either an unknown benign device or an unknown abnormal device. A benign device is a normal device that was not labeled during training of the device identification model, and an abnormal device is a counterfeit device or a malicious device.
The device identification model sets inconsistency determination conditions for determining an abnormal device in advance, and determines that the device to be identified is an abnormal device when an output of the device identification model satisfies the inconsistency determination conditions.
As shown in Fig. 2, the breadth-fused neural network of the device identification model obtains, from the feature information input by each feature view i, an identification result p_i for that view, so that identification results corresponding to each feature type are obtained.
The differences among the identification results of the feature views are then evaluated; if the difference is large, the device to be identified is determined to be an abnormal device.
The specific judgment method can calculate the inconsistent quantization value according to the recognition result of each characteristic view through a preset inconsistent judgment algorithm. The inconsistency decision algorithm is as follows:
Figure BDA0002612549010000081
where the A operation, applied to p_k, returns its corresponding index. η is a preset trade-off parameter that takes values in the range [0, 1]; it balances the differing performance of
Figure BDA0002612549010000082
in type recognition. Here,
Figure BDA0002612549010000083
are the recognition probabilities of feature views u and v, output by the breadth-fused neural network, for a specific item of information k.
And if the calculated inconsistent quantization value exceeds a preset threshold value E, judging that the equipment to be identified is abnormal equipment.
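Because the inconsistency formula is available only as an image, the following is not the patent's algorithm. It is an illustrative stand-in, under stated assumptions, that shows the general shape of such a check: per-view results are compared pairwise, a parameter `eta` in [0, 1] trades off label disagreement against probability gaps, and the score is compared with a threshold. The scoring rule, default values, and function names are all hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def argmax(p: Sequence[float]) -> int:
    """Return the index of the most probable label."""
    return max(range(len(p)), key=lambda k: p[k])


def inconsistency(view_probs: List[Sequence[float]], eta: float = 0.5) -> float:
    """Average pairwise disagreement between view predictions (needs >= 2 views)."""
    pairs = list(combinations(view_probs, 2))
    total = 0.0
    for p_u, p_v in pairs:
        # 1.0 when the two views pick different labels, else 0.0.
        disagree = 1.0 if argmax(p_u) != argmax(p_v) else 0.0
        # Largest per-label probability gap between the two views.
        gap = max(abs(a - b) for a, b in zip(p_u, p_v))
        total += eta * disagree + (1.0 - eta) * gap
    return total / len(pairs)


def is_abnormal(view_probs: List[Sequence[float]], eta: float = 0.5,
                threshold: float = 0.5) -> bool:
    """Flag the device when the inconsistency score exceeds the threshold."""
    return inconsistency(view_probs, eta) > threshold


consistent = [[0.75, 0.25], [0.75, 0.25], [0.75, 0.25]]
conflicting = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(is_abnormal(consistent), is_abnormal(conflicting))  # False True
```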
Step S032: the mDNS view and the LBN view are further compared. If the prediction results of the mDNS view and the LBN view (including the predicted device manufacturer/type and the confidence of the prediction) differ, the device is judged to be a virtual machine.
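The comparison in step S032 can be sketched as below. The text does not specify how the prediction confidence enters the comparison, so this sketch compares only the predicted manufacturer and type and merely carries the confidence along; the `ViewPrediction` type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewPrediction:
    manufacturer: str
    device_type: str
    confidence: float  # carried along; not used in the comparison below


def views_disagree(mdns: ViewPrediction, lbn: ViewPrediction) -> bool:
    """True when the two views name a different manufacturer or device type."""
    return (mdns.manufacturer, mdns.device_type) != (lbn.manufacturer, lbn.device_type)


mdns_view = ViewPrediction("Apple", "laptop", 0.9)
lbn_view = ViewPrediction("other-vendor", "phone", 0.8)
print(views_disagree(mdns_view, lbn_view))  # True
```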
To further illustrate the above embodiments, three specific application examples are listed below:
Example 1: Method for identifying a virtual machine in a local area network based on broadcast packets
1) Train the MvWDL model.
1.1) Connect devices of known types to a local area network.
1.2) Collect broadcast/multicast data packets with Wireshark from the local area network joined in 1.1).
1.3) Extract features from the packets collected in 1.2) and manually label the device type.
1.4) Train the MvWDL model on the data set from 1.3).
2) Connect the traffic collection device carrying the MvWDL model to a local area network containing the device under test.
3) Use Wireshark to collect the broadcast/multicast data packets in the local area network of 2).
4) Perform feature extraction on the data collected in 3).
5) If the MAC address prefix feature of the device under test shows that the device manufacturer is a virtual machine vendor, judge the device to be a virtual machine. Otherwise, go to step 6).
6) Input the feature data from 4) into the MvWDL model trained in 1), and judge whether the device is an abnormal device.
7) If the device is judged to be abnormal in 6), obtain the prediction result output by the mDNS view of the model in 6), namely the predicted device manufacturer and type.
8) If the device is judged to be abnormal in 6), obtain the prediction result output by the LBN view of the model in 6), namely the predicted device manufacturer and type.
9) Compare the prediction results from 7) and 8). If the two views predict different manufacturers (the mDNS view predicts that the device is a Macbook, and the LBN view predicts that it is a computer produced by another manufacturer), the device is judged to be a virtual machine.
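The decision flow of steps 5) to 9) can be sketched as one function. The three callables are hypothetical placeholders for the MAC prefix check, the trained model's abnormality judgment, and its per-view predictions; the outcome labels for the branches the example does not name ("normal device", "abnormal device") are assumptions of this sketch.

```python
from typing import Callable, List


def classify_device(
    mac: str,
    fingerprint: List[int],
    is_vm_vendor_prefix: Callable[[str], bool],
    is_abnormal: Callable[[List[int]], bool],
    predict_view: Callable[[List[int], str], str],
) -> str:
    """Return "virtual machine", "abnormal device", or "normal device"."""
    if is_vm_vendor_prefix(mac):                     # step 5
        return "virtual machine"
    if not is_abnormal(fingerprint):                 # step 6
        return "normal device"
    mdns_label = predict_view(fingerprint, "mDNS")   # step 7
    lbn_label = predict_view(fingerprint, "LBN")     # step 8
    if mdns_label != lbn_label:                      # step 9
        return "virtual machine"
    return "abnormal device"


# Stub inputs standing in for a real capture and a trained model.
result = classify_device(
    "aa:bb:cc:00:00:01",
    [0, 1],
    is_vm_vendor_prefix=lambda mac: False,
    is_abnormal=lambda fp: True,
    predict_view=lambda fp, view: {"mDNS": "Apple laptop", "LBN": "other-vendor PC"}[view],
)
print(result)  # virtual machine
```

Passing the checks in as callables keeps the control flow testable without a trained model or a live capture.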
Example 2: Method for identifying a virtual machine in a local area network based on broadcast packets
1) Train the MvWDL model.
1.1) Connect devices of known types to a local area network.
1.2) Collect broadcast/multicast data packets with tcpdump from the local area network joined in 1.1).
1.3) Extract features from the packets collected in 1.2) and manually label the device type.
1.4) Train the MvWDL model on the data set from 1.3).
2) Connect the traffic collection device carrying the MvWDL model to a local area network containing the device under test.
3) Use tcpdump to collect the broadcast/multicast data packets in the local area network of 2).
4) Perform feature extraction on the data collected in 3).
5) If the MAC address prefix feature of the device under test shows that the device manufacturer is a virtual machine vendor, judge the device to be a virtual machine. Otherwise, go to step 6).
6) Input the feature data from 4) into the MvWDL model trained in 1), and judge whether the device is an abnormal device.
7) If the device is judged to be abnormal in 6), obtain the prediction result output by the mDNS view of the model in 6), namely the predicted device manufacturer and type.
8) If the device is judged to be abnormal in 6), obtain the prediction result output by the LBN view of the model in 6), namely the predicted device manufacturer and type.
9) Compare the prediction results from 7) and 8). If the two views predict different device types (the mDNS view predicts that the device is a Macbook computer, and the LBN view predicts that it is an Android mobile phone), the device is judged to be a virtual machine.
Example 3: Method for identifying a virtual machine in a local area network based on broadcast packets
1) Train the MvWDL model.
1.1) Connect devices of known types to a local area network.
1.2) Collect broadcast/multicast data packets with tcpdump from the local area network joined in 1.1).
1.3) Extract features from the packets collected in 1.2) and manually label the device type.
1.4) Train the MvWDL model on the data set from 1.3).
2) Connect the traffic collection device carrying the MvWDL model to a local area network containing the device under test.
3) Use tcpdump to collect the broadcast/multicast data packets in the local area network of 2).
4) Perform feature extraction on the data collected in 3).
5) If the MAC address prefix feature of the device under test shows that the device manufacturer is a virtual machine vendor (the MAC address prefix of the device is 00-05-69, which indicates a VMware virtual machine), judge the device to be a virtual machine.
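The MAC address prefix check in step 5) can be sketched as a lookup on the first three octets. The table below contains only the 00-05-69 prefix named in this example; a practical deployment would presumably load a fuller list of virtual machine vendor prefixes, which is not given here. The helper names are hypothetical.

```python
from typing import Optional

# Only the prefix given in Example 3; a real list would be longer.
VM_OUI_PREFIXES = {
    "000569": "VMware",
}


def normalize_oui(mac: str) -> str:
    """Return the first three octets as six uppercase hex digits."""
    digits = "".join(ch for ch in mac if ch not in ":-.").upper()
    if len(digits) != 12 or any(ch not in "0123456789ABCDEF" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits[:6]


def vm_vendor(mac: str) -> Optional[str]:
    """Return the virtual machine vendor for this MAC prefix, or None."""
    return VM_OUI_PREFIXES.get(normalize_oui(mac))


print(vm_vendor("00-05-69-12-34-56"))  # VMware
print(vm_vendor("3c:22:fb:00:00:01"))  # None
```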
The above embodiments are only intended to illustrate the technical solution of the present invention, but not to limit it, and a person skilled in the art can modify the technical solution of the present invention or substitute it with an equivalent, and the protection scope of the present invention is subject to the claims.

Claims (10)

1. A method for identifying a virtual machine in a local area network is characterized by comprising the following steps:
acquiring broadcast and multicast traffic of equipment to be identified in a local area network, and taking the traffic as the traffic to be identified;
extracting characteristic information of a data transmission protocol of traffic to be identified, and classifying the characteristic information into each characteristic type according to the corresponding relation between a preset characteristic type and each data transmission protocol;
vectorizing the characteristic information of each characteristic type, and splicing to obtain fingerprint information;
inputting the fingerprint information into a pre-trained equipment identification model, outputting prediction results of an mDNS view and an LBN view, and judging whether the equipment to be identified is abnormal equipment;
if the equipment to be identified is abnormal equipment, comparing the prediction results of the mDNS view and the LBN view, and if the two prediction results are different, judging that the equipment to be identified is a virtual machine.
2. The method of claim 1, wherein a device identification device equipped with a monitoring tool is deployed in the local area network, and the device is used to monitor traffic and collect traffic to be identified therein; the equipment identification device formats the acquired data packet of the flow to be identified and converts the data packet into a preset Json format; combining all Json elements with the same source MAC address into one Json element taking a source MAC address as a key, wherein the source MAC address corresponds to the equipment to be identified; and for the Json elements with the same content, only retaining the content of one element through duplicate removal, wherein the content in each Json element is the effective load content of the traffic to be identified.
3. The method of claim 1, wherein the data transmission protocol comprises an ARP protocol, an ICMPv6 protocol, an mDNS protocol, a DHCP protocol, an IGMP protocol, an SSDP protocol, an LLC protocol, an LLMNR protocol, a UDP protocol, an ETHERTYPE protocol.
4. The method of claim 1, wherein the feature types include a DHCP class, an mDNS class, an SSDP class, an LBN class, a UDP class, and a protseq class; the characteristic information of the DHCP class comprises characteristic information of a DHCP protocol and a DHCPv6 protocol, the characteristic information of the mDNS class comprises characteristic information of the mDNS protocol, the characteristic information of the SSDP class comprises characteristic information of the SSDP protocol, the characteristic information of the LBN class comprises characteristic information of an LLMNR protocol, a BROWSER protocol and an NBNS protocol, the characteristic information of the UDP class comprises characteristic information of a UDP protocol, and the characteristic information of the protseq class comprises a protocol sequence and a source MAC address prefix of a preset data transmission protocol.
5. The method of claim 1, wherein vectorizing the feature information of each feature type based on data features comprises: the feature information of the DHCP, DHCPv6, SSDP, LLMNR, BROWSER and NBNS protocols is of key-value pair type and is vectorized by one-hot encoding; the feature information of the mDNS protocol is of pseudo-natural-language type and is vectorized using word2vec and LDA.
6. The method according to claim 1, wherein in extracting the feature information, for part of the feature information having no distinctive feature, the feature information is replaced with a character string set in unison to simplify subsequent data processing.
7. The method according to claim 1, wherein before inputting the fingerprint information into the device identification model, it is determined whether a device MAC address prefix included in the traffic to be identified represents a virtual machine manufacturer, and if so, it is directly determined that the device to be identified is a virtual machine.
8. The method of claim 1, wherein the device identification model is trained by vectorizing the training samples of the feature information of the known devices to obtain the fingerprint information of each known device, and then using the fingerprint information for training.
9. The method of claim 1, wherein the device identification model is a multi-view wide and deep learning model MvWDL; the dense embedded representations of the feature views of each feature type are fused into a deep-fusion neural network and a breadth-fusion neural network, and a multi-view neural network model based on hybrid fusion is constructed from the two neural networks to obtain the model MvWDL; the output end of the model comprises a device category identification classifier and an abnormal device monitor, the device category identification classifier is used for obtaining device information according to the output final conditional probability, the final conditional probability is the sum of the classification probability obtained by the breadth-fusion neural network and the classification probability obtained by the deep-fusion neural network, and the abnormal device monitor is used for outputting a judgment result of whether the device is an abnormal device.
10. The method according to claim 1, characterized in that if the output of the device identification model satisfies a preset inconsistency determination condition, the device to be identified is judged to be an abnormal device; the inconsistency determination condition is: an inconsistent quantization value is calculated by an inconsistency determination algorithm from the identification results obtained by the breadth-fused neural network according to the feature information input by each feature view, and if the calculated inconsistent quantization value exceeds a preset threshold value, the device to be identified is judged to be an abnormal device.
CN202010759077.2A 2020-07-31 2020-07-31 Method for identifying virtual machine in local area network Active CN112068926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010759077.2A CN112068926B (en) 2020-07-31 2020-07-31 Method for identifying virtual machine in local area network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010759077.2A CN112068926B (en) 2020-07-31 2020-07-31 Method for identifying virtual machine in local area network

Publications (2)

Publication Number Publication Date
CN112068926A true CN112068926A (en) 2020-12-11
CN112068926B CN112068926B (en) 2024-08-09

Family

ID=73656400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010759077.2A Active CN112068926B (en) 2020-07-31 2020-07-31 Method for identifying virtual machine in local area network

Country Status (1)

Country Link
CN (1) CN112068926B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017092409A1 (en) * 2015-12-02 2017-06-08 华为技术有限公司 Method and device for identifying high-usage intermediate code in language virtual machine
CN110602041A (en) * 2019-08-05 2019-12-20 中国人民解放军战略支援部队信息工程大学 White list-based Internet of things equipment identification method and device and network architecture
CN110691100A (en) * 2019-10-28 2020-01-14 中国科学技术大学 Hierarchical network attack identification and unknown attack detection method based on deep learning
CN111757327A (en) * 2020-06-03 2020-10-09 湃方科技(北京)有限责任公司 Method and device for identifying counterfeit DHCP server or gateway in wireless network
CN111757365A (en) * 2020-06-03 2020-10-09 湃方科技(北京)有限责任公司 Abnormal equipment identification method and device in wireless network
CN111757378A (en) * 2020-06-03 2020-10-09 湃方科技(北京)有限责任公司 Equipment identification method and device in wireless network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NIKOLAI ANDRIANOV: "A Machine Learning Approach for Virtual Flow Metering and Forecasting", 《IFAC-PAPERSONLINE》, vol. 51, no. 8, 31 August 2018 (2018-08-31), pages 191 - 196 *
韩玲 等: "面向Microsoft Virtual PC的虚拟机远程检测方法", 《计算机技术与发展》, vol. 23, no. 12, 29 September 2013 (2013-09-29), pages 134 - 138 *

Also Published As

Publication number Publication date
CN112068926B (en) 2024-08-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant