WO2023004707A1 - Method and apparatus for device type identification - Google Patents

Method and apparatus for device type identification

Info

Publication number
WO2023004707A1
Authority
WO
WIPO (PCT)
Prior art keywords
traffic data
unknown
encryption protocol
features
device type
Prior art date
Application number
PCT/CN2021/109352
Inventor
马工速
宋杰
吴洪峰
Original Assignee
西门子股份公司
西门子(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西门子股份公司, 西门子(中国)有限公司
Priority to PCT/CN2021/109352
Publication of WO2023004707A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Definitions

  • the present application relates to the technical field of network security, and more specifically, relates to a method and device for device type identification.
  • Embodiments of the present application provide a device type identification method and system, which can improve the accuracy of device type identification.
  • a device type identification method, comprising: acquiring traffic data of an unknown device; extracting identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device; and determining the device type of the unknown device according to the identification features and a device type identification model, wherein the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
  • a device type identification device, including: an acquisition unit, configured to acquire traffic data of an unknown device; and a processing unit, configured to extract identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device; the processing unit is further configured to determine the device type of the unknown device according to the identification features and a device type identification model, wherein the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
  • a device type identification device, including: a memory for storing programs; and a processor for executing the programs stored in the memory, wherein, when the programs stored in the memory are executed, the processor is configured to perform the above method of device type identification.
  • the present application also provides a computer-readable storage medium, which stores program code for execution by a device, where the program code includes instructions for executing the steps in the above method for device type identification.
  • the present application also provides a computer program product, the computer program product includes a computer program stored on a computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the above device type identification method.
  • This application adds application-layer analysis of the traffic data: the encryption protocol features of the traffic data of known devices are used to train a device type identification model with higher inference accuracy, and the encryption protocol features extracted from the traffic data of the unknown device are then input to that model, which further improves the accuracy of device type identification for unknown devices.
  • The encryption protocol features include at least one of: the encryption protocol of the traffic data, the length of the traffic data encrypted by the encryption protocol, and the number of fragments of the traffic data encrypted by the encryption protocol.
  • the present application can analyze various information in encryption protocol features through a device type recognition model, and can further improve the accuracy of device type recognition.
  • the device type recognition model is further trained based on sample features of multiple known devices.
  • the identification feature further includes a packet size feature and a protocol stack feature of the traffic data of the unknown device.
  • In addition to analyzing the traffic data at the application layer (the encryption protocol features), the present application can also analyze the traffic data at the transport layer (the data packet size features) and across the protocol layers as a whole (the protocol stack features), further improving the accuracy of device type identification.
  • the identification feature is further recorded in a feature database.
  • Features are extracted from the traffic data in multiple dimensions, such as data packet size features, protocol stack features, and encryption protocol features, and the extracted features are then stored or recorded in the feature database, which can improve the efficiency of managing the traffic data.
  • When acquiring the traffic data of the unknown device, the traffic data may be acquired actively and/or passively.
  • When the traffic data is acquired actively, a detection instruction related to an encryption protocol feature of the required traffic data may be sent, and traffic data responding to the detection instruction may be received.
  • When the traffic data of the unknown device is acquired passively, it may be captured in real time or obtained through data packet replay.
  • Replay refers to simulating the online status of a specified type of device through some technical means; data packet replay can be understood as using "scene reproduction" to obtain the traffic data of a certain device.
  • FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application.
  • Fig. 2 is a schematic flowchart of a method for identifying a device type according to an embodiment of the present application.
  • Fig. 3 is a schematic block diagram of an apparatus for identifying a device type according to an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of an apparatus for identifying a device type according to an embodiment of the present application.
  • The serial numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the serial numbers place no limitation on the implementation of the embodiments of the present application.
  • FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application.
  • an unknown device 120 is a device that is connected to the current network and has an unknown device type.
  • the device can generate traffic data through information interaction with the network.
  • The traffic data generated by the unknown device 120 may be captured by the processing device 110 in real time, or may first be stored in the database 140 and then obtained by the processing device 110 from the database 140.
  • the processing device 110 is in communication connection with the unknown device 120 .
  • the processing device 110 may include a communication interface 112 to implement a communication connection with other devices.
  • the communication connection may be wired or wireless.
  • the processing device 110 may be an electronic device or system capable of data processing, such as a computer.
  • the processing device 110 may include a processing module 111 configured to implement device type identification of unknown devices.
  • the processing module 111 may specifically be one or more processors.
  • the processor may be any type of processor, which is not limited in this embodiment of the present application.
  • the processing device 110 may also include a storage system 113 .
  • the storage system 113 may be used to store data and instructions, for example, computer-executable instructions for implementing the technical solutions of the embodiments of the present application.
  • the processing device 110 may call data, instructions, etc. in the storage system 113 , and may also store the data, instructions, etc. in the storage system 113 .
  • the storage system 113 may specifically be one or more memories.
  • the storage may be any type of storage, which is not limited in this embodiment of the present application.
  • the storage system 113 may be installed inside the processing device 110 or outside the processing device 110 . In the case that the storage system 113 is disposed outside the processing device 110 , the processing device 110 can access the storage system 113 through the data interface.
  • the processing device 110 may also include other general-purpose devices, for example, an output device, configured to output device type identification results.
  • The processing device 110 also includes a preprocessing module 114 configured to preprocess the acquired traffic data, for example, to extract relevant features of the traffic data using the technical solutions of the embodiments of the present application described below.
  • a trained model 115 is also configured in the processing device 110 .
  • the processing module 111 can use the model 115 to perform corresponding processing.
  • the model 115 can be trained through the technical solutions of the embodiments of this application.
  • model 115 may be a model for identifying a device type of an unknown device.
  • the training device 130 can train a device type recognition model based on the training data in the sample database 150 .
  • the processing module 111 can use the device type identification model to obtain the device type of the unknown device.
  • feature extraction can be performed through the preprocessing module 114 first to obtain multiple features; then input them into the model 115 to obtain the device type of the unknown device.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • The model 115 trained by the training device 130 may be a model based on machine learning, for example, a model based on a neural network, where the neural network may be a convolutional neural network (CNN), a recurrent neural network (RNN), a deep convolutional neural network (DCNN), and so on.
  • Fig. 2 shows a schematic diagram of a process of identifying a device type according to an embodiment of the present application. Specifically, the following steps 210-230 are included.
  • The communication information of a network device usually includes the time, the source and destination of the transmission, the network communication protocol used, the data packet length, and the packet payload. Each type of information can reflect certain characteristics of the network device. Acquiring such information requires analyzing the traffic data of the device over a period of time. Therefore, before the features related to the device type can be obtained, the traffic data of the unknown device must first be obtained.
  • the flow data may be acquired actively or passively.
  • obtaining traffic data of unknown devices through active means includes: sending detection instructions or sniffing packets related to the required traffic data, and then receiving traffic data in response to the detection instructions or sniffing packets.
  • Obtaining the flow data of the unknown device in a passive manner may include: capturing the flow data of the unknown device in real time or obtaining the flow data of the unknown device through data packet replay.
  • traffic data of unknown devices is mainly obtained through a combination of active and passive methods.
  • For example, WINCAP may be used to capture data packets directly from the physical interface and save them in the CAP file format.
  • Alternatively, WINCAP may be used to read data packets from an offline capture file, that is, the WINCAP function pcap_open_offline() is used to open the stored file.
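  • The offline read described above can be sketched without the capture library itself. The following is a minimal pure-Python sketch, assuming the stored CAP file uses the classic pcap savefile layout (a 24-byte global header followed by 16-byte record headers); the names iter_pcap_packets and build_demo_capture are hypothetical and stand in for, rather than reproduce, pcap_open_offline().

```python
import struct
from typing import Iterator, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap magic number (microsecond timestamps)


def iter_pcap_packets(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, raw packet bytes) from the bytes of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", data[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"          # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"          # file was written big-endian
    else:
        raise ValueError("not a classic pcap file")
    offset = 24               # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        packet = data[offset:offset + incl_len]
        if len(packet) < incl_len:
            break             # truncated final record
        offset += incl_len
        yield ts_sec + ts_usec / 1_000_000, packet


def build_demo_capture() -> bytes:
    """Build a tiny in-memory capture so the reader can be tried without a file."""
    header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    records = b""
    for ts_sec, ts_usec, payload in [(10, 500000, b"\x01\x02\x03"), (11, 0, b"\xff")]:
        records += struct.pack("<IIII", ts_sec, ts_usec, len(payload), len(payload))
        records += payload
    return header + records


if __name__ == "__main__":
    for timestamp, packet in iter_pcap_packets(build_demo_capture()):
        print(timestamp, packet.hex())
```

  • For a real capture, the bytes would come from reading the stored file in binary mode; newer capture formats such as pcapng are not handled by this sketch.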
  • This application is mainly concerned with the encryption protocol features of the traffic data. When the captured file is found to include content carrying encryption protocol features, the acquisition step is complete; when it does not, a request or detection instruction is sent to the unknown device to obtain traffic data that includes encryption protocol features, and the response data from the unknown device is then received.
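  • The passive-first, active-fallback decision described above can be sketched as follows. This is only an illustration: acquire_traffic is a hypothetical name, and both the predicate that decides whether a packet carries encryption protocol features and the probe routine are supplied by the caller, since the text does not specify them.

```python
from typing import Callable, Iterable, List


def acquire_traffic(
    captured: Iterable[bytes],
    has_encryption_features: Callable[[bytes], bool],
    send_probe: Callable[[], Iterable[bytes]],
) -> List[bytes]:
    """Return packets that carry encryption protocol features.

    The passively captured packets are checked first; the probe is only
    sent when none of them carries such features.
    """
    matching = [p for p in captured if has_encryption_features(p)]
    if matching:
        return matching  # passive capture is sufficient, acquisition is complete
    # Otherwise actively request traffic and keep the matching responses.
    return [p for p in send_probe() if has_encryption_features(p)]


if __name__ == "__main__":
    # Assumption for the demo: a payload starting with byte 0x16 is treated
    # as a TLS handshake record.
    is_tls = lambda payload: payload[:1] == b"\x16"
    print(acquire_traffic([b"GET /"], is_tls, lambda: [b"\x16\x03\x03"]))
```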
  • the identification feature refers to the feature that can reflect the device type of the unknown device, and may be other names, which are not limited in the present application.
  • The encryption protocol features of the traffic data of the unknown device include at least one of: the encryption protocol of the traffic data, the length of the traffic data encrypted using the encryption protocol, and the number of fragments of the traffic data encrypted using the encryption protocol.
  • the device type of the unknown device can be inferred by extracting the encryption protocol features and various information in the encryption protocol features.
  • this application can also extract and analyze the traffic data on the application layer, that is, the content related to encryption protocols, which can improve the accuracy of device type inference for unknown devices.
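  • As an illustration of the three encryption protocol features named above, the following minimal sketch walks a byte stream of TLS records and reports the protocol, the total encrypted length, and the number of fragments. It assumes that the encryption protocol is TLS, that a "fragment" corresponds to one application-data record, and that the stream is already reassembled; EncryptionProtocolFeatures and extract_encryption_features are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Record-layer version numbers. TLS 1.3 also uses 0x0303 at the record layer,
# so this label reflects the record header only.
TLS_RECORD_VERSIONS = {0x0301: "TLS 1.0", 0x0302: "TLS 1.1", 0x0303: "TLS 1.2"}
TLS_CONTENT_TYPES = (20, 21, 22, 23)  # change_cipher_spec, alert, handshake, application_data


@dataclass
class EncryptionProtocolFeatures:
    protocol: Optional[str]  # encryption protocol seen in the stream, if any
    encrypted_length: int    # total bytes carried in application-data records
    fragment_count: int      # number of application-data records


def extract_encryption_features(stream: bytes) -> EncryptionProtocolFeatures:
    protocol = None
    total = 0
    count = 0
    offset = 0
    while offset + 5 <= len(stream):
        content_type, version, length = struct.unpack("!BHH", stream[offset:offset + 5])
        if content_type not in TLS_CONTENT_TYPES or version not in TLS_RECORD_VERSIONS:
            break  # not a TLS record, stop walking
        if offset + 5 + length > len(stream):
            break  # incomplete record at the end of the stream
        if protocol is None:
            protocol = TLS_RECORD_VERSIONS[version]
        if content_type == 23:
            total += length
            count += 1
        offset += 5 + length
    return EncryptionProtocolFeatures(protocol, total, count)
```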
  • the identification feature also includes the packet size feature and the protocol stack feature of the traffic data of the unknown device.
  • identification features may also include other features, which is not limited in this application.
  • the traffic data is decomposed into multiple dimensions: data packet size features, protocol stack features, and encryption protocol features, and then the traffic data is recorded in the feature database in the form of multiple dimensions for subsequent use.
  • using multiple features to identify the device type is convenient to use, and can also improve the management efficiency of the flow data of multiple devices.
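  • One possible shape for such a feature database is sketched below using SQLite, with one row per device and feature dimension. The table layout, the JSON encoding of the values, and the function names are assumptions made for illustration; the text does not prescribe a storage format.

```python
import json
import sqlite3
from typing import Dict


def open_feature_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the feature database; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        " device_id TEXT NOT NULL,"
        " dimension TEXT NOT NULL,"
        " value_json TEXT NOT NULL,"
        " PRIMARY KEY (device_id, dimension))"
    )
    return conn


def record_features(conn: sqlite3.Connection, device_id: str,
                    features: Dict[str, dict]) -> None:
    """Store one entry per dimension, replacing any earlier entry."""
    rows = [(device_id, dim, json.dumps(value, sort_keys=True))
            for dim, value in features.items()]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO features VALUES (?, ?, ?)", rows)


def load_features(conn: sqlite3.Connection, device_id: str) -> Dict[str, dict]:
    """Return all recorded dimensions for a device."""
    cursor = conn.execute(
        "SELECT dimension, value_json FROM features WHERE device_id = ?",
        (device_id,))
    return {dim: json.loads(value) for dim, value in cursor}
```

  • Keeping the dimensions in separate rows lets the packet size, protocol stack, and encryption protocol features of a device be updated or queried independently.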
  • traffic data can also be decomposed into general features, specific features and attribute features, etc.
  • the following steps S1-S3 may be included.
  • Traffic data of known devices may be acquired through passive and/or active methods.
  • The passive method includes real-time capture or data packet replay to obtain the traffic data of the known device.
  • The active method includes constructing detection instructions related to the required traffic data, or sending sniffing packets, to generate traffic data.
  • For example, during active monitoring, a traffic data request is sent to a known device, and WINCAP is then used to capture the response data packets of the known device from the physical interface, thereby obtaining the traffic data of the known device.
  • Features are extracted from the traffic data in multiple dimensions, such as data packet size features, protocol stack features, and encryption protocol features, and the extracted features are then stored or recorded in the feature database to improve the efficiency of managing the traffic data of known devices.
  • The TCP/IP protocol stack features extracted from the traffic data include, from the TCP header: the initial window value, the initial sequence number (ISN), the timestamp field (timestamp), the maximum segment size (MSS), the window scaling factor (WS), the selective acknowledgment permitted flag (SACK permitted), the acknowledgment number (ACK number), the connection establishment flag (SYN), the connection close flag (FIN), the acknowledgment flag (ACK), the push flag (PSH), the urgent flag (URG), and the connection reset flag (RST); and, from the IP header: the version number (version), the header length (internet header length, IHL), the time to live (TTL), the protocol field value (protocol), and the option value (option); as well as the protocol port number (port), and so on.
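  • To make the list above concrete, the following minimal sketch reads several of these fields from a raw IPv4 packet carrying TCP, using the standard header layouts. The function name parse_ipv4_tcp and the keys of the returned dictionary are hypothetical, and link-layer framing (for example an Ethernet header) is assumed to have been stripped already.

```python
import struct
from typing import Dict

TCP_FLAG_BITS = {"fin": 0x01, "syn": 0x02, "rst": 0x04,
                 "psh": 0x08, "ack": 0x10, "urg": 0x20}


def parse_ipv4_tcp(packet: bytes) -> Dict[str, object]:
    """Extract protocol stack features from an IPv4 packet that carries TCP."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    ver_ihl, _tos, _total, _ident, _frag, ttl, proto = struct.unpack(
        "!BBHHHBB", packet[:10])
    version, ihl = ver_ihl >> 4, (ver_ihl & 0x0F) * 4  # IHL is in 32-bit words
    if version != 4 or proto != 6:
        raise ValueError("not an IPv4 packet carrying TCP")
    tcp = packet[ihl:]
    if len(tcp) < 20:
        raise ValueError("too short for a TCP header")
    src_port, dst_port, seq, ack_no, off_flags, window = struct.unpack(
        "!HHIIHH", tcp[:16])
    features: Dict[str, object] = {
        "ip_version": version, "ihl": ihl, "ttl": ttl, "protocol": proto,
        "src_port": src_port, "dst_port": dst_port,
        "sequence_number": seq, "ack_number": ack_no, "window": window,
    }
    for name, bit in TCP_FLAG_BITS.items():
        features[name] = bool(off_flags & bit)
    # Walk the TCP options between the fixed header and the data offset.
    options = tcp[20:(off_flags >> 12) * 4]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:        # end of option list
            break
        if kind == 1:        # no-operation padding
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break            # malformed option
        body = options[i + 2:i + length]
        if kind == 2 and length == 4:
            features["mss"] = struct.unpack("!H", body)[0]
        elif kind == 3 and length == 3:
            features["window_scale"] = body[0]
        elif kind == 4:
            features["sack_permitted"] = True
        elif kind == 8 and length == 10:
            features["timestamp"] = struct.unpack("!I", body[:4])[0]
        i += length
    return features
```

  • On a SYN segment, the sequence number returned here is the initial sequence number and the window is the initial window value.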
  • the traffic data may also include basic information of the device, such as IP address, MAC address, device manufacturer, device model, device open port, etc., which is not limited in this application.
  • S2: Extract sample features from the traffic data of the known device, where the sample features include encryption protocol features of the traffic data of the known device.
  • The encryption protocol features of the traffic data of the known device include at least one of: the encryption protocol of the traffic data of the known device, the length of the traffic data encrypted using the encryption protocol, and the number of fragments of the traffic data encrypted using the encryption protocol.
  • sample features also include data packet size features and protocol stack features of traffic data of known devices.
  • sample features may also include other features, which are not limited in the present application.
  • The machine learning algorithm may be a classification algorithm. Commonly used classification algorithms include decision tree classification, the naive Bayesian classifier (NBC), neural network methods, the k-nearest neighbor method (KNN), fuzzy classification, etc., which is not limited in this application.
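  • As one example of the classification algorithms listed, a k-nearest neighbor classifier can be sketched in a few lines. The numeric feature vectors (mean packet size, TTL, fragment count) and the device type labels in the demo are invented for illustration; in practice the features would usually be scaled before distances are compared.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, device type label)


def knn_predict(training: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples closest to the query."""
    nearest = sorted(training, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical sample features: [mean packet size, TTL, fragment count].
    training = [
        ([1500.0, 64.0, 4.0], "server"),
        ([1400.0, 64.0, 5.0], "server"),
        ([1450.0, 64.0, 4.0], "server"),
        ([200.0, 128.0, 1.0], "plc"),
        ([220.0, 128.0, 1.0], "plc"),
        ([210.0, 128.0, 2.0], "plc"),
    ]
    print(knn_predict(training, [1480.0, 64.0, 4.0]))
```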
  • the analysis of sample features includes at least the following three parts.
  • the size of the uplink data packet is different from that of the downlink data packet.
  • the uplink traffic of the server is often larger than the downlink traffic
  • the downlink traffic of the terminal requesting the service is often larger than the uplink traffic.
  • Devices can thus be classified into two classes, server and requester, at least by packet size characteristics.
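  • The packet size rule above can be sketched as a simple comparison of uplink and downlink byte totals. The direction labels "up" and "down" (seen from the observed device) and the function name are assumptions for illustration.

```python
from typing import Iterable, Tuple


def classify_by_packet_size(packets: Iterable[Tuple[str, int]]) -> str:
    """Classify a device as 'server' or 'requester' from (direction, size) pairs.

    'up' means the packet was sent by the device, 'down' means it was received.
    """
    up = down = 0
    for direction, size in packets:
        if direction == "up":
            up += size
        elif direction == "down":
            down += size
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    if up > down:
        return "server"       # sends more than it receives
    if down > up:
        return "requester"    # receives more than it sends
    return "undetermined"
```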
  • Protocol stack features: different devices use different protocol stacks when communicating, and the characteristics of each protocol stack also differ. For example, different programmable logic controller (PLC) devices communicate using different protocol stacks. Therefore, devices can be further classified by protocol stack features.
  • When an encryption protocol is used for communication, the encryption algorithms supported differ according to the operating system and browser. That is, different devices run different operating systems, and when browsers on different operating systems use the same protocol to communicate with other devices, different results are obtained.
  • The device type of the device can be judged based on the result. For example, when the Chrome browser on the Windows operating system and the Chrome browser on the Mac operating system both use the Transport Layer Security (TLS) protocol to communicate with other devices, the results obtained are different.
  • the accessed device will also respond according to the supported encryption protocol, and different devices respond differently. That is to say, when devices equipped with different operating systems face a certain request, the response results are different. For example, devices equipped with Linux and Windows operating systems respond differently to the same request.
  • the system type carried by the device can be determined according to the communication result of the device using the encryption protocol to communicate with other devices or the response result of the accessed device, so as to obtain the device type of the device.
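  • One way such differences become observable is the list of cipher suites a client offers when it opens a TLS connection. The sketch below reads that list from a single TLS record holding a ClientHello message. Using the ClientHello for this purpose is an assumption of this example, not something the text prescribes, and client_hello_cipher_suites is a hypothetical name.

```python
import struct
from typing import List


def client_hello_cipher_suites(record: bytes) -> List[int]:
    """Return the cipher suite identifiers offered in a TLS ClientHello record."""
    content_type, _version, _length = struct.unpack("!BHH", record[:5])
    if content_type != 22:
        raise ValueError("not a TLS handshake record")
    body = record[5:]
    if not body or body[0] != 1:
        raise ValueError("not a ClientHello message")
    # Skip the handshake header (type: 1 byte, length: 3 bytes),
    # the client version (2 bytes) and the random value (32 bytes).
    offset = 4 + 2 + 32
    session_id_len = body[offset]
    offset += 1 + session_id_len
    (suites_len,) = struct.unpack("!H", body[offset:offset + 2])
    offset += 2
    suites = body[offset:offset + suites_len]
    return [struct.unpack("!H", suites[i:i + 2])[0]
            for i in range(0, len(suites) - 1, 2)]
```

  • The resulting list, in the order offered, could serve as one component of the encryption protocol features compared between devices.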
  • the traffic data encrypted using the encryption protocol has different lengths. That is to say, when using the same encryption protocol for communication, devices with different operating systems have different lengths of encrypted data.
  • the system type carried by the device can be determined according to the length of data encrypted using the encryption protocol or the number of data fragments, so as to obtain the device type of the device.
  • At this point, the device type identification model has been formed.
  • Verification is performed after the device type identification model is trained. For example, the traffic data of a new known device is input to check whether the output of the model is correct; if not, the sample features of the new known device are recorded in the feature database, and the model is further trained and verified, thereby improving the accuracy of the device type identification model.
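  • The verify-then-retrain loop described above can be sketched as follows. To keep the example short, the "model" is a toy exact-match lookup from a feature tuple to a device type; this stands in for whatever classifier is actually trained, and all names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Tuple[str, ...]
Sample = Tuple[Features, str]  # (sample features, known device type)


def train(samples: List[Sample]) -> Dict[Features, str]:
    """Toy training step: build an exact-match lookup table."""
    return {features: label for features, label in samples}


def predict(model: Dict[Features, str], features: Features) -> str:
    return model.get(features, "unknown")


def validate_and_retrain(feature_db: List[Sample],
                         new_known: List[Sample]) -> Dict[Features, str]:
    """Check the model on new known devices and retrain on every miss."""
    model = train(feature_db)
    for features, label in new_known:
        if predict(model, features) != label:
            feature_db.append((features, label))  # record the missed sample
            model = train(feature_db)             # further train the model
    return model


if __name__ == "__main__":
    db = [(("tls1.2", "mss1460"), "windows pc")]
    model = validate_and_retrain(db, [(("tls1.3", "mss1400"), "linux server")])
    print(predict(model, ("tls1.3", "mss1400")))
```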
  • Determining the device type of the unknown device according to the identification features and the device type identification model in the above step 230 can be understood as using the training-formed device type identification model to perform algorithmic matching on the identification features of the unknown device, thereby classifying the unknown device.
  • the essence of the classification process is the matching process of recognition features and classification results. That is, the data packet size feature, protocol stack feature, and encryption protocol feature in the unknown device identification feature are analyzed in sequence, and then the device type matching the unknown device is obtained.
  • For the specific analysis process, refer to the analysis of sample features in the model training process above.
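  • The sequential matching of packet size, protocol stack, and encryption protocol features can be pictured as successive narrowing of a candidate set. The sketch below is a deliberately simplified rule-based stand-in for the trained model; the device type profiles and feature values are invented for illustration.

```python
from typing import Dict, Sequence, Set

Profile = Dict[str, str]

DEFAULT_ORDER = ("packet_size", "protocol_stack", "encryption_protocol")


def classify_in_stages(features: Profile,
                       profiles: Dict[str, Profile],
                       order: Sequence[str] = DEFAULT_ORDER) -> Set[str]:
    """Narrow the candidate device types one feature dimension at a time."""
    candidates = set(profiles)
    for dimension in order:
        candidates = {name for name in candidates
                      if profiles[name][dimension] == features[dimension]}
        if not candidates:
            return set()  # no known device type matches
    return candidates


if __name__ == "__main__":
    # Hypothetical profiles of known device types.
    profiles = {
        "type A": {"packet_size": "server", "protocol_stack": "stack-1",
                   "encryption_protocol": "enc-1"},
        "type B": {"packet_size": "requester", "protocol_stack": "stack-1",
                   "encryption_protocol": "enc-1"},
        "type C": {"packet_size": "requester", "protocol_stack": "stack-1",
                   "encryption_protocol": "enc-2"},
    }
    unknown = {"packet_size": "requester", "protocol_stack": "stack-1",
               "encryption_protocol": "enc-2"}
    print(classify_in_stages(unknown, profiles))
```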
  • Fig. 3 shows a schematic block diagram of an apparatus 300 for identifying a device type according to an embodiment of the present application.
  • the apparatus 300 may execute the device type identification method in the above embodiment of the present application, for example, the apparatus 300 may be the aforementioned processing apparatus 110 .
  • the device includes:
  • an acquisition unit 310, configured to acquire traffic data of an unknown device; and
  • a processing unit 320, configured to extract identification features from the traffic data of the unknown device, where the identification features include encryption protocol features of the traffic data of the unknown device, and to determine the device type of the unknown device according to the identification features and the device type identification model, where the device type identification model is obtained by training based on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
  • The encryption protocol features include at least one of: the encryption protocol of the traffic data, the length of the traffic data encrypted by the encryption protocol, and the number of fragments of the traffic data encrypted by the encryption protocol.
  • the processing unit 320 is further configured to train the device type recognition model based on sample features of multiple known devices.
  • the identification feature further includes a packet size feature and a protocol stack feature of the traffic data of the unknown device.
  • the processing unit 320 is further configured to record the identification feature in a feature database.
  • the acquiring unit 310 is specifically configured to actively acquire the traffic data of the unknown device and/or passively acquire the traffic data of the unknown device.
  • The device further includes a sending unit 330, configured to send detection instructions related to encryption protocol features of the required traffic data; the acquiring unit 310 is specifically configured to receive the traffic data sent in response to the detection instructions.
  • The acquiring unit 310 is specifically configured to capture the traffic data of the unknown device in real time or to obtain the traffic data of the unknown device through data packet replay.
  • FIG. 4 is a schematic diagram of a hardware structure of an apparatus for identifying a device type according to an embodiment of the present application.
  • the device type identification apparatus 400 shown in FIG. 4 includes a memory 401 , a processor 402 , a communication interface 403 and a bus 404 .
  • the memory 401 , the processor 402 , and the communication interface 403 are connected to each other through the bus 404 .
  • The memory 401 may be a read-only memory (ROM), a static storage device, or a random access memory (RAM).
  • the memory 401 may store a program, and when the program stored in the memory 401 is executed by the processor 402, the processor 402 and the communication interface 403 are used to execute each step of the method for device type identification in the embodiment of the present application.
  • The processor 402 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to realize the functions required by the units in the device type identification device of the embodiment of the present application, or to execute the device type identification method of the embodiment of the present application.
  • the processor 402 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the device type identification method in the embodiment of the present application may be implemented by an integrated logic circuit of hardware in the processor 402 or instructions in the form of software.
  • The processor 402 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • The software module may be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 401; the processor 402 reads the information in the memory 401 and, in combination with its hardware, completes the functions required by the units included in the device type identification device of the embodiment of the present application, or executes the device type identification method of the embodiment of the present application.
  • the communication interface 403 implements communication between the apparatus 400 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver. For example, traffic data of an unknown device may be obtained through the communication interface 403 .
  • the bus 404 may include a pathway for transferring information between various components of the device 400 (eg, memory 401 , processor 402 , communication interface 403 ).
  • the device 400 may also include other devices necessary for normal operation.
  • the apparatus 400 may also include hardware devices for implementing other additional functions.
  • the device 400 may also only include components necessary to implement the embodiment of the present application, and does not necessarily include all the components shown in FIG. 4 .
  • the embodiment of the present application also provides a computer-readable storage medium, which stores program code for device execution, where the program code includes instructions for executing the steps in the above method for device type identification.
  • The embodiment of the present application also provides a computer program product, the computer program product includes a computer program stored on a computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the above method for device type identification.
  • the above-mentioned computer-readable storage medium may be a transitory computer-readable storage medium, or a non-transitory computer-readable storage medium.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • the aspects, embodiments, implementations, or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments can be implemented by software, hardware, or a combination of hardware and software.
  • the described embodiments may also be embodied by a computer-readable medium storing computer-readable code comprising instructions executable by at least one computing device.
  • the computer-readable medium can be associated with any data storage device that can store data readable by a computer system. Exemplary computer-readable media may include read-only memory, random-access memory, Compact Disc Read-Only Memory (CD-ROM), hard disk drive (HDD), Digital Video Disc (DVD), magnetic tape, optical data storage devices, and the like.
  • the computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

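Embodiments of the present application provide a method for device type identification. The method includes: acquiring traffic data of an unknown device; extracting identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device; and determining the device type of the unknown device according to the identification features and a device type identification model, where the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices. The method and apparatus for device type identification of the embodiments of the present application can improve the accuracy of device type identification.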

Description

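Method and Apparatus for Device Type Identification
Technical Field
The present application relates to the field of network security technology and, more specifically, to a method and apparatus for device type identification.
Background
Current network environments contain a large number of devices of many types, such as field devices, control devices, and devices connected to other hardware, which makes the security management of devices challenging. Identifying the types of devices in a network is an important part of monitoring device security. Once device types are determined, different policies can be used to monitor and protect different devices, which improves efficiency, saves human resources, gives a clear overall picture of the network environment, and helps keep the entire network environment stable.
However, existing device type identification methods generally tend to analyze traffic data at the network layer and therefore often suffer from low identification accuracy. How to improve the accuracy of device type identification is thus a technical problem that urgently needs to be solved.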
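Summary
Embodiments of the present application provide a method and apparatus for device type identification that can improve the accuracy of device type identification.
In a first aspect, a method for device type identification is provided. The method includes: acquiring traffic data of an unknown device; extracting identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device; and determining the device type of the unknown device according to the identification features and a device type identification model, where the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
In a second aspect, an apparatus for device type identification is provided, including: an acquisition unit configured to acquire traffic data of an unknown device; and a processing unit configured to extract identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device. The processing unit is further configured to determine the device type of the unknown device according to the identification features and a device type identification model, where the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
In a third aspect, an apparatus for device type identification is provided, including: a memory configured to store a program; and a processor configured to execute the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the above method for device type identification.
In a fourth aspect, the present application further provides a computer-readable storage medium storing program code for execution by a device, the program code including instructions for performing the steps of the above method for device type identification.
In a fifth aspect, the present application further provides a computer program product that includes a computer program stored on a computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to perform the above method for device type identification.
With the above technical solution, the present application adds analysis of application-layer traffic data: the encryption protocol features of the traffic data of known devices are used to train a device type identification model with higher inference accuracy, and the encryption protocol features extracted from the traffic data of the unknown device are then input into that model, which further improves the accuracy of identifying the device type of the unknown device.
In some possible implementations, the encryption protocol features include at least one of: the encryption protocol of the traffic data, the length of the traffic data encrypted with the encryption protocol, and the number of fragments of the traffic data encrypted with the encryption protocol.
With this technical solution, the device type identification model can analyze several kinds of information in the encryption protocol features, which can further improve the accuracy of device type identification.
In some possible implementations, the device type identification model is further trained on sample features of multiple known devices.
Analyzing the sample features of multiple known devices not only enriches the sample database but also makes the inference of the trained device type identification model more accurate.
In some possible implementations, the identification features further include packet size features and protocol stack features of the traffic data of the unknown device.
In this implementation, in addition to analyzing traffic data at the application layer (the encryption protocol features), the present application can also analyze traffic data at the transport layer (the packet size features) and the combination of protocols across all layers (the protocol stack features), which further improves the accuracy of device type identification.
In some possible implementations, before the device type of the unknown device is determined according to the identification features and the device type identification model, the identification features are also recorded in a feature database.
After the traffic data of a device is acquired, features are extracted from it along multiple dimensions, such as packet size features, protocol stack features, and encryption protocol features, and the extracted features are then stored or recorded in the feature database, which can improve the efficiency of managing traffic data.
In some possible implementations, the traffic data of the unknown device may be acquired actively and/or passively.
Combining active and passive acquisition makes the ways of obtaining traffic data more flexible. It also reduces the impact on the current network environment that purely active acquisition would cause, and mitigates the problem that purely passive acquisition may yield traffic data that is not detailed enough.
In some possible implementations, when the traffic data of the unknown device is acquired actively, a detection instruction related to the encryption protocol features of the required traffic data may be sent, and the traffic data returned in response to the detection instruction is then received.
In some possible implementations, when the traffic data of the unknown device is acquired passively, the traffic data may be captured in real time or obtained through packet replay.
It should be noted that, in the present application, "replay" means simulating the online state of a device of a specified type by some technical means; packet replay can be understood as obtaining the traffic data of a device by way of "scene reproduction".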
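Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application.
FIG. 2 is a schematic flowchart of a device type identification method according to an embodiment of the present application.
FIG. 3 is a schematic block diagram of an apparatus for device type identification according to an embodiment of the present application.
FIG. 4 is a schematic structural diagram of an apparatus for device type identification according to an embodiment of the present application.
List of reference numerals:
110, processing apparatus;
111, processing module;
112, communication interface;
113, storage system;
114, preprocessing module;
115, model;
120, unknown device;
130, training device;
140, database;
150, sample database;
210, acquire traffic data of an unknown device;
220, extract identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device;
230, determine the device type of the unknown device according to the identification features and a device type identification model, where the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices;
300, apparatus for device type identification;
310, acquisition unit;
320, processing unit;
330, sending unit;
400, apparatus for device type identification;
401, memory;
402, processor;
403, communication interface;
404, bus.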
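Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings. It should be understood that the specific examples in this specification are only intended to help those skilled in the art better understand the embodiments of the present application, not to limit their scope.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
It should also be understood that the various implementations described in this specification may be carried out individually or in combination, which is not limited in the embodiments of the present application.
Unless otherwise stated, all technical and scientific terms used in the embodiments of the present application have the meanings commonly understood by those skilled in the technical field of the present application. The terms used in the present application are only for the purpose of describing specific embodiments and are not intended to limit the scope of the present application.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application.
In the system architecture shown in FIG. 1, the unknown device 120 is a device that has joined the current network and whose device type is unknown; it generates traffic data by exchanging information with the network. There may be multiple unknown devices 120. The traffic data generated by the unknown device 120 may be transmitted directly to the processing apparatus 110, or it may first be stored in the database 140 and then obtained from the database 140 by the processing apparatus 110. Taking passive acquisition as an example, traffic data newly generated by the unknown device 120 may be captured by the processing apparatus 110 in real time, or may first be stored in the database 140 and then retrieved from the database 140 by the processing apparatus 110.
The processing apparatus 110 is communicatively connected to the unknown device 120. Specifically, the processing apparatus 110 may include a communication interface 112 for establishing communication connections with other devices. The communication connection may be wired or wireless.
The processing apparatus 110 may be an electronic device or system with data processing capability, such as a computer. The processing apparatus 110 may include a processing module 111 for identifying the device type of the unknown device. The processing module 111 may specifically be one or more processors. The processor may be of any kind, which is not limited in the embodiments of the present application.
The processing apparatus 110 may also include a storage system 113. The storage system 113 may be used to store data and instructions, for example, computer-executable instructions that implement the technical solutions of the embodiments of the present application. The processing apparatus 110 may retrieve data, instructions, and the like from the storage system 113, and may also store data, instructions, and the like in the storage system 113. The storage system 113 may specifically be one or more memories. The memory may be of any kind, which is likewise not limited in the embodiments of the present application.
The storage system 113 may be located inside or outside the processing apparatus 110. When the storage system 113 is located outside the processing apparatus 110, the processing apparatus 110 may access the storage system 113 through a data interface.
The processing apparatus 110 may also include other general-purpose components, for example, an output device for outputting the device type identification result.
The processing apparatus 110 further includes a preprocessing module 114 for preprocessing the acquired traffic data, for example, extracting relevant features of the traffic data using the technical solutions of the embodiments of the present application described below.
A trained model 115 is also deployed in the processing apparatus 110. In this case, the processing module 111 may use the model 115 to perform the corresponding processing. In the present application, the model 115 may be trained using the technical solutions of the embodiments of the present application.
For example, the model 115 may be a model for identifying the device type of an unknown device. The training device 130 may train the device type identification model on the training data in the sample database 150. The processing module 111 can then use the device type identification model to obtain the device type of the unknown device.
Once the traffic data of the unknown device has been acquired, the preprocessing module 114 may first perform feature extraction to obtain multiple features, which are then input into the model 115 to obtain the device type of the unknown device.
It should be understood that FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation.
In some possible implementations, the model 115 trained by the training device 130 may be a model obtained through machine learning, for example, a model built on a neural network. The neural network here may be a convolutional neural network (CNN), a recurrent neural network (RNN), a deep convolutional neural network (DCNN), and so on.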
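The main process of the device type identification method of the embodiments of the present application is described below with reference to FIG. 2.
FIG. 2 is a schematic diagram of the device type identification process of an embodiment of the present application. It includes the following steps 210 to 230.
210: Acquire traffic data of an unknown device.
The communication information of a network device usually includes the time, the source and destination of the transmission, the network communication protocol used, the packet length, the packet payload, and similar information. Each kind of information can reflect certain characteristics of the network device, and obtaining it requires analyzing the device's traffic data over a period of time. Therefore, before features related to the device type can be obtained, the traffic data of the unknown device must first be acquired.
As an optional implementation, the traffic data may be acquired actively or passively. In general, acquiring the traffic data of an unknown device actively includes: sending a detection instruction or probe packet related to the required traffic data, and then receiving the traffic data returned in response to that detection instruction or probe packet. Acquiring the traffic data of an unknown device passively may include: capturing the traffic data in real time or obtaining it through packet replay.
The embodiments of the present application mainly acquire the traffic data of the unknown device by combining active and passive acquisition. For example, WinPcap is first used to capture packets directly from a physical interface, and the packets are saved in the CAP file format; WinPcap is then used to read the packets from the offline dump file, that is, the stored file is opened with the WinPcap function pcap_open_offline(). The present application is mainly concerned with the encryption protocol features of the traffic data: if the file is found to contain content related to encryption protocol features, the acquisition step is complete; if not, a request or detection instruction for traffic data that includes encryption protocol features is sent to the unknown device, and the response data of the unknown device is then received.
Combining active and passive acquisition makes the ways of obtaining traffic data more flexible. It also reduces the impact on the current network environment that purely active acquisition would cause, and mitigates the problem that purely passive acquisition may yield traffic data that is not detailed enough.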
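The passive-first, active-fallback logic described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: `read_capture` and `send_probe` are hypothetical callables standing in for reading a stored capture file and for sending a detection instruction, and the check for "encryption protocol content" is assumed here to be a simple test for a TLS record header.

```python
from typing import Callable, Iterable, List

# TLS record content types: change_cipher_spec, alert, handshake, application_data
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def looks_like_tls(payload: bytes) -> bool:
    """Heuristic check: does the payload begin with a plausible TLS record header?"""
    return (
        len(payload) >= 5
        and payload[0] in TLS_CONTENT_TYPES
        and payload[1] == 0x03  # major version byte shared by SSL 3.0 and TLS 1.x
    )


def acquire_traffic(
    read_capture: Callable[[], Iterable[bytes]],
    send_probe: Callable[[], Iterable[bytes]],
) -> List[bytes]:
    """Passive acquisition first; active probing only if nothing encrypted was seen."""
    payloads = list(read_capture())
    if any(looks_like_tls(p) for p in payloads):
        return payloads  # the stored capture already contains encryption-protocol content
    # Otherwise send a detection instruction and append the device's responses.
    return payloads + list(send_probe())
```

Because the probe is only invoked when the passive capture is insufficient, the sketch keeps the extra load on the network to a minimum, which is the trade-off the paragraph above describes.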
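220: Extract identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device.
In the embodiments of the present application, identification features are features that can reflect the device type of the unknown device. They may also go by other names, which is not limited in the present application.
In the embodiments of the present application, the encryption protocol features of the traffic data of the unknown device include at least one of: the encryption protocol of the traffic data, the length of the traffic data encrypted with that encryption protocol, and the number of fragments of the traffic data encrypted with that encryption protocol.
When the same encryption protocol is used to encrypt data, different devices use different encryption algorithms, produce encrypted data of different lengths, and split the encrypted data into different numbers of fragments. The device type of the unknown device can therefore be inferred by extracting the encryption protocol features and the various kinds of information they contain.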
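As an illustration of the three encryption protocol features, the sketch below walks the TLS records in a reassembled TCP byte stream. It rests on assumptions that the source does not state: TLS is taken as the encryption protocol, the "fragment count" is interpreted as the number of TLS records, and the "length" is the sum of the record length fields. The function and key names are hypothetical.

```python
import struct
from typing import Dict, Union

# Record-layer version numbers; note that TLS 1.3 records also carry 0x0303.
RECORD_VERSIONS = {0x0300: "SSL 3.0", 0x0301: "TLS 1.0", 0x0302: "TLS 1.1", 0x0303: "TLS 1.2"}
CONTENT_TYPES = (20, 21, 22, 23)  # change_cipher_spec, alert, handshake, application_data


def encryption_protocol_features(stream: bytes) -> Dict[str, Union[str, int]]:
    """Collect protocol, total record payload length, and record count from a byte stream."""
    offset = total_length = record_count = 0
    protocol = "none"
    while offset + 5 <= len(stream):
        # Each record starts with: content type (1 byte), version (2), length (2).
        content_type, version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type not in CONTENT_TYPES:
            break  # not a TLS record, stop walking
        if record_count == 0:
            protocol = RECORD_VERSIONS.get(version, "unknown")
        total_length += length
        record_count += 1
        offset += 5 + length
    return {"protocol": protocol, "payload_length": total_length, "fragment_count": record_count}
```

The returned dictionary maps directly onto the three items listed above, so it can be fed to a classifier or stored alongside the other feature dimensions.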
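Whereas current approaches generally analyze traffic data only at the network layer, the present application can also extract and analyze application-layer traffic data, that is, content related to the encryption protocol, which can improve the accuracy of inferring the device type of the unknown device.
In the embodiments of the present application, the identification features further include packet size features and protocol stack features of the traffic data of the unknown device.
It should be noted that the identification features may also include other features, which is not limited in the present application.
As an optional implementation, the traffic data is decomposed into multiple dimensions (packet size features, protocol stack features, and encryption protocol features) and then recorded in the feature database in this multi-dimensional form, so that the features are easy to retrieve when they are used to identify the device type in subsequent steps. This also improves the efficiency of managing the traffic data of multiple devices.
It should be noted that packet size features, protocol stack features, and encryption protocol features are all subjectively defined categories and do not limit the above implementation. For example, the traffic data could also be decomposed into general features, specific features, and attribute features.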
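One way to record features by dimension is a small relational table keyed by device and dimension. The sketch below uses Python's built-in `sqlite3` module; the table layout, the dimension names, and the helper functions are assumptions made for illustration, since the source does not specify how the feature database is organized.

```python
import json
import sqlite3
from typing import Any, Dict

# Hypothetical dimension names matching the three feature groups in the text.
DIMENSIONS = ("packet_size", "protocol_stack", "encryption_protocol")


def open_feature_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a feature database with one row per device and dimension."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "device_id TEXT, dimension TEXT, payload TEXT, "
        "PRIMARY KEY (device_id, dimension))"
    )
    return conn


def record_features(conn: sqlite3.Connection, device_id: str, features: Dict[str, Any]) -> None:
    """Store each known dimension of a device's features as a JSON document."""
    for dimension in DIMENSIONS:
        if dimension in features:
            conn.execute(
                "INSERT OR REPLACE INTO features VALUES (?, ?, ?)",
                (device_id, dimension, json.dumps(features[dimension])),
            )
    conn.commit()


def load_features(conn: sqlite3.Connection, device_id: str) -> Dict[str, Any]:
    """Return all stored dimensions for one device."""
    rows = conn.execute(
        "SELECT dimension, payload FROM features WHERE device_id = ?", (device_id,)
    )
    return {dimension: json.loads(payload) for dimension, payload in rows}
```

Keying rows by (device, dimension) means a later step can fetch only the dimension it needs, and re-recording a dimension simply overwrites the previous value.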
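230: Determine the device type of the unknown device according to the identification features and a device type identification model, where the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
In an optional implementation, training the device type identification model may include the following steps S1 to S3.
S1: Acquire traffic data of known devices.
As an optional implementation, the traffic data of the known devices may be acquired by passive monitoring, by active acquisition, or by a combination of the two. The passive mode includes capturing the traffic data of the known device in real time or obtaining it through packet replay; the active mode includes constructing a detection instruction related to the required traffic data or sending probe packets to generate traffic data.
For example, in active monitoring, a traffic data request is sent to the known device, and WinPcap is then used to capture, from the physical interface, the packets with which the known device responds, yielding the traffic data of the known device.
It should be noted that, for known devices, as many kinds of network devices and as many devices of each kind as possible should be collected, in order to improve identification accuracy and increase the number of identifiable types.
After the traffic data of the known devices is acquired, features are extracted from it along multiple dimensions, such as packet size features, protocol stack features, and encryption protocol features, and the extracted features are then stored or recorded in the feature database to improve the efficiency of managing the traffic data of the known devices.
For example, the TCP/IP protocol stack features extracted from the traffic data include, from the TCP header: the initial window value, the initial sequence number (ISN), the timestamp field (timestamp), the maximum segment size (MSS), the window scale factor (window scale, WS), the SACK-permitted flag (selective acknowledgment permitted, SACK permitted), the acknowledgment number (ACK number), the connection-establishment flag (synchronize, SYN), the connection-close flag (finish, FIN), the acknowledgment flag (ACK), the push flag (push, PSH), the urgent flag (urgent, URG), and the connection-reset flag (reset, RST); from the IP header: the version number (version), the header length (internet header length, IHL), the time to live (TTL), the protocol field value (protocol), and the option value (option); as well as the protocol port number (port).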
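A subset of the fields listed above can be read from a raw packet with fixed-offset parsing. The sketch below assumes the input is a raw IPv4 packet carrying TCP (no link-layer header) and uses only the standard `struct` module; the function name and the feature keys are illustrative choices, not names from the source.

```python
import struct
from typing import Dict

TCP_FLAGS = {"fin": 0x01, "syn": 0x02, "rst": 0x04, "psh": 0x08, "ack": 0x10, "urg": 0x20}


def stack_features(packet: bytes) -> Dict[str, int]:
    """Extract TCP/IP stack features from a raw IPv4 packet that carries TCP."""
    version_ihl, _tos, _total, _ident, _frag, ttl, protocol = struct.unpack_from(
        "!BBHHHBB", packet, 0
    )
    if version_ihl >> 4 != 4 or protocol != 6:
        raise ValueError("expected an IPv4 packet carrying TCP")
    ihl = (version_ihl & 0x0F) * 4  # IP header length in bytes
    src_port, dst_port, seq, _ack, off_flags, window = struct.unpack_from(
        "!HHIIHH", packet, ihl
    )
    # On a SYN segment, "seq" is the initial sequence number (ISN).
    features = {"version": 4, "ihl": ihl, "ttl": ttl, "protocol": protocol,
                "src_port": src_port, "dst_port": dst_port, "seq": seq, "window": window}
    for name, mask in TCP_FLAGS.items():
        features[name] = int(bool(off_flags & mask))
    # Walk the TCP options between the fixed 20-byte header and the data offset.
    options = packet[ihl + 20 : ihl + (off_flags >> 12) * 4]
    i = 0
    while i < len(options) and options[i] != 0:  # kind 0 ends the option list
        kind = options[i]
        if kind == 1:  # NOP padding occupies a single byte
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            break  # malformed option, stop parsing
        size = options[i + 1]
        body = options[i + 2 : i + size]
        if kind == 2 and len(body) == 2:
            features["mss"] = struct.unpack("!H", body)[0]
        elif kind == 3 and len(body) == 1:
            features["window_scale"] = body[0]
        elif kind == 4:
            features["sack_permitted"] = 1
        elif kind == 8 and len(body) == 8:
            features["ts_val"] = struct.unpack("!I", body[:4])[0]
        i += size
    return features
```

Values such as the initial TTL, window size, MSS, and the set and order of TCP options tend to differ between operating system network stacks, which is why they are useful as protocol stack features.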
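Optionally, the traffic data may also include basic information about the device, such as its IP address, MAC address, device vendor, device model, and open ports, which is not limited in the present application.
S2: Extract sample features from the traffic data of the known devices, the sample features including encryption protocol features of the traffic data of the known devices.
The encryption protocol features of the traffic data of a known device include at least one of: the encryption protocol of the traffic data of the known device, the length of the traffic data encrypted with that encryption protocol, and the number of fragments of the traffic data encrypted with that encryption protocol.
In the embodiments of the present application, the sample features further include packet size features and protocol stack features of the traffic data of the known devices. The sample features may of course also include other features, which is not limited in the present application.
As an optional implementation, the traffic data of the known devices is decomposed into multiple dimensions (packet size features, protocol stack features, and encryption protocol features) and then recorded in the feature database in this multi-dimensional form, which improves the efficiency of managing the traffic data of many kinds of devices.
S3: Train the device type identification model according to the sample features of the known devices and the corresponding device types. That is, a machine learning algorithm is applied to the sample features (the basis for the decision) to obtain a model that outputs the device type corresponding to those sample features (the output result).
Optionally, the machine learning algorithm may be a classification algorithm. Commonly used classification algorithms include decision tree classification, the naive Bayes classifier (NBC), neural networks, the k-nearest neighbor method (KNN), fuzzy classification, and so on, which is not limited in the present application.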
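Of the classification algorithms listed, k-nearest neighbor is the simplest to show end to end. The sketch below is a plain-Python version under stated assumptions: each sample is a numeric feature vector paired with a device type label, and the vector layout (for example, normalized packet size ratio, TTL, and TLS record count) is hypothetical. It is one possible instance of step S3, not the model the applicants trained.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, device type label)


def knn_predict(samples: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority device type among the k training samples nearest to query."""
    nearest = sorted(samples, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because k-NN compares raw Euclidean distances, features measured on very different scales (such as window size and TTL) would normally be normalized before being placed in the vector.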
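Specifically, in the model training process of the embodiments of the present application, the analysis of the sample features includes at least the following three parts.
(1) Packet size feature analysis
For different devices, the sizes of upstream packets and downstream packets differ. For example, a server's upstream traffic is usually larger than its downstream traffic, while the downstream traffic of a terminal requesting a service is usually larger than its upstream traffic. Packet size features can therefore be used at least to divide devices into two classes: service providers and requesters.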
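The provider/requester split can be expressed as a comparison of byte totals per direction. In this minimal sketch, the `(direction, size)` tuple format and the labels are assumptions; "up" means bytes sent by the device and "down" means bytes it received.

```python
from typing import Iterable, Tuple


def classify_role(packets: Iterable[Tuple[str, int]]) -> str:
    """Classify a device as 'server' or 'requester' from (direction, size) pairs."""
    totals = {"up": 0, "down": 0}
    for direction, size in packets:
        totals[direction] += size
    if totals["up"] > totals["down"]:
        return "server"      # sends more than it receives
    if totals["down"] > totals["up"]:
        return "requester"   # receives more than it sends
    return "undetermined"
```

This coarse label is only a first cut; the protocol stack and encryption protocol analyses refine it further.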
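(2) Protocol stack feature analysis
Different devices use different protocol stacks when communicating, and each protocol stack has its own characteristics. For example, different programmable logic controller (PLC) devices use different protocol stacks to communicate. Protocol stack features can therefore be used to classify devices further.
(3) Encryption protocol feature analysis
As an optional implementation, when an encryption protocol is used for communication, the encryption algorithms it supports depend on the operating system and the browser. In other words, different devices run different operating systems, and when browsers on different operating systems use the same protocol to communicate with other devices, the results differ. The device type can be determined from that result. For example, when the Chrome browser on Windows and the Chrome browser on macOS both use the Transport Layer Security (TLS) protocol to communicate with other devices, the results differ.
In addition, the device being accessed also responds according to the encryption protocols it supports, and different devices respond differently. In other words, devices running different operating systems respond differently to the same request. For example, a device running Linux and a device running Windows respond differently to the same request.
In the embodiments of the present application, the type of system running on a device can be determined from the result of the device's communication with other devices over the encryption protocol, or from the response of the accessed device, and the device type can be derived from it.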
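One concrete "result" that differs between client systems is the list of cipher suites a client offers in its TLS ClientHello. The sketch below extracts that list from a single TLS record and turns it into a string fingerprint. The choice of the cipher suite list as the distinguishing signal, the fingerprint format, and the `KNOWN_FINGERPRINTS` table with its placeholder label are all assumptions for illustration; the source does not say which fields are compared.

```python
import struct
from typing import Dict, List

# Hypothetical lookup table; real entries would come from known-device samples.
KNOWN_FINGERPRINTS: Dict[str, str] = {"1301-c02f": "example-os-a"}


def offered_cipher_suites(record: bytes) -> List[int]:
    """Return the cipher suite IDs offered in a TLS record that holds a ClientHello."""
    if len(record) < 6 or record[0] != 22 or record[5] != 1:
        raise ValueError("not a TLS handshake record starting with a ClientHello")
    offset = 5 + 4                # record header, then handshake type + 24-bit length
    offset += 2 + 32              # client_version and random
    offset += 1 + record[offset]  # session_id length byte and the session_id itself
    (suites_len,) = struct.unpack_from("!H", record, offset)
    offset += 2
    return [struct.unpack_from("!H", record, offset + i)[0] for i in range(0, suites_len, 2)]


def fingerprint(suites: List[int]) -> str:
    """Join the suite IDs, in the order offered, into a comparable string."""
    return "-".join(f"{suite:04x}" for suite in suites)


def guess_system(record: bytes) -> str:
    """Look up the fingerprint of a ClientHello in the table of known systems."""
    return KNOWN_FINGERPRINTS.get(fingerprint(offered_cipher_suites(record)), "unknown")
```

The order of the suites is kept in the fingerprint because clients differ not only in which suites they support but also in how they rank them.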
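As another optional implementation, the length of traffic data encrypted with an encryption protocol differs between devices. In other words, when the same encryption protocol is used for communication, devices running different operating systems produce encrypted data of different lengths.
As another optional implementation, when traffic data encrypted with an encryption protocol is split into fragments, the number of fragments also differs between operating systems. In other words, when the same encryption protocol is used for communication, devices running different operating systems split their encrypted data into different numbers of fragments.
In the embodiments of the present application, the type of system running on a device can be determined from the length of the data encrypted with the encryption protocol or from the number of data fragments, and the device type can be derived from it.
Through the above analysis and classification of the traffic data of known devices, the device type identification model is formed.
As an optional implementation, verification is performed after the device type identification model has been trained. For example, traffic data of a new known device is input to check whether the model outputs the correct result. If the result is incorrect, the sample features of the new known device are recorded in the feature database, and the model is trained and verified further, thereby improving the accuracy of the device type identification model.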
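The verify-then-retrain loop described above can be sketched as follows. The feature database is modeled as a plain list of labeled samples, and `train_nearest` is a stand-in 1-nearest-neighbor trainer chosen only to keep the example self-contained; both are assumptions, and any trainer with the same call shape could be substituted.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]   # (feature vector, device type)
Model = Callable[[Sequence[float]], str]


def train_nearest(samples: List[Sample]) -> Model:
    """Stand-in trainer: a 1-nearest-neighbor model over a snapshot of the samples."""
    snapshot = list(samples)

    def predict(features: Sequence[float]) -> str:
        def sq_dist(sample: Sample) -> float:
            return sum((a - b) ** 2 for a, b in zip(sample[0], features))
        return min(snapshot, key=sq_dist)[1]

    return predict


def validate_and_refine(
    feature_db: List[Sample],
    new_known: List[Sample],
    train: Callable[[List[Sample]], Model] = train_nearest,
) -> Model:
    """Check the model on new known devices; record and retrain on every miss."""
    model = train(feature_db)
    for features, device_type in new_known:
        if model(features) != device_type:
            feature_db.append((features, device_type))  # record the misclassified sample
            model = train(feature_db)                   # train further on the enlarged set
    return model
```

Only misclassified samples are added, so the feature database grows where the model is weakest rather than accumulating redundant examples.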
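Determining the device type of the unknown device according to the identification features and the device type identification model in step 230 above can be understood as using the trained device type identification model to algorithmically match the identification features of the unknown device, thereby classifying the unknown device. The classification process is essentially a process of matching identification features to classification results. That is, the packet size features, protocol stack features, and encryption protocol features among the identification features of the unknown device are analyzed in turn, and the device type matching the unknown device is then obtained. For the specific analysis, refer to the analysis of the sample features in the model training process described above.
The method embodiments of the present application have been described in detail above; the apparatus embodiments are described below. The apparatus embodiments correspond to the method embodiments, so for parts not described in detail, refer to the preceding method embodiments. The apparatus can implement any possible implementation of the above method.
FIG. 3 is a schematic block diagram of an apparatus 300 for device type identification according to an embodiment of the present application. The apparatus 300 can perform the device type identification method of the above embodiments; for example, the apparatus 300 may be the processing apparatus 110 described above.
As shown in FIG. 3, the apparatus includes:
an acquisition unit 310, configured to acquire traffic data of an unknown device; and
a processing unit 320, configured to extract identification features from the traffic data of the unknown device, the identification features including encryption protocol features of the traffic data of the unknown device, and further configured to determine the device type of the unknown device according to the identification features and a device type identification model, where the device type identification model is trained on sample features of known devices, and the sample features include encryption protocol features of the traffic data of the known devices.
Optionally, in an embodiment of the present application, the encryption protocol features include at least one of: the encryption protocol of the traffic data, the length of the traffic data encrypted with the encryption protocol, and the number of fragments of the traffic data encrypted with the encryption protocol.
Optionally, in an embodiment of the present application, the processing unit 320 is further configured to train the device type identification model on sample features of multiple known devices.
Optionally, in an embodiment of the present application, the identification features further include packet size features and protocol stack features of the traffic data of the unknown device.
Optionally, in an embodiment of the present application, the processing unit 320 is further configured to record the identification features in a feature database.
Optionally, in an embodiment of the present application, the acquisition unit 310 is specifically configured to acquire the traffic data of the unknown device actively and/or passively.
Optionally, in an embodiment of the present application, the apparatus further includes a sending unit 330 configured to send a detection instruction related to the encryption protocol features of the required traffic data; the acquisition unit 310 is specifically configured to receive the traffic data returned in response to the detection instruction.
Optionally, in an embodiment of the present application, the acquisition unit 310 is specifically configured to capture the traffic data of the unknown device in real time or to obtain the traffic data of the unknown device through packet replay.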
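FIG. 4 is a schematic diagram of the hardware structure of an apparatus for device type identification according to an embodiment of the present application. The apparatus 400 for device type identification shown in FIG. 4 includes a memory 401, a processor 402, a communication interface 403, and a bus 404. The memory 401, the processor 402, and the communication interface 403 are communicatively connected to one another through the bus 404.
The memory 401 may be a read-only memory (ROM), a static storage device, or a random access memory (RAM). The memory 401 may store a program; when the program stored in the memory 401 is executed by the processor 402, the processor 402 and the communication interface 403 are configured to perform the steps of the device type identification method of the embodiments of the present application.
The processor 402 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the functions required of the units in the apparatus for device type identification of the embodiments of the present application, or to perform the device type identification method of the embodiments of the present application.
The processor 402 may also be an integrated circuit chip with signal processing capability. In implementation, the steps of the device type identification method of the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor 402 or by instructions in the form of software.
The processor 402 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 401; the processor 402 reads the information in the memory 401 and, in combination with its hardware, performs the functions required of the units included in the apparatus for device type identification of the embodiments of the present application, or performs the device type identification method of the embodiments of the present application.
The communication interface 403 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 400 and other devices or communication networks. For example, the traffic data of the unknown device may be obtained through the communication interface 403.
The bus 404 may include a pathway for transferring information between the components of the apparatus 400 (for example, the memory 401, the processor 402, and the communication interface 403).
It should be noted that although the apparatus 400 above shows only a memory, a processor, and a communication interface, those skilled in the art will understand that, in a specific implementation, the apparatus 400 may also include other components necessary for normal operation. Those skilled in the art will also understand that, depending on specific needs, the apparatus 400 may include hardware components that implement other additional functions. Furthermore, those skilled in the art will understand that the apparatus 400 may include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 4.
Embodiments of the present application further provide a computer-readable storage medium storing program code for execution by a device, the program code including instructions for performing the steps of the above method for device type identification.
Embodiments of the present application further provide a computer program product that includes a computer program stored on a computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to perform the above method for device type identification.
The above computer-readable storage medium may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.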
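Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the apparatus described above can be found in the corresponding process of the preceding method embodiments and is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The terms used in the present application are only for describing the embodiments and are not intended to limit the claims. As used in the description of the embodiments and in the claims, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in the present application covers any and all possible combinations of one or more of the associated listed items. In addition, when used in the present application, the term "comprise" indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The aspects, embodiments, implementations, or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments can be implemented by software, hardware, or a combination of software and hardware. The described embodiments may also be embodied by a computer-readable medium storing computer-readable code, the computer-readable code including instructions executable by at least one computing apparatus. The computer-readable medium can be associated with any data storage apparatus that can store data readable by a computer system. Examples of computer-readable media include read-only memory, random-access memory, Compact Disc Read-Only Memory (CD-ROM), hard disk drive (HDD), Digital Video Disc (DVD), magnetic tape, and optical data storage apparatuses. The computer-readable medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.
The above technical description refers to the accompanying drawings, which form a part of the present application and show, by way of description, implementations in accordance with the described embodiments. Although these embodiments are described in enough detail to enable those skilled in the art to implement them, they are non-limiting; other embodiments may be used, and changes may be made without departing from the scope of the described embodiments. For example, the order of operations described in the flowcharts is non-limiting, so the order of two or more operations illustrated in and described with reference to a flowchart may be changed according to several embodiments. As another example, in several embodiments, one or more operations illustrated in and described with reference to a flowchart are optional or may be deleted. In addition, certain steps or functions may be added to the disclosed embodiments, or the order of two or more steps may be swapped. All such variations are considered to be included in the disclosed embodiments and the claims.
In addition, terminology is used in the above technical description to provide a thorough understanding of the described embodiments. However, excessive detail is not required to implement the described embodiments. The above description of the embodiments is therefore presented for purposes of illustration and description. The embodiments presented in the above description, and the examples disclosed in accordance with these embodiments, are provided separately to add context and aid understanding of the described embodiments. The above specification is not intended to be exhaustive or to limit the described embodiments to the precise forms of the present application. In light of the above teachings, several modifications, alternative applications, and variations are possible. In some cases, well-known processing steps are not described in detail to avoid unnecessarily obscuring the described embodiments.
The above is only a specific implementation of the embodiments of the present application, but the scope of protection of the embodiments of the present application is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the embodiments of the present application shall fall within the scope of protection of the embodiments of the present application. Therefore, the scope of protection of the embodiments of the present application shall be determined by the scope of protection of the claims.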

Claims (10)

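  1. A method for device type identification, characterized by comprising:
    acquiring traffic data of an unknown device;
    extracting identification features from the traffic data of the unknown device, the identification features comprising encryption protocol features of the traffic data of the unknown device; and
    determining the device type of the unknown device according to the identification features and a device type identification model, wherein the device type identification model is trained on sample features of known devices, and the sample features comprise encryption protocol features of the traffic data of the known devices.
  2. The method according to claim 1, characterized in that the encryption protocol features comprise:
    at least one of the encryption protocol of the traffic data, the length of the traffic data encrypted with the encryption protocol, and the number of fragments of the traffic data encrypted with the encryption protocol.
  3. The method according to claim 1 or 2, characterized in that the identification features further comprise packet size features and protocol stack features of the traffic data of the unknown device.
  4. The method according to any one of claims 1 to 3, characterized in that acquiring the traffic data of the unknown device comprises:
    acquiring the traffic data of the unknown device actively and/or acquiring the traffic data of the unknown device passively;
    wherein acquiring the traffic data of the unknown device actively comprises:
    sending a detection instruction related to the encryption protocol features of the required traffic data; and
    receiving the traffic data returned in response to the detection instruction.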
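  5. An apparatus for device type identification, characterized by comprising:
    an acquisition unit (310), configured to acquire traffic data of an unknown device; and
    a processing unit (320), configured to extract identification features from the traffic data of the unknown device, the identification features comprising encryption protocol features of the traffic data of the unknown device;
    wherein the processing unit (320) is further configured to determine the device type of the unknown device according to the identification features and a device type identification model, the device type identification model is trained on sample features of known devices, and the sample features comprise encryption protocol features of the traffic data of the known devices.
  6. The apparatus according to claim 5, characterized in that the encryption protocol features comprise:
    at least one of the encryption protocol of the traffic data, the length of the traffic data encrypted with the encryption protocol, and the number of fragments of the traffic data encrypted with the encryption protocol.
  7. The apparatus according to claim 5 or 6, characterized in that the identification features further comprise packet size features and protocol stack features of the traffic data of the unknown device.
  8. The apparatus according to any one of claims 5 to 7, characterized in that the acquisition unit (310) is specifically configured to:
    acquire the traffic data of the unknown device actively and/or acquire the traffic data of the unknown device passively;
    the apparatus further comprises a sending unit (330);
    the sending unit (330) is configured to send a detection instruction related to the encryption protocol features of the required traffic data; and
    the acquisition unit (310) is further specifically configured to receive the traffic data returned in response to the detection instruction.
  9. An apparatus for device type identification, characterized by comprising:
    a memory (401), configured to store a program; and
    a processor (402), configured to execute the program stored in the memory (401), wherein, when the program stored in the memory (401) is executed, the processor (402) is configured to perform the method for device type identification according to any one of claims 1 to 4.
  10. A computer-readable storage medium, characterized in that the computer-readable medium stores program code for execution by a device, the program code comprising instructions for performing the steps of the method for device type identification according to any one of claims 1 to 4.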
PCT/CN2021/109352 2021-07-29 2021-07-29 Method and apparatus for device type identification WO2023004707A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109352 WO2023004707A1 (zh) 2021-07-29 2021-07-29 Method and apparatus for device type identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109352 WO2023004707A1 (zh) 2021-07-29 2021-07-29 Method and apparatus for device type identification

Publications (1)

Publication Number Publication Date
WO2023004707A1 (zh) 2023-02-02

Family

ID=85086010

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109352 WO2023004707A1 (zh) 2021-07-29 2021-07-29 Method and apparatus for device type identification

Country Status (1)

Country Link
WO (1) WO2023004707A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106533669A (zh) * 2016-11-15 2017-03-22 百度在线网络技术(北京)有限公司 设备识别的方法、装置和系统
CN109063745A (zh) * 2018-07-11 2018-12-21 南京邮电大学 一种基于决策树的网络设备类型识别方法及系统
CN110445689A (zh) * 2019-08-15 2019-11-12 平安科技(深圳)有限公司 识别物联网设备类型的方法、装置及计算机设备
US20200219005A1 (en) * 2019-01-09 2020-07-09 International Business Machines Corporation Device discovery and classification from encrypted network traffic
CN112671757A (zh) * 2020-12-22 2021-04-16 无锡江南计算技术研究所 一种基于自动机器学习的加密流量协议识别方法及装置

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21951314

Country of ref document: EP

Kind code of ref document: A1