WO2021190398A1 - Method, device and system for identifying device models
- Publication number
- WO2021190398A1 (PCT/CN2021/081615)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2101/00—Indexing scheme associated with group H04L61/00
- H04L2101/60—Types of network addresses
- H04L2101/618—Details of network addresses
- H04L2101/622—Layer-2 addresses, e.g. medium access control [MAC] addresses
Definitions
- This application relates to the field of data processing, and in particular to a method, device and system for identifying device models.
- The specifications of a terminal device can be obtained from its device model, and maintenance or fault detection can then be performed more efficiently based on those specifications and parameters.
- A gateway device can obtain the media access control (MAC) address (also called the physical address) of a terminal device connected to it, query the correspondence between MAC addresses and device models pre-stored in a database, and thereby determine the device model of the terminal device.
- If the database does not store a device model corresponding to that MAC address, the device model of the terminal device cannot be determined.
- The method in the related technology therefore places high demands on the amount of data stored in the database, and its success rate of device model identification is low.
- This application provides a method, device, and system for identifying device models, which can solve the problems that methods in related technologies have high requirements on the amount of data stored in the database and a low recognition success rate.
- the technical solutions are as follows:
- A device model identification method is provided. The method may obtain the target MAC address of a target terminal device and determine a first number of candidate MAC addresses from a database. The database includes multiple MAC addresses and the device model corresponding to each MAC address; the similarity between each candidate MAC address and the target MAC address is greater than the similarity between the other MAC addresses in the database and the target MAC address; and the first number is an integer greater than 1. Among the device models corresponding to the first number of candidate MAC addresses, the device model with the most occurrences may be determined as the device model of the target terminal device.
- The device model of the target terminal device can thus be determined based on MAC address similarity, which effectively improves the success rate of device model identification and reduces the requirements on the amount of data stored in the database.
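- The candidate selection and majority vote described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the sample database, and in particular `prefix_similarity` (a simple shared-prefix count standing in for whatever similarity measure is actually used) are assumptions.

```python
from collections import Counter

def prefix_similarity(mac_a: str, mac_b: str) -> int:
    """Stand-in similarity (assumption): number of leading hex digits shared."""
    count = 0
    for a, b in zip(mac_a.upper(), mac_b.upper()):
        if a != b:
            break
        count += 1
    return count

def identify_model(target_mac, database, k=3, similarity=prefix_similarity):
    """Return the most frequent device model among the k most similar MACs."""
    # Rank every (mac, model) pair by similarity to the target, highest first.
    ranked = sorted(database,
                    key=lambda pair: similarity(target_mac, pair[0]),
                    reverse=True)
    # Majority vote over the device models of the top-k candidates.
    votes = Counter(model for _, model in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical database; the target MAC itself is not stored in it.
database = [
    ("001234AB56C0", "AA-a1"),
    ("001234AB56C2", "AA-a1"),
    ("001234AB5700", "AA-a2"),
    ("00FFEE000001", "BB-b1"),
]
print(identify_model("001234AB56C1", database, k=3))
```

- Because the vote runs over several neighbors rather than an exact lookup, a result is still produced when the target MAC address has no entry of its own in the database.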
- The database may include multiple data groups, each data group includes one or more data pairs, and each data pair includes a MAC address and the device model corresponding to that MAC address. The process of determining the first number of candidate MAC addresses from the database may include:
- Determining a second number of candidate data groups from the multiple data groups, where the similarity between the MAC address of any data pair in each candidate data group and the target MAC address is greater than the similarity between the MAC address of any data pair in the other data groups and the target MAC address, and the second number is an integer greater than 1; and determining the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups, where the similarity between each candidate MAC address and the target MAC address is greater than the similarity between the other MAC addresses in the second number of candidate data groups and the target MAC address.
- the search range of the MAC address can be narrowed, thereby effectively improving the search efficiency of the candidate MAC address.
- Determining the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups may include: determining, according to the arrangement order of the data pairs in each candidate data group, the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups.
- The nearest neighbors of the target MAC address (that is, the candidate MAC addresses with higher similarity) can be determined quickly from the candidate data groups, which effectively improves the search efficiency for candidate MAC addresses.
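- A two-stage search of this kind might look like the sketch below. It assumes that each data group is represented by its first data pair and uses a numeric-distance stand-in (`hex_similarity`) for the similarity measure; both choices, and all names, are illustrative assumptions rather than details taken from this application.

```python
import heapq

def hex_similarity(mac_a: str, mac_b: str) -> int:
    """Stand-in similarity (assumption): negative distance between 48-bit values."""
    return -abs(int(mac_a, 16) - int(mac_b, 16))

def find_candidates(target, groups, n_groups=2, n_candidates=3,
                    similarity=hex_similarity):
    """Narrow the search to the nearest groups, then rank pairs inside them."""
    # Stage 1: keep the groups whose representative (first pair) is most similar.
    nearest = heapq.nlargest(n_groups, groups,
                             key=lambda g: similarity(target, g[0][0]))
    # Stage 2: rank only the data pairs that belong to those groups.
    pool = [pair for group in nearest for pair in group]
    return heapq.nlargest(n_candidates, pool,
                          key=lambda pair: similarity(target, pair[0]))

# Hypothetical data groups of (MAC address, device model) pairs.
groups = [
    [("001234AB56C0", "AA-a1"), ("001234AB56C4", "AA-a1")],
    [("001234AB5700", "AA-a2"), ("001234AB5710", "AA-a2")],
    [("00FFEE000001", "BB-b1")],
]
for mac, model in find_candidates("001234AB56C1", groups):
    print(mac, model)
```

- Only the pairs in the selected groups are compared against the target in the second stage, which is where the reduction in search range comes from.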
- The method may further include: obtaining multiple data pairs; using a clustering algorithm to group the multiple data pairs into multiple data groups; and, for each data group, sorting the data pairs in the data group in descending order of the similarity between the MAC address of the central data pair in the data group and the MAC address of each other data pair.
- the database constructed by the above method can facilitate searching for candidate MAC addresses that are close to the target MAC address.
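- The construction steps can be illustrated with the following sketch. Grouping by a shared 6-digit prefix is only a placeholder for the clustering algorithm, the central data pair is assumed to be the medoid of the group, and numeric distance again stands in for the similarity measure; none of these choices comes from this application.

```python
from collections import defaultdict

def distance(mac_a: str, mac_b: str) -> int:
    """Stand-in dissimilarity (assumption): distance between 48-bit values."""
    return abs(int(mac_a, 16) - int(mac_b, 16))

def build_groups(pairs, prefix_len=6):
    """Group (mac, model) pairs, then sort each group around its central pair."""
    # Placeholder "clustering": bucket MAC addresses by their leading hex digits.
    buckets = defaultdict(list)
    for mac, model in pairs:
        buckets[mac[:prefix_len]].append((mac, model))
    groups = []
    for members in buckets.values():
        # Central pair (assumption): the member with the smallest total distance.
        center = min(members,
                     key=lambda p: sum(distance(p[0], q[0]) for q in members))
        # Highest similarity first, i.e. smallest distance to the center first.
        members.sort(key=lambda p: distance(center[0], p[0]))
        groups.append(members)
    return groups

pairs = [
    ("001234000010", "A"),
    ("001234000000", "A"),
    ("001234000004", "A"),
    ("00FFEE000001", "B"),
]
for group in build_groups(pairs):
    print([mac for mac, _ in group])
```

- With the center stored first and the remaining pairs ordered by closeness to it, a later search can scan a group from the front and stop early.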
- The method may further include: using a similarity model to determine the similarity between each MAC address in the database and the target MAC address, where the similarity model is trained on multiple MAC address samples whose similarity has been determined.
- the similarity model can be obtained by training based on a deep metric learning algorithm, and the similarity of two MAC addresses determined by using the similarity model can more accurately reflect the similarity of the device models corresponding to the two MAC addresses.
- The method may further include: obtaining the target device name of the target terminal device; determining the candidate device model of the target terminal device according to the target device name; and, if the candidate device model is not an unknown model, that is, the device model of the target terminal device can be determined from the target device name, determining the candidate device model as the device model of the target terminal device. Accordingly, the process of determining the first number of candidate MAC addresses from the database may include: if the candidate device model is an unknown model, that is, the device model of the target terminal device cannot be determined from the target device name, determining the first number of candidate MAC addresses from the database.
- the solution provided in this application can also use the target device name to determine the device model, which effectively improves the identification success rate of the device model compared to determining the device model based on a single parameter.
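- The name-first, MAC-fallback control flow can be sketched as below. The two lookup functions are toy placeholders for the name-based and MAC-based recognizers, and `None` is used here (as an assumption) to represent the unknown model.

```python
from typing import Callable, Optional

def identify(name: str, mac: str,
             model_from_name: Callable[[str], Optional[str]],
             model_from_mac: Callable[[str], str]) -> str:
    """Use the device name if it yields a model; otherwise fall back to the MAC."""
    candidate = model_from_name(name)
    if candidate is not None:      # None plays the role of the "unknown model"
        return candidate
    return model_from_mac(mac)

# Toy stand-ins for the two recognizers (hypothetical, for illustration only).
def toy_name_lookup(name: str) -> Optional[str]:
    return "AA-a1" if "AA-a1" in name else None

def toy_mac_lookup(mac: str) -> str:
    return "BB-b1"

print(identify("Xiao Ming's AA-a1-666", "001234AB56C1",
               toy_name_lookup, toy_mac_lookup))
print(identify("living room", "001234AB56C1",
               toy_name_lookup, toy_mac_lookup))
```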
- The process of determining the candidate device model of the target terminal device according to the target device name may include: using a model determination model to determine the candidate device model of the target terminal device from the target device name, where the model determination model is trained on multiple device name samples whose device models have been determined.
- The process of using the model determination model to determine the candidate device model of the target terminal device from the target device name may include: using the model determination model to determine whether each character in the target device name is a valid character; and determining the character string composed of the valid characters in the target device name as the candidate device model of the target terminal device.
- Using a neural network model to determine the device model from the target device name can ensure the recognition success rate of the device model and ensure the reliability of the determined device model.
- The method may further include: obtaining a device name sample and a device model sample corresponding to the device name sample; in the device name sample, marking each character in the string that matches the device model sample as a valid character, and marking all characters outside that string as invalid characters; and performing model training on the marked device name samples to obtain the model determination model.
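- The marking step can be illustrated with a short sketch. The function name and the use of booleans for valid and invalid characters are assumptions made for this example, as is the choice to mark only the first occurrence of the device model in the name.

```python
def label_sample(device_name: str, device_model: str) -> list:
    """Mark each character of device_name: True = valid, False = invalid."""
    labels = [False] * len(device_name)
    start = device_name.find(device_model)   # first occurrence only (assumption)
    if start >= 0:
        for i in range(start, start + len(device_model)):
            labels[i] = True
    return labels

# Hypothetical training sample: a device name and its known device model.
name, model = "my AA-a1-666", "AA-a1"
for ch, valid in zip(name, label_sample(name, model)):
    print(repr(ch), "valid" if valid else "invalid")
```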
- the process of determining the candidate device model of the target terminal device according to the target device name may include:
- the matching degree between the target device name and each device model template in the multiple device model templates is determined respectively; the device model template with the highest matching degree is determined as the candidate device model of the target terminal device.
- the method based on template matching does not require pre-training the neural network model, and the complexity of the method is low.
- an apparatus for identifying a device model may include at least one module, and the at least one module may be used to implement the method for identifying the device model provided in the foregoing aspect.
- An apparatus for identifying a device model may include a processor, a memory, and a computer program stored in the memory and capable of running on the processor.
- When the processor executes the computer program, the device model identification method provided in the above aspect is implemented.
- A computer-readable storage medium is provided, and instructions are stored in the computer-readable storage medium.
- When the instructions in the computer-readable storage medium run on a computer, the computer executes the device model identification method provided in the above aspect.
- In another aspect, a device model identification system is provided. The system may include a first server and a second server; the first server may be used to execute the steps of the device model identification method provided in the foregoing aspect, and the second server may be used to perform the model training and/or database construction steps of that method.
- a device model identification system may include: a first server, and the first server may be used to execute the device model identification method provided in the foregoing aspect.
- The system may further include a gateway device. The gateway device is connected to the terminal device and to the first server, and is used to obtain the MAC address of the terminal device and send the obtained MAC address to the first server.
- this application provides a device model identification method, device and system.
- Based on the similarity between the target MAC address and the MAC addresses stored in the database, the solution provided in this application can determine, from the database, the first number of candidate MAC addresses with the highest similarity, and then, among the device models corresponding to the first number of candidate MAC addresses, determine the device model with the most occurrences as the device model of the target terminal device. Therefore, even if the device model corresponding to the target MAC address is not stored in the database, the device model of the target terminal device can be determined according to MAC address similarity, which effectively improves the success rate of device model identification and reduces the requirements on the amount of data stored in the database.
- FIG. 1 is a schematic structural diagram of a device model identification system provided by an embodiment of the present application
- FIG. 2 is a flowchart of a method for identifying a device model provided by an embodiment of the present application
- FIG. 3 is a flowchart of another device model identification method provided by an embodiment of the present application.
- FIG. 4 is a flowchart of a method for determining a candidate device model of a target terminal device according to an embodiment of the present application
- FIG. 5 is a schematic diagram of a target device name recognition vector provided by an embodiment of the present application.
- FIG. 6 is a flowchart of a method for determining a first number of candidate MAC addresses according to an embodiment of the present application
- FIG. 7 is a schematic structural diagram of a database provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of using a model identification model to determine a device model provided by an embodiment of the present application.
- FIG. 9 is a flowchart of a method for training a model determination model provided by an embodiment of the present application.
- FIG. 10 is a flowchart of a method for training a similarity model provided by an embodiment of the present application.
- FIG. 11 is a flowchart of a method for constructing a database provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of dividing a data group according to an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of a VP-tree provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of a device model identification device provided by an embodiment of the present application.
- FIG. 15 is a schematic structural diagram of another device type identification device provided by an embodiment of the present application.
- FIG. 16 is a schematic structural diagram of another device model identification device provided by an embodiment of the present application.
- FIG. 17 is a schematic structural diagram of another device type identification device provided by an embodiment of the present application.
- the device model identification of the terminal device is of great significance to the operator's home network service.
- the key to determining the user experience in a home network is the speed of wireless fidelity (WiFi)
- the performance of terminal devices does not support the corresponding WiFi bandwidth, and factors such as WiFi signal penetration through walls account for only 20%.
- gateway devices support 2.4 gigahertz (GHz) and 5GHz frequency bands
- Routers or mobile phones used in homes may only support the 2.4GHz frequency band, so users may be unable to enjoy the high-speed experience brought by the 5GHz frequency band, which leads to user complaints.
- FIG. 1 is a schematic structural diagram of a device model identification system provided by an embodiment of the present application.
- the system may include a model identification server 01.
- the model identification server 01 may be a server, a server cluster composed of several servers, or a cloud computing service center.
- The system may also include a gateway device 02. The gateway device 02 is a device used to interconnect different networks, and may also be called an inter-network connector or a protocol converter.
- the system may also include a network management device 03, and the network management device 03 may be a computer.
- the gateway device 02 can be connected to multiple terminal devices 04, and each terminal device 04 can be a smart terminal device such as a mobile phone, a computer, a router, a wearable device, or a household device.
- household equipment can include speakers, electronic scales, TVs, and air conditioners.
- FIG. 2 is a flowchart of a device model identification method provided by an embodiment of the present application.
- The gateway device 02 can be used to implement step A, data collection: the gateway device 02 can collect the MAC address of each terminal device 04 (or the MAC address and device name, as shown in FIG. 2) and send the collected data to the model identification server 01.
- the data collected and reported by the gateway device 02 may include device names N1 to N3, and the MAC address corresponding to each device name, that is, the MAC addresses M1 to M3.
- the model identification server 01 can be used to implement step B: device model identification.
- the model identification server 01 can identify the device model based on the MAC address (step B2); the model identification server 01 can also identify the device model based on the device name (step B1); or the model identification server 01 can perform steps B2 and step B1, and combine the recognition results obtained through step B1 and step B2 to obtain the final device model.
- For example, based on the device name N1 and the MAC address M1, the model identification server 01 may finally identify the device model as X1.
- the model identification server 01 may further send the finally determined device model to the network management device 03.
- the network management device 03 can display the device model recognized by the model recognition server 01 (step C).
- The network administrator (for example, the operation and maintenance personnel of the operator) who manages the network management device 03 can then determine the specifications of the terminal device 04 based on the identified device model, so as to perform network maintenance, fault detection, or demand analysis.
- the model identification server 01 may include a first server 011 and a second server 012.
- the second server 012 may be used to construct a database, train a neural network model for device model identification according to labeled data, and send the constructed database and the trained neural network model to the first server 011.
- The labeled data refers to sample data whose device model has been determined.
- the first server 011 can then identify the device model of the terminal device 04 based on the database and the neural network model. Therefore, the first server 011 may also be referred to as an online recognition server, and the second server 012 may also be referred to as an offline training server.
- the network administrator may also verify the device model of the terminal device 04 provided by the network management device 03. If it is determined that the device model is wrong, the network administrator can enter the correct device model in the network management device 03 and instruct the network management device 03 to send the corrected data (including the correct device model, MAC address and device name) to the second server 012.
- the second server 012 may periodically retrain the neural network model based on the accumulated correction data to obtain an updated neural network model, and send the updated neural network model to the first server 011. That is, the network management device 03 may also instruct the second server 012 to implement step D shown in FIG. 2.
- The first server 011 and the second server 012 may be an integrated server, or may be two independent servers.
- The first server 011 and the network management device 03 may be an integrated device.
- the gateway device 02 may not be required in the system, and the terminal device 04 may directly report the MAC address to the model recognition server 01, or the network management device 03 collects relevant data and reports to the model recognition server 01.
- the embodiment of the present application provides a method for identifying a device model.
- the method can be applied to the model identification server 01 shown in FIG. 1, for example, it can be applied to the first server 011.
- the method may include:
- Step 101 Obtain the target MAC address and the target device name of the target terminal device.
- The target terminal device refers to the terminal device whose device model is to be queried.
- the model identification server 01 may obtain the target MAC address and target device name of the target terminal device reported by the gateway device 02.
- the gateway device 02 can periodically collect the MAC address and device name of the terminal device it is connected to, and report the MAC address and device name of the terminal device to the model identification server 01.
- the period for the gateway device 02 to collect the MAC address and device name can be flexibly adjusted according to requirements, for example, it can be one hour, one day, or one month.
- the gateway device 02 may respond to the collection instruction sent by the model recognition server 01 or the network management device 03 to collect the target MAC address and the target device name of the target terminal device and report to the model recognition server 01.
- the collection instruction may carry an identifier that can uniquely identify the target terminal device, such as an Internet protocol (IP) address.
- the model identification server 01 may also directly obtain the target MAC address and the target device name of the target terminal device.
- For example, the MAC addresses and device names of four terminal devices 04 collected and reported by the gateway device 02 can be as shown in Table 1.
- The gateway device 02 can report four pieces of device information to the model identification server 01, and each piece of device information may also be referred to as a device record.
- Each piece of device information can include two fields: MAC address and device name.
- For example, the device information of the mobile phone reported by the gateway device 02 and obtained by the model identification server 01 may include the target MAC address 001234AB56C1 and the target device name "Xiao Ming's AA-a1-666".
- Step 102 Determine the candidate device model of the target terminal device according to the name of the target device.
- the model identification server 01 may first determine the candidate device model of the target terminal device according to the target device name.
- The model identification server 01 may store a model determination model. The model determination model may be obtained, based on natural language processing (NLP) technology, by training on multiple device name samples whose device models have been determined. After the model identification server 01 obtains the target device name of the target terminal device, the model determination model can be used to determine the candidate device model of the target terminal device from the target device name.
- the process of using the model determination model to determine the candidate device model of the target terminal device from the target device name may include:
- Step 1021 Encode the name of the target device to obtain a name vector that can be recognized by the machine.
- The model recognition server 01 first needs to encode the name of the target device to obtain a name vector that can be recognized by the machine.
- a one-hot encoding algorithm can be used to encode the name of the target device, so that each character in the name of the target device is converted into characters that can be recognized by the machine, and these machine-recognizable characters form the name vector.
- the process of using the one-hot encoding algorithm to encode the name of the target device is equivalent to mapping each character in the name of the target device into an integer according to a pre-created mapping table.
- The 26 English letters a to z can be mapped to the 26 integers from 0 to 25; the 10 digits from 0 to 9 can be mapped to the integers from 26 to 35; three punctuation marks that often appear in device names, such as "-", can be mapped to the three integers from 36 to 38; and commonly used Chinese characters can be mapped to integers after 38.
- For example, the three Chinese characters meaning "small", "ming" and "of", which make up the "Xiao Ming's" part of the device name, can be mapped to 39, 40 and 41, respectively.
- Encoding the target device name "Xiao Ming's AA-a1-666" then yields the name vector [39,40,41,1,1,36,1,27,36,32,32,32].
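- A character-to-integer mapping table of this kind can be sketched as follows. The table layout follows the ranges stated above (letters from 0, digits from 26, punctuation from 36, other characters after 38), so the exact integers are illustrative; the choice of the three punctuation marks, the case-insensitive lookup, and the on-the-fly assignment of new integers are assumptions.

```python
import string

def build_table() -> dict:
    """Build the pre-created character-to-integer mapping table."""
    table = {}
    for i, ch in enumerate(string.ascii_lowercase):   # a-z -> 0..25
        table[ch] = i
    for i, ch in enumerate(string.digits):            # 0-9 -> 26..35
        table[ch] = 26 + i
    for i, ch in enumerate("-_ "):                    # 3 marks -> 36..38 (assumed set)
        table[ch] = 36 + i
    return table

def encode(name: str, table: dict) -> list:
    """Map every character of the device name to its integer code."""
    vector = []
    for ch in name.lower():                 # upper and lower case share a code
        if ch not in table:
            table[ch] = len(table)          # next free integer after 38
        vector.append(table[ch])
    return vector

table = build_table()
print(encode("AA-a1-666", table))
```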
- Step 1022 Input the name vector to the model determination model to obtain the identification vector output by the model determination model.
- the model identification server 01 may input the encoded name vector to the model determination model to obtain the identification vector output by the model determination model.
- the identification vector may include multiple identifiers, and each identifier may be used to identify whether a character in the name of the target device is a valid character. That is, the model recognition server 01 can use the model determination model to determine whether each character in the target device name is a valid character.
- the valid characters refer to the characters used to compose the device model of the target terminal device. In the embodiment of the present application, different values of the identifier may be used to indicate whether the character indicated by the identifier is a valid character.
- When the value of an identifier is a first integer, it may indicate that the indicated character is a valid character; when the value of the identifier is a second integer, it may indicate that the indicated character is an invalid character.
- the first integer is different from the second integer.
- The identification vector may be an integer sequence whose length is equal to the length of the name vector, and each identifier may take one of the values 0, 1, or 2.
- An identifier value of 0 can indicate that the indicated character is a valid character and is the initial character (that is, the first character) of the device model.
- An identifier value of 1 can indicate that the indicated character is a valid character and is a middle character of the device model.
- a value of 2 for the identifier can indicate that the character indicated is an invalid character. That is, the first integer includes 0 and 1, and the second integer is 2.
- For example, after the model recognition server 01 inputs the name vector [39,40,41,1,1,36,1,27,36,32,32,32] into the model determination model, the identification vector output by the model determination model can be [2,2,2,0,1,1,1,1,1,2,2,2].
- Step 1023 Determine the candidate device model of the target terminal device from the target device name according to the identification vector.
- the model recognition server 01 may determine the character string composed of valid characters in the target device name as the candidate device model of the target terminal device according to the identification vector.
- For example, the model recognition server 01 can determine the character string "AA-a1", indicated by the identifiers 0 and 1 in the identification vector [2,2,2,0,1,1,1,1,1,2,2,2], as the candidate device model of the target terminal device.
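- Reading the candidate device model off an identification vector can be sketched as below. The function name, the ASCII device name, and the tag sequence in the usage lines are made up for illustration (the tags are chosen so that they mark exactly the five characters of "AA-a1"); `None` is assumed to represent the unknown model.

```python
from typing import Optional

VALID_TAGS = (0, 1)    # 0 = first character of the model, 1 = middle character
INVALID_TAG = 2        # character does not belong to the device model

def extract_model(name: str, tags: list) -> Optional[str]:
    """Join the characters tagged as valid; return None if there are none."""
    if len(name) != len(tags):
        raise ValueError("identification vector must match the name length")
    chars = [ch for ch, tag in zip(name, tags) if tag in VALID_TAGS]
    return "".join(chars) or None

# Hypothetical device name and tag sequence.
print(extract_model("my AA-a1-666", [2, 2, 2, 0, 1, 1, 1, 1, 2, 2, 2, 2]))
print(extract_model("lobby", [2, 2, 2, 2, 2]))
```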
- the characters indicated by all identifiers in the identification vector output by the model determination model may be invalid characters, that is, the characters in the target device name may all be invalid characters. In this case, the model identification server 01 may determine that the device model of the target terminal device is unknown, that is, determine that the candidate device model of the target terminal device is an unknown model.
- For example, the identification vector output by the model determination model can be [2,2,2,2,2], and based on this identification vector the model identification server 01 can determine that the candidate device model of the target terminal device is unknown.
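- As a minimal sketch of step 1023, assuming the identifier convention described above (0 and 1 mark valid characters, 2 marks invalid ones), the candidate device model could be read off an identification vector as follows. The function name `extract_candidate_model` and the sample name "xyzAA-a1-666" are hypothetical and chosen only for illustration.

```python
from typing import List

START, MIDDLE, INVALID = 0, 1, 2  # identifier values described in the text
UNKNOWN = "unknown"


def extract_candidate_model(name: str, identifiers: List[int]) -> str:
    """Join the characters marked valid; return UNKNOWN if none are valid."""
    if len(name) != len(identifiers):
        raise ValueError("identification vector must be as long as the name")
    valid = [ch for ch, ident in zip(name, identifiers)
             if ident in (START, MIDDLE)]
    return "".join(valid) if valid else UNKNOWN


# Hypothetical 12-character device name with the model in positions 3 to 7.
print(extract_candidate_model("xyzAA-a1-666",
                              [2, 2, 2, 0, 1, 1, 1, 1, 2, 2, 2, 2]))
# An all-invalid vector yields the unknown model.
print(extract_candidate_model("hello", [2, 2, 2, 2, 2]))
```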
- multiple device model templates may be pre-stored in the model identification server 01. After the model identification server 01 obtains the target device name of the target terminal device, it can determine the matching degree between the target device name and each device model template, and can determine the device model template with the highest matching degree as the candidate device model of the target terminal device.
- the model identification server 01 can use the longest common substring matching method to calculate the length of the longest common substring between the target device name and each device model template, and can determine the device model template with the largest length as the candidate device model of the target terminal device.
- For example, if the model recognition server 01 uses the longest common substring matching method to calculate the matching degree, it can determine that the device model template with the highest matching degree with the target device name "Xiaoming's AA-a1-666" is "AA-a1", so the device model template "AA-a1" can be determined as the candidate device model of the target terminal device.
- the method based on template matching does not require pre-training the neural network model, and the complexity of the method is low.
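- The template-matching alternative can be sketched with a standard dynamic-programming longest common substring routine. The template list and the `min_len` cut-off (below which the result falls back to "unknown") are assumptions added for illustration; the text only states that the template with the longest common substring is selected.

```python
from typing import Iterable


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1  # extend the match ending at j-1
                best = max(best, cur[j])
        prev = cur
    return best


def match_template(name: str, templates: Iterable[str], min_len: int = 3) -> str:
    """Pick the template sharing the longest substring with the device name."""
    best_template, best_len = "unknown", 0
    for template in templates:
        length = longest_common_substring_len(name, template)
        if length > best_len:
            best_template, best_len = template, length
    # Assumption: very short overlaps are treated as no match at all.
    return best_template if best_len >= min_len else "unknown"


print(match_template("Xiaoming's AA-a1-666", ["BB-b2", "AA-a1", "CC-c3"]))
```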
- Step 103 Detect whether the candidate device model is an unknown model.
- the model identification server 01 may not necessarily be able to determine a reliable candidate device model based on the target device name. For example, live network data shows that about 55% of the device names can give information about a valid device model, and about 45% of the device names are invalid.
- the model identification server 01 may further detect whether the candidate device model is an unknown model. If the candidate device model is not an unknown model, the model identification server 01 can execute step 104; if the candidate device model is an unknown model, the model identification server 01 can execute step 105.
- the model identification server 01 can detect whether the candidate device model is unknown, if it is not unknown, step 104 can be executed; if it is unknown, step 105 can be executed.
- Step 104 Determine the candidate device model as the device model of the target terminal device.
- the model identification server 01 may directly determine the candidate device model as the device model of the target terminal device.
- Step 105 Determine the first number of candidate MAC addresses from the database according to the target MAC address.
- the model identification server 01 can continue to determine the device model of the target terminal device according to the target MAC address of the target terminal device, thereby increasing the success rate of device model identification.
- a database may be stored in the model identification server 01, and the database includes multiple MAC addresses and a device model corresponding to each MAC address.
- after the model recognition server 01 obtains the target MAC address, it can calculate the similarity between the target MAC address and each MAC address in the database, and can determine the first number of candidate MAC addresses from the database based on the calculated similarities.
- the similarity between each candidate MAC address and the target MAC address is greater than the similarity between other MAC addresses in the database and the target MAC address. That is, the model identification server 01 can determine the first number of candidate MAC addresses with the highest similarity to the target MAC address from the database.
- the first number may be an integer greater than or equal to 1, for example, 5 or 10, etc.
- the model recognition server 01 may use a similarity model to determine the similarity between the MAC address in the database and the target MAC address.
- the similarity model may be pre-trained based on multiple MAC address samples whose similarity has been determined.
- the similarity of the two MAC addresses determined by using the similarity model can reflect the probability that the device models corresponding to the two MAC addresses are the same. That is, the higher the similarity of the two MAC addresses, the higher the probability that the device models corresponding to the two MAC addresses are the same.
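- A brute-force version of this selection can be sketched as follows. The trained similarity model is not available here, so `prefix_similarity` (the fraction of leading hexadecimal digits two addresses share) is only a stand-in assumption, and the database contents are invented for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def prefix_similarity(mac_a: str, mac_b: str) -> float:
    """Stand-in similarity: share of leading hex digits the addresses have in common."""
    a = mac_a.replace(":", "").lower()
    b = mac_b.replace(":", "").lower()
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / 12


def top_k_candidates(target: str, database: Dict[str, str], k: int,
                     similarity: Callable[[str, str], float] = prefix_similarity
                     ) -> List[Tuple[float, str]]:
    """Return the k (similarity, MAC address) entries most similar to the target."""
    scored = [(similarity(target, mac), mac) for mac in database]
    return heapq.nlargest(k, scored)


# Hypothetical database: MAC address -> device model.
database = {
    "aa:bb:cc:00:00:01": "AA-a1",
    "aa:bb:cc:00:00:02": "AA-a1",
    "aa:bb:cc:10:00:00": "BB-b2",
    "11:22:33:44:55:66": "CC-c3",
}
for score, mac in top_k_candidates("aa:bb:cc:00:00:03", database, k=2):
    print(mac, database[mac], round(score, 2))
```

- Scanning every stored address costs time linear in the database size, which is the motivation for the grouped database described next.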
- in order to improve the efficiency of querying candidate MAC addresses, the database may be constructed using locality-sensitive hashing (LSH) technology.
- a database constructed using the LSH technology may include multiple data groups, each data group including one or more data pairs, and each data pair including a MAC address and a device model corresponding to the MAC address.
- the multiple data groups may be obtained by dividing using a clustering algorithm, so each data group may also be referred to as a data cluster (cluster).
- the storage location of each data pair in the database can be represented by a two-level index (index 1, index 2).
- index 1 is the index of the data group
- index 2 is the index of the data pair in the data group. Since the database constructed based on the LSH technology can divide the data pairs with the same or similar device models into the same data group, the index 1 of the data pairs with the same or similar device models is the same.
- the 9 data pairs are divided into 3 data groups, and the indexes of the 3 data groups are 001, 002, and 003, respectively. Then the device models in each data pair in each data group can be the same. For example, the device models in the three data pairs whose index 1 is 001 may all be AA-a1.
- FIG. 6 is a flowchart of a method for determining a first number of candidate MAC addresses provided by an embodiment of the present application. As shown in FIG. 6, the above step 105 may include:
- Step 1051 Determine the second number of candidate data groups according to the similarity between the target MAC address and the MAC address of any data pair in each data group.
- the model identification server 01 can first calculate the similarity between the target MAC address and the MAC address of any data pair in each data group, obtaining as many similarities as there are data groups in the database. After that, the model recognition server 01 can determine the second number of candidate data groups based on the calculated similarities.
- the similarity between the MAC address of any data pair in each candidate data group and the target MAC address is greater than the similarity between the MAC address of any data pair in other data groups and the target MAC address. That is, the model identification server 01 may first determine the second number of candidate data groups that are most similar to the target MAC address.
- the second number can be an integer greater than or equal to 1, and the second number and the first number can be the same or different.
- the index 2 of any data pair selected for calculation of the similarity with the target MAC address in each data group may be the same.
- the index 2 of any selected data pair in each data group may be 001, that is, the similarity between the target MAC address and the MAC address of the first data pair in each data group can be calculated separately.
- multiple data groups in the database may be obtained by dividing based on a k-centers clustering algorithm, and then there is one center data pair in each data group.
- the center data pair in each data group may be placed first, and the other data pairs may be arranged in descending order of the similarity between their MAC addresses and the MAC address of the center data pair.
- the model identification server 01 can select the center data pair in each data group for calculation.
- For example, the model identification server 01 can calculate the similarity y1 between the target MAC address and the MAC address in data pair 1, the similarity y2 between the target MAC address and the MAC address in data pair 4, and the similarity y3 between the target MAC address and the MAC address in data pair 7. If the second number is 2, and the three similarities calculated by the model recognition server 01 satisfy y1>y3>y2, the model recognition server 01 can determine the data group whose index 1 is 001 and the data group whose index 1 is 003 as candidate data groups.
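- Step 1051 amounts to ranking the data groups by the similarity of one representative data pair per group. A minimal sketch, with invented values for y1, y2, and y3 that satisfy y1>y3>y2:

```python
from typing import Dict, List


def select_candidate_groups(center_similarity: Dict[str, float],
                            second_number: int) -> List[str]:
    """Return the index 1 values of the groups whose representative is most similar."""
    ranked = sorted(center_similarity, key=center_similarity.get, reverse=True)
    return ranked[:second_number]


# Hypothetical similarities between the target MAC address and the
# representative data pair of each group (y1, y2, y3 in the text).
similarities = {"001": 0.9, "002": 0.2, "003": 0.6}
print(select_candidate_groups(similarities, second_number=2))
```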
- Step 1052 Determine the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups.
- the similarity between each candidate MAC address and the target MAC address may be greater than the similarity between other MAC addresses in the candidate data group and the target MAC address. That is, the model identification server 01 may determine the first number of candidate MAC addresses that are most similar to the target MAC address from the MAC addresses included in the second number of candidate data groups.
- the model identification server 01 may adopt a neighbor search algorithm to determine the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups.
- the model identification server 01 can determine the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups based on the arrangement order of the data pairs in each candidate data group.
- one or more data pairs included in each data group may be sorted using a vantage point tree (VP-tree) algorithm.
- the model recognition server 01 can use the VP-tree algorithm to determine the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups, which can greatly improve the efficiency of the nearest neighbor search over the data pairs in each data group.
- For example, after the model identification server 01 calculates the similarity between the target MAC address and the MAC address of the central data pair in each data group (that is, the data pair whose index 2 is 001), 5 candidate data groups can be determined from the database. Assuming that the 5 candidate data groups are c1, c2, c3, c4, and c5, thanks to the clustering algorithm adopted in the database construction, the 10 MAC addresses that are most similar to the target MAC address are highly likely to lie within the 5 candidate data groups c1 to c5.
- the model identification server 01 may query each candidate data group according to the VP-tree for several candidate data pairs whose MAC addresses are most similar to the target MAC address. The number of candidate data pairs determined in different candidate data groups may be the same or different. Assuming that the model recognition server 01 determines 5 candidate data pairs from each candidate data group, a total of 25 candidate data pairs, that is, 25 neighboring data pairs, can be obtained. Finally, the model identification server 01 can sort the 25 candidate data pairs in descending order of similarity with the target MAC address and determine the 10 candidate data pairs with the highest similarity to the target MAC address. The 10 MAC addresses in these 10 candidate data pairs are the finally determined candidate MAC addresses, that is, the 10 MAC addresses closest to the target MAC address.
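- The two-stage query described above (candidate groups first, then the closest data pairs inside them) can be sketched as follows. The `similarity` function is a simple stand-in for the trained similarity model, the per-group search is a plain scan rather than a VP-tree lookup, and the group contents are invented; each group is assumed to store its center data pair first.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (MAC address, device model)


def similarity(mac_a: str, mac_b: str) -> float:
    """Stand-in for the similarity model: share of matching hex digits."""
    a = mac_a.replace(":", "").lower()
    b = mac_b.replace(":", "").lower()
    return sum(x == y for x, y in zip(a, b)) / 12


def two_stage_search(target: str, groups: Dict[str, List[Pair]],
                     second_number: int, per_group: int,
                     first_number: int) -> List[Pair]:
    # Stage 1 (step 1051): rank groups by the similarity of their center pair.
    candidates = heapq.nlargest(
        second_number, groups,
        key=lambda g: similarity(target, groups[g][0][0]))
    # Stage 2 (step 1052): gather the closest pairs of each candidate group...
    pool: List[Pair] = []
    for g in candidates:
        pool.extend(heapq.nlargest(
            per_group, groups[g], key=lambda p: similarity(target, p[0])))
    # ...and keep the overall closest `first_number` pairs.
    return heapq.nlargest(first_number, pool,
                          key=lambda p: similarity(target, p[0]))


groups = {
    "001": [("aa:bb:cc:00:00:10", "AA-a1"), ("aa:bb:cc:00:00:11", "AA-a1"),
            ("aa:bb:cc:00:00:2f", "AA-a1")],
    "002": [("11:22:33:44:55:66", "BB-b2"), ("11:22:33:44:55:67", "BB-b2")],
    "003": [("aa:bb:cd:00:00:10", "CC-c3"), ("aa:bb:cd:00:01:10", "CC-c3")],
}
result = two_stage_search("aa:bb:cc:00:00:12", groups,
                          second_number=2, per_group=2, first_number=3)
print(result)
```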
- the VP-tree algorithm adopted in the embodiment of the present application can greatly reduce the complexity of query within the data group.
- the computational complexity of the traditional B-tree algorithm is O(2*log(n)), where n is the number of data pairs included in the data group.
- the model recognition server 01 calculates the metric distance between the target MAC address and the VP-tree node (that is, the similarity with the MAC address in the central data pair) to quickly locate the nearest neighbor, and the remaining neighbors can then be quickly located within a small area near the position of the nearest neighbor in the VP-tree. Therefore, the computational complexity of the VP-tree algorithm can be reduced to O(log(n)).
- For example, after calculating the similarity between the target MAC address and the MAC address of the central data pair whose index 1 is 003 and index 2 is 001, the model recognition server 01 determines that the data group whose index 1 is 003 is a candidate data group.
- the index 2 of the central data pair of the data group whose index 1 is 003 is 001, and the model identification server 01 can quickly locate, near the central data pair whose index 1 is 003 and index 2 is 001, other data pairs whose MAC addresses are similar to the target MAC address. For example, as shown in FIG. 7, when the VP-tree is constructed, the data pair whose index 2 is 002 and the data pair whose index 2 is 003 are similar to the central data pair, so the two data pairs are located near the central data pair in the VP-tree.
- the model identification server 01 can therefore quickly determine that, in the candidate data group whose index 1 is 003, the other data pairs adjacent to the target MAC address are the data pair whose index 2 is 002 and the data pair whose index 2 is 003.
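- For readers unfamiliar with vantage point trees, the following is a generic textbook-style sketch of VP-tree construction and k-nearest-neighbor search, not the specific structure of FIG. 7. It assumes a true metric distance; the Hamming distance over hexadecimal digits used here is only a stand-in for the distance learned by the similarity model, and because it produces many ties the resulting tree may be poorly balanced.

```python
import heapq
import random
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Stand-in metric: number of differing hex digits."""
    return sum(x != y for x, y in zip(a, b))


class VPNode:
    def __init__(self, point: str, radius: int,
                 inside: "Optional[VPNode]", outside: "Optional[VPNode]"):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside


def build(points: List[str], rng: random.Random) -> Optional[VPNode]:
    if not points:
        return None
    rest = list(points)
    vp = rest.pop(rng.randrange(len(rest)))      # choose a vantage point
    if not rest:
        return VPNode(vp, 0, None, None)
    dists = [hamming(vp, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]      # median distance to the vantage point
    inside = [p for p, d in zip(rest, dists) if d <= radius]
    outside = [p for p, d in zip(rest, dists) if d > radius]
    return VPNode(vp, radius, build(inside, rng), build(outside, rng))


def k_nearest(root: Optional[VPNode], target: str, k: int) -> List[str]:
    """Return the k points closest to target (k must be at least 1)."""
    heap: list = []  # max-heap on distance, stored as (-distance, point)

    def tau() -> float:
        # Distance of the current k-th best neighbor, or infinity if fewer than k.
        return -heap[0][0] if len(heap) == k else float("inf")

    def visit(node: Optional[VPNode]) -> None:
        if node is None:
            return
        d = hamming(target, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < tau():
            heapq.heapreplace(heap, (-d, node.point))
        if d <= node.radius:
            visit(node.inside)
            if d + tau() > node.radius:          # outside may still hold a closer point
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() <= node.radius:         # inside may still hold a closer point
                visit(node.inside)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]  # nearest first
```

- The pruning tests rely on the triangle inequality, which is why a metric distance is assumed; a subtree is skipped only when it cannot contain a point closer than the current k-th best.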
- Step 106 Among the device models corresponding to the first number of candidate MAC addresses, the device model with the most occurrences is determined as the device model of the target terminal device.
- after the model identification server 01 determines the first number of candidate MAC addresses, it can determine the device model corresponding to each candidate MAC address and count the number of occurrences of each device model, that is, the number of repetitions. After that, the model identification server 01 may determine the device model with the most occurrences as the device model of the target terminal device.
- a similarity threshold may be pre-stored in the model recognition server 01. After the model recognition server 01 determines the first number of candidate MAC addresses, it may first eliminate the candidate MAC addresses whose similarity with the target MAC address is less than the similarity threshold, and then determine the device model of the target terminal device from the device models corresponding to the remaining candidate MAC addresses. If the similarity between every candidate MAC address and the target MAC address is less than the similarity threshold, the model identification server 01 can determine that the device model of the target terminal device is an unknown model, that is, the device model is "unknown".
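- Step 106 together with the optional similarity threshold can be sketched as a filtered majority vote. The candidate list and the threshold value 0.5 are invented for illustration; how ties between equally frequent device models are broken is not specified in the text, and this sketch simply keeps the one encountered first.

```python
from collections import Counter
from typing import List, Tuple


def vote_device_model(candidates: List[Tuple[str, float]],
                      threshold: float) -> str:
    """candidates holds (device model, similarity to the target MAC address)."""
    # Drop candidates whose similarity is below the threshold.
    kept = [model for model, sim in candidates if sim >= threshold]
    if not kept:
        return "unknown"
    # The device model that occurs most often wins.
    return Counter(kept).most_common(1)[0][0]


candidates = [("AA-a1", 0.9), ("AA-a1", 0.8), ("BB-b2", 0.95), ("CC-c3", 0.1)]
print(vote_device_model(candidates, threshold=0.5))
```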
- the model recognition server 01 can identify the device model of the target terminal device based on the methods shown in the above steps 102 to 106. For example, referring to FIG. 2, the model identification server 01 can determine the device model X1 according to the device name N1 and the MAC address M1, and can determine the device model X2 according to the device name N2 and the MAC address M2.
- the sequence of steps in the device model identification method provided in the embodiments of the present application can be appropriately adjusted, and the steps can also be increased or decreased accordingly according to the situation.
- the model identification server 01 can identify the device model of the target terminal device according to the similarity between the target MAC address and the MAC address stored in the database.
- the device model of the target terminal device can be determined according to the similarity of the MAC address, thereby effectively improving the success rate of device model identification and reducing the requirement on the amount of data stored in the database. Any person familiar with the technical field can easily think of a variation of the method within the technical scope disclosed in this application, which should be covered by the protection scope of this application, and therefore will not be repeated.
- the embodiments of the present application provide a device model identification method, which can determine the device model of the target terminal device by combining the two parameters of the device name and the MAC address. Compared with determining the device model based only on the MAC address or only on the device name, this can effectively improve the success rate and reliability of device model identification.
- the model recognition server 01 may also store a model recognition model, which is obtained by training based on multiple MAC address samples for which the device model has been determined. Therefore, when the model identification server 01 determines the device model based on the target MAC address, in addition to the method of querying the database shown in the above step 105 and step 106, it can also directly input the target MAC address into the model recognition model to obtain the device model output by the model recognition model.
- the model identification server 01 may first perform one-hot encoding on the target MAC address of the target terminal device to obtain the address vector of the target MAC address.
- the address vector can then be input to the model recognition model, and the model recognition model can output the probability that the target terminal device belongs to a different device model.
- the model identification server 01 may then determine the device model with the highest probability as the device model of the target terminal device. If the probability that the target terminal device belongs to each device model output by the model recognition model is less than the probability threshold (for example, 0.8), the model recognition server 01 can determine that the device model of the target terminal device is an unknown model, that is, mark the device model of the target terminal device as unknown.
- For example, the model recognition server 01 performs one-hot encoding on the target MAC address to obtain the address vector [0,0,12,13,15,14,8,12,1,10,12,14].
- the model recognition server 01 can then input the address vector to the model recognition model. If the target terminal device output by the model recognition model has the highest probability of belonging to the device model EE-e1, the model recognition server 01 can determine that the device model of the target terminal device is EE-e1.
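- The encoding and thresholding around the model recognition model can be sketched as follows. The address vector quoted above holds one integer per hexadecimal digit, so `encode_mac` produces that form and `one_hot` shows the expanded one-hot rows. The MAC address "00:CD:FE:8C:1A:CE" is a hypothetical input chosen because it reproduces that vector, and the probability dictionaries stand in for the output of a trained model that is not shown here.

```python
from typing import Dict, List


def encode_mac(mac: str) -> List[int]:
    """Map each of the 12 hex digits of a MAC address to an integer 0..15."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12:
        raise ValueError("expected 12 hexadecimal digits")
    return [int(ch, 16) for ch in digits]


def one_hot(indices: List[int], size: int = 16) -> List[List[int]]:
    """Expand each integer into a one-hot row of the given size."""
    return [[1 if i == idx else 0 for i in range(size)] for idx in indices]


def pick_model(probabilities: Dict[str, float], threshold: float = 0.8) -> str:
    """Return the most probable device model, or "unknown" below the threshold."""
    model, prob = max(probabilities.items(), key=lambda kv: kv[1])
    return model if prob >= threshold else "unknown"


print(encode_mac("00:CD:FE:8C:1A:CE"))
print(pick_model({"EE-e1": 0.9, "AA-a1": 0.1}))
```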
- The construction processes of the model determination model, the similarity model, the database, and the model recognition model pre-stored in the model recognition server 01 are introduced below.
- FIG. 9 is a flowchart of a method for training a model determination model provided by an embodiment of the present application.
- the method can be applied to the model recognition server 01, for example, it can be applied to the second server 012 in the system shown in FIG. 1.
- the following takes the training method applied to the second server 012 as an example for description.
- the method may include:
- Step 201 Obtain a device name sample and a device model sample corresponding to the device name sample.
- the second server 012 may obtain a large amount of label data, and each label data may include a device name sample and a device model sample corresponding to the device name sample. That is, the second server 012 can obtain a large number of device name samples for which device models have been determined.
- Step 202 In the device name sample, each character in the character string matching the device model sample is marked as a valid character, and all characters except the character string are marked as invalid characters.
- the second server 012 may mark the device name samples at the character level according to the device model samples to obtain identification vector samples.
- the identification vector sample includes a plurality of identifiers, and each identifier can be used to indicate whether a character in the device name sample is a valid character. For example, when the value of the identifier is a first integer, it may indicate that the character indicated is a valid character, and when the value of the identifier is a second integer, it may indicate that the character indicated is an invalid character. By labeling each character in the device name sample, the classification of the characters in the device name sample is realized.
- For example, the second server 012 can determine that the character string matching the device model sample in the device name sample is "AA-a1", and can mark each character in the character string as a valid character, that is, determine that the identifiers of these characters are all the first integer.
- the identification vector sample obtained by the second server 012 after labeling the device name sample "Xiaoming's AA-a1-666" can be [2,2,2,0,1,1,1,1,1,2,2,2].
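- The character-level labeling of step 202 can be sketched as follows, using the 0/1/2 identifier convention introduced earlier. The function name `label_name` and the sample name "xyzAA-a1-666" are hypothetical; only the first occurrence of the device model in the name is labeled, which is an assumption of this sketch.

```python
from typing import List


def label_name(name: str, model: str) -> List[int]:
    """Label each character: 0 = first model character, 1 = other model character, 2 = invalid."""
    labels = [2] * len(name)
    start = name.find(model) if model else -1
    if start >= 0:
        labels[start] = 0
        for i in range(start + 1, start + len(model)):
            labels[i] = 1
    return labels


print(label_name("xyzAA-a1-666", "AA-a1"))
print(label_name("kitchen", "AA-a1"))
```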
- Step 203 Perform model training on the labeled device name samples to obtain a model determination model.
- the second server 012 may use NLP technology to perform model training on the labeled device name samples to obtain a model determination model.
- the second server 012 also needs to encode the device name sample to obtain a machine-recognizable name vector sample.
- the second server 012 may use a one-hot encoding algorithm to encode the device name sample, so that each character in the device name sample is converted into a machine-recognizable value.
- the second server 012 can use the name vector sample as the input of the model, and use the identification vector sample as the output of the model for model training to obtain the model determination model.
- the second server 012 may use a bi-directional long short term memory (Bi-LSTM) algorithm and a conditional random field (conditional random fields, CRF) algorithm to construct a model determination model.
- the Bi-LSTM algorithm can extract the context semantics of each character, and the CRF algorithm selects a reasonable classification result based on the processing result of the Bi-LSTM algorithm.
- the model training process can also adopt a cross-validation mechanism, that is, the second server 012 can divide multiple label data into a training set and a validation set.
- the second server 012 uses the label data in the training set to update the model parameters, and then uses the label data in the validation set to verify the update result. Training is repeated in this way until the model is optimal.
- FIG. 10 is a flowchart of a method for training a similarity model provided by an embodiment of the present application.
- the method may be applied to the model recognition server 01, for example, may be applied to the second server 012 in the system shown in FIG. 1.
- the following takes the training method applied to the second server 012 as an example for description.
- the method may include:
- Step 301 Perform preprocessing on multiple acquired data pairs.
- the second server 012 can obtain multiple data pairs issued by the network management device 03 for training the similarity model, and each data pair includes a MAC address and the device model corresponding to the MAC address.
- the device model in each data pair may be obtained by processing the device name using a model determination model.
- the second server 012 may first preprocess the acquired data pair to clean out some dirty data.
- Dirty data refers to data in which the device model obviously does not match the manufacturer information given by the MAC address. A MAC address is usually composed of 12 hexadecimal digits (0-9, a-f), and the first six digits of the MAC address are the organizationally unique identifier (OUI), which is usually used to identify the network card manufacturer of the device, so the second server 012 can identify the vendor information through the OUI in the MAC address.
- the device name is incorrect: the user may modify the device name at will, for example changing the device name of an AA brand mobile phone to the BB brand;
- the MAC address is incorrect: some terminal devices modify the MAC address when connecting to the gateway to prevent tracking.
- when cleaning dirty data, the second server 012 can detect whether the device manufacturer determined according to the device model is consistent with the device manufacturer determined according to the MAC address. If they are consistent, the data pair is retained. If they are inconsistent, the data pair can be determined to be dirty data and can therefore be deleted.
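- The consistency check used for cleaning can be sketched as follows. Both lookup tables are hypothetical placeholders (a real system would need an actual OUI registry and a model-to-vendor mapping), and dropping data pairs whose vendor cannot be resolved is an assumption of this sketch rather than something the text specifies.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (MAC address, device model)

# Hypothetical lookup tables, invented for illustration only.
OUI_VENDOR = {"aabbcc": "AA", "112233": "BB"}
MODEL_VENDOR = {"AA-a1": "AA", "BB-b2": "BB"}


def oui(mac: str) -> str:
    """First six hex digits of the MAC address."""
    return mac.replace(":", "").replace("-", "").lower()[:6]


def clean(pairs: List[Pair]) -> List[Pair]:
    """Keep only data pairs whose two vendor lookups agree."""
    kept = []
    for mac, model in pairs:
        vendor_from_mac = OUI_VENDOR.get(oui(mac))
        vendor_from_model = MODEL_VENDOR.get(model)
        # Assumption: unresolved vendors are treated like a mismatch.
        if vendor_from_mac is not None and vendor_from_mac == vendor_from_model:
            kept.append((mac, model))
    return kept


pairs = [
    ("AA:BB:CC:00:00:01", "AA-a1"),   # consistent
    ("11:22:33:00:00:01", "AA-a1"),   # vendor mismatch: dirty data
    ("11:22:33:00:00:02", "BB-b2"),   # consistent
    ("de:ad:be:ef:00:01", "BB-b2"),   # unknown OUI: dropped in this sketch
]
print(clean(pairs))
```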
- Step 302 Determine multiple training samples according to the preprocessed data pair.
- the second server 012 may determine multiple MAC address groups from the MAC addresses included in the preprocessed data pair, and each MAC address group may include at least two MAC addresses.
- the second server 012 can obtain the similarity of any two MAC addresses in the MAC address group. The similarity can be manually labeled, or the second server 012 can automatically label it according to the similarity of the device models corresponding to the two MAC addresses.
- each training sample includes at least two MAC address samples, and the similarity of any two MAC address samples in the at least two MAC address samples.
- the second server 012 may encode each MAC address sample to obtain a machine-recognizable address vector sample.
- the second server 012 may perform one-hot encoding on each character in each MAC address sample.
- other encoding methods can also be used.
- the first six digits and the last six digits of the MAC address can be encoded in different ways.
- the annotator or the second server 012 can determine the value of the similarity y of two MAC address samples according to the device models corresponding to the two MAC address samples, and the value of y can be negatively correlated with the similarity of the two MAC address samples.
- Step 303 Train the multiple training samples to obtain a similarity model.
- the second server 012 can take the address vector samples of the MAC address samples in each training sample as input, take outputting the similarity of any two address samples as the target, and train the model using a deep metric learning algorithm to obtain the similarity model.
- the similarity model that can be learned based on the deep metric learning algorithm can be understood as a function used to measure the similarity (also called distance or metric distance) between MAC addresses.
- the similarity model can make the similarity of the MAC addresses of terminal devices of the same device model relatively high (that is, the metric distance is small), while the similarity of the MAC addresses of terminal devices of different device models is low (that is, the metric distance is relatively large).
- FIG. 11 is a flowchart of a method for constructing a database provided by an embodiment of the present application.
- the method may be applied to the model identification server 01, for example, may be applied to the second server 012 in the system shown in FIG. 1.
- the construction method is applied to the second server 012 as an example for description.
- the method may include:
- Step 401 Obtain multiple data pairs.
- the second server 012 can obtain multiple data pairs issued by the network management device 03 for building a database, and each data pair includes a MAC address and a device model corresponding to the MAC address.
- the multiple data pairs used to construct the database may be the same as or different from the multiple data pairs used to train the similarity model in step 301, which is not limited in the embodiment of the present application.
- the second server 012 may first preprocess the acquired data pair to clean out some dirty data. For the process of cleaning dirty data, refer to the above step 301, which will not be repeated here.
- Step 402 Use a clustering algorithm to group the multiple data pairs to obtain multiple data groups.
- the second server 012 can construct a database based on LSH technology according to the similarity model, making fast neighbor search over massive high-dimensional data possible (after one-hot encoding and deep metric learning, MAC addresses are mapped into high-dimensional vectors).
- the MAC addresses corresponding to a device model are roughly continuous in segments, that is, manufacturers tend to assign a continuous range of MAC addresses to terminal devices of the same device model, so the MAC addresses of terminal devices of the same device model usually occupy one MAC address range.
- this segment continuity rule is extremely complicated. For example, a mobile phone of a certain model may occupy multiple MAC address ranges. As the collected data increases, on the one hand, this interval continuity becomes more and more difficult to describe with rules.
- on the other hand, a database based on a traditional index method would make query efficiency very low. Therefore, the embodiments of the present application provide a database creation solution based on the similarity model and LSH technology.
- the second server 012 may first determine the similarity between different MAC addresses through the similarity model, and then cluster the MAC addresses with higher similarity together through a clustering algorithm, that is, divide them into the same data group . Therefore, even if the MAC address of a mobile phone of a certain model occupies multiple MAC address ranges, after clustering based on similarity, all the MAC addresses of the mobile phone of this model can be divided into the same data group.
- the second server 012 can use the k-centers clustering algorithm (other distance-based clustering algorithms can of course also be used) to divide the acquired data pairs into N data groups, namely N clusters.
- each cluster can include one or more data pairs.
- the number of data pairs included in different data groups may be the same or different.
- N may be a preset integer greater than 1, for example, N may be equal to the number of device models included in the multiple data pairs, or equal to 10 times the number of device models.
- Step S11 randomly selecting a data pair from a plurality of data pairs as the clustering center of the first data group, that is, the center data pair.
- Step S12 Calculate the similarity between the MAC address in each remaining data pair and the MAC address in the central data pair of the first data group, and use the data pair with the smallest similarity (that is, the largest metric distance) as the central data pair of the second data group.
- Step S13 Continue to calculate the similarity between the MAC address of each remaining data pair and the MAC address of the central data pair of the first data group, as well as the similarity with the MAC address of the central data pair of the second data group, and use the data pair with the smallest similarity (that is, the largest metric distance) as the central data pair of the third data group.
- the "smallest degree of similarity" mentioned in step S13 may mean that the sum of the similarities with the MAC addresses of the two center data pairs is the smallest, or the average of the similarities is the smallest.
- Step S14 For each data pair except the center data pairs, determine the similarity between the MAC address of the data pair and the MAC address of each center data pair, and divide the data pair into the data group to which the center data pair with the highest similarity belongs.
- the data pair can be divided into the first data group.
- in this way, the second server 012 can divide the acquired data pairs into N data groups.
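Steps S11 to S14 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: `prefix_similarity` is a hypothetical stand-in for the trained similarity model, the sample MAC addresses are made up, the "smallest similarity" is taken as the smallest sum of similarities, and the data pairs are assumed to be distinct.

```python
import random

def prefix_similarity(mac_a, mac_b):
    # Hypothetical stand-in for the trained similarity model:
    # the fraction of leading hex digits that the two addresses share.
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(mac_a), len(mac_b), 1)

def k_centers(pairs, n_groups, similarity=prefix_similarity, seed=0):
    """Group distinct (mac, model) pairs around n_groups central data pairs."""
    rng = random.Random(seed)
    centers = [rng.choice(pairs)]  # S11: random first central data pair
    while len(centers) < min(n_groups, len(pairs)):
        remaining = [p for p in pairs if p not in centers]
        # S12/S13: the next center is the pair whose summed similarity to
        # the existing centers is smallest (largest metric distance).
        centers.append(min(
            remaining,
            key=lambda p: sum(similarity(p[0], c[0]) for c in centers)))
    groups = [[c] for c in centers]
    for pair in pairs:  # S14: attach every other pair to its most similar center
        if pair in centers:
            continue
        best = max(range(len(centers)),
                   key=lambda i: similarity(pair[0], centers[i][0]))
        groups[best].append(pair)
    return centers, groups

# Made-up sample data: two address families, one device model each.
pairs = [
    ("001234AB56C1", "AA-a1"), ("001234AB56C2", "AA-a1"),
    ("001234AB5700", "AA-a1"),
    ("AABBCC000001", "CC-c1"), ("AABBCC000002", "CC-c1"),
]
centers, groups = k_centers(pairs, n_groups=2)
```

Each returned group lists its central data pair first, which is convenient for the later sorting step.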
- the k-centers algorithm can ensure that the MAC addresses of each data pair divided into the same data group are relatively similar, that is, the metric distance between the MAC addresses is small.
- the similarity of the MAC addresses of two data pairs belonging to different data groups is lower, that is, the metric distance between the MAC addresses of the two data pairs is larger. Therefore, for a target MAC address to be queried, the area in which the close neighbors of the target MAC address (that is, the MAC addresses with higher similarity to the target MAC address) are located can be quickly locked by matching the similarity between the target MAC address and the MAC address of each central data pair.
- the second server 012 uses a similarity model and a clustering algorithm to cluster the multiple data pairs to obtain 7 data groups c1 to c7, and the device models of the data pairs in each data group can be the same.
- each data pair in the database can be represented by a two-level index (index 1, index 2).
- the index 1 of each data pair may be the index of the data group to which it belongs, that is, the index 1 of each data pair in the same data group is the same.
- the index 1 of each data pair in the data group c1 may be 001
- the index 1 of each data pair in the data group c2 may be 002.
- Step 403 For each data group, sort the data pairs included in the data group in descending order of similarity, according to the similarity between the MAC address of the central data pair in the data group and the MAC address of each other data pair.
- the second server 012 can also sort the data pairs in each data group in descending order of their similarity with the MAC address of the central data pair. After that, the second server 012 can assign index 2 to each sorted data pair.
- the second server 012 may use the VP-tree algorithm to sort the data pairs in each data group.
- the sorting process of data pairs in any data group is as follows:
- Step S21 Determine the center data pair in the data group as the root node of the VP-tree.
- Step S22 Calculate the similarity between the MAC address of each other data pair in the data group and the MAC address of the root node, and divide the data pairs other than the root node into two subsets according to the median of the calculated similarities.
- a subset of the two subsets may include: data pairs whose similarity with the MAC address of the root node is greater than or equal to the median, and the subset is the left subtree of the VP-tree.
- Another subset may include: data pairs whose similarity with the MAC address of the root node is less than the median, and the subset is the right subtree of the VP-tree.
- Step S23 For each subset, select a data pair from the subset as the new child node of the subset, calculate the similarity between the MAC addresses of the other data pairs in the subset and the MAC address of the child node, and then divide the data pairs other than the child node into two subsets again according to the median of the calculated similarities.
- the new child node may be any data pair in the subset; or it may be the data pair corresponding to the median of the similarities between the data pairs in the subset and the MAC address of the root node; or it may be the data pair in the subset with the highest similarity to the MAC address of the root node.
- each data pair in the VP-tree can be arranged in the order from top to bottom (that is, from the root node to the tail node) and from left to right (that is, from the left subtree to the right subtree).
- a certain data group includes 7 data pairs D1 to D7, where data pair D1 is the central data pair; the second server 012 uses the central data pair D1 as the root node of the VP-tree, and the resulting VP-tree can be as shown in Figure 13.
- the 7 data pairs in the VP-tree are arranged in the order of D1, D3, D2, D6, D5, D4, and D7.
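The sorting in steps S21 to S23 and the top-to-bottom, left-to-right arrangement can be sketched as below. This is only an illustration under assumptions: `prefix_similarity` is a hypothetical stand-in for the similarity model, the MAC addresses are made up, and each new child node is chosen as the pair most similar to its parent node, which is one of several choices the text allows.

```python
from collections import deque
from statistics import median

def prefix_similarity(mac_a, mac_b):
    # Hypothetical stand-in for the trained similarity model.
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(mac_a), len(mac_b), 1)

def build_vp_tree(vantage, others, similarity=prefix_similarity):
    """S21-S23: split the other pairs around the median similarity."""
    node = {"pair": vantage, "left": None, "right": None}
    if not others:
        return node
    sims = [similarity(vantage[0], p[0]) for p in others]
    mid = median(sims)
    subsets = {
        "left": [p for p, s in zip(others, sims) if s >= mid],
        "right": [p for p, s in zip(others, sims) if s < mid],
    }
    for side, subset in subsets.items():
        if subset:
            # Assumption: the new child is the pair most similar to the parent.
            child = max(subset, key=lambda p: similarity(vantage[0], p[0]))
            rest = list(subset)
            rest.remove(child)
            node[side] = build_vp_tree(child, rest, similarity)
    return node

def level_order(root):
    """Arrange pairs from the root downwards and from left to right."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node["pair"])
        for side in ("left", "right"):
            if node[side] is not None:
                queue.append(node[side])
    return order

# Made-up data group: the first pair plays the role of the central data pair.
center = ("001234AB56C1", "AA-a1")
others = [("001234AB56C2", "AA-a1"), ("001234AB56D0", "AA-a1"),
          ("001234AB5700", "AA-a1"), ("001234AB6000", "AA-a1"),
          ("001234AC0000", "AA-a1"), ("001234B00000", "AA-a1")]
order = level_order(build_vp_tree(center, others))
index2 = {pair: i for i, pair in enumerate(order, start=1)}  # second-level index
```

The position of each pair in `order` can then serve as its index 2 within the data group.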
- a database constructed based on LSH technology can make data pairs of similar device models have similar positions in the database, thereby facilitating neighbor search.
- the embodiment of the present application also provides a method for training a model recognition model.
- the method may include the following steps:
- Step S31 The second server 012 obtains multiple MAC address samples and the device model corresponding to each MAC address sample.
- the multiple MAC address samples and the device model corresponding to each MAC address sample may be delivered by the network management device 03 to the second server 012.
- Step S32 The second server trains the multiple MAC address samples and the device model corresponding to each MAC address sample to obtain a model identification model.
- the second server 012 may first preprocess the acquired data, for example, remove dirty data, and then encode each MAC address sample obtained after preprocessing. After that, the second server 012 can use the encoded MAC address sample as the input of the model and the device model corresponding to the MAC address sample as the target output of the model, and use deep learning or random forest algorithms to train the model until the loss function converges, thereby obtaining the model identification model. Since the model recognition model can output the probability that the terminal device belongs to each device model among multiple device models, the model recognition model can also be called a classifier.
- the model identification server 01 may also send the identified device model to the network management device 03.
- the network administrator can manually verify the device model. If the device model is wrong, the network administrator can input the corrected device model to the network management device 03.
- the network management device 03 can also periodically send the collected correction data to the model recognition server 01, so that the model recognition server 01 can retrain the models (such as the similarity model, the model determination model, and the model recognition model) based on the correction data.
- the corrected data may include the device name, MAC address, and corrected device model; or may only include the MAC address and corrected device model.
- the network management device 03 may send the correction data to the second server 012 to trigger the second server 012 to retrain the model.
- after the second server 012 completes the retraining, it can send the updated model to the first server 011 so that the first server 011 can identify the device model based on the updated model.
- the device model can be determined by combining the two parameters of the MAC address and the device name.
- the device model determined based on the device name can be used as the device model of the terminal device, and for the device name that cannot provide a valid device model, the device model can be determined based on the MAC address.
- the recognition rate can be increased from 55% to 95%.
- the recognition rate refers to the proportion of terminal devices whose device model is recognized among the total number of terminal devices processed by the model recognition server.
- an NLP-based model determination model is used.
- This model can automatically learn the extraction rules of the device model based on the label data. Compared with traditional manual labeling or labeling methods based on regular expressions with complex rules, it has a low development cost and high versatility, is easy to generalize to different regions or different languages, and has a low maintenance cost.
- the solutions provided by the embodiments of this application mainly have the following advantages: (1) The method of determining the device model based on the MAC address and the method of determining the device model based on the device name are both based on the principles of big data and statistics, so the recognition accuracy can be guaranteed. (2) Both methods are data-driven and do not rely on rules, so they are easy to generalize to different regions and different cultural environments; model maintenance costs are low, and the model only needs to be retrained after the data increases. (3) The models used by the two methods are very small, the recognition speed is fast, and the recognition efficiency is high. (4) The algorithmic framework of the two methods is the same and can be reused, which can effectively reduce the development workload. (5) A retraining mechanism is provided, which can effectively enhance the accuracy and robustness of model prediction.
- Fig. 14 is a schematic structural diagram of a device model identification device provided by an embodiment of the present application. As shown in Fig. 14, the device may include:
- the first obtaining module 501 is configured to obtain the target physical address of the target terminal device.
- for the function implementation of the first obtaining module 501, reference may be made to the related description of step 101 above.
- the first determining module 502 is configured to determine a first number of candidate physical addresses from a database.
- the database includes multiple physical addresses and a device model corresponding to each physical address.
- the similarity between each candidate physical address and the target physical address is greater than the similarity between the other physical addresses in the database and the target physical address, and the first number is an integer greater than one.
- the second determining module 503 determines the device model with the most occurrences among the device models corresponding to the first number of candidate physical addresses as the device model of the target terminal device. For the function implementation of the second determining module 503, reference may be made to the related description of the above step 106.
- the database includes multiple data groups, each data group includes one or more data pairs, and each data pair includes a physical address and a device model corresponding to the physical address; the first determining module 502 may be used to:
- determine a second number of candidate data groups according to the similarity between the target physical address and the physical address of any data pair in each data group, where the similarity between the physical address of the any data pair in each candidate data group and the target physical address is greater than the similarity between the physical address of the any data pair in the other data groups and the target physical address, and the second number is an integer greater than one; and determine the first number of candidate physical addresses from the physical addresses included in the second number of candidate data groups;
- there is a central data pair in each data group, and the data pairs in each data group are arranged in descending order of similarity with the physical address of the central data pair; the any data pair in each data group is the central data pair;
- the first determining module 502 may be used to determine the first number of candidate physical addresses from the physical addresses included in the second number of candidate data groups according to the arrangement order of the data pairs in each candidate data group.
- for the function implementation of the first determining module 502, reference may also be made to the related description of the above step 1051 and step 1052.
- the device may further include:
- the second acquisition module 504 is configured to acquire the multiple data pairs.
- for the function implementation of the second acquiring module 504, reference may be made to the related description of the above step 401.
- the clustering module 505 is configured to use a clustering algorithm to group the multiple data pairs to obtain the multiple data groups.
- the sorting module 506 is configured to, for each data group, sort the data pairs included in the data group in descending order of similarity, according to the similarity between the physical address of the central data pair in the data group and the physical address in each other data pair. For the function implementation of the sorting module 506, reference may be made to the relevant description of the above step 403.
- the device may further include:
- the third determining module 507 is configured to determine the similarity between the physical address in the database and the target physical address using a similarity model before the first number of candidate physical addresses is determined from the database, wherein the similarity model is trained based on multiple physical address samples with determined similarity.
- the first obtaining module 501 may also be used to obtain the target device name of the target terminal device.
- the device may further include:
- the fourth determining module 508 is configured to determine the candidate device model of the target terminal device according to the name of the target device. For the function implementation of the fourth determining module 508, reference may be made to the related description of step 102 above.
- the fifth determining module 509 is configured to determine the candidate device model as the device model of the target terminal device if the candidate device model is not an unknown model.
- for the function implementation of the fifth determining module 509, reference may be made to the related description of the foregoing step 104.
- the first determining module 502 can be used to determine the first number of candidate physical addresses from the database if the candidate device model is the unknown model.
- the fourth determining module 508 may be used to:
- the model determination model is used to determine the candidate device model of the target terminal device from the target device name; wherein the model determination model is trained based on multiple device name samples of the determined device model.
- the fourth determining module 508 may be used to:
- the model determination model is used to determine whether each character in the target device name is a valid character; the character string composed of valid characters in the target device name is determined as the candidate device model of the target terminal device.
- the device may further include:
- the third obtaining module 510 is configured to obtain a device name sample and a device model sample corresponding to the device name sample.
- for the function implementation of the third obtaining module 510, reference may be made to the related description of step 201 above.
- the sixth determining module 511 is configured to mark each character of the character string in the device name sample that matches the device model sample as a valid character, and mark all characters other than the character string as invalid characters.
- for the function implementation of the sixth determining module 511, reference may be made to the related description of step 202 above.
- the training module 512 is configured to perform model training on the labeled device name sample to obtain the model determination model.
- for the function implementation of the training module 512, reference may be made to the relevant description of the above step 203.
- the fourth determining module 508 may be used to:
- the matching degree between the target device name and each device model template in the multiple device model templates is determined respectively; the device model template with the highest matching degree is determined as the candidate device model of the target terminal device.
- the embodiment of the present application provides a device model identification device, which can determine a first number of candidate MAC addresses with higher similarity from the database according to the similarity between the target MAC address and the MAC addresses stored in the database. Then, among the device models corresponding to the first number of candidate MAC addresses, the device model with the most occurrences is determined as the device model of the target terminal device. Therefore, even if the device model corresponding to the target MAC address is not stored in the database, the device model of the target terminal device can be determined according to the similarity of the MAC address, thereby effectively improving the success rate of device model identification and reducing the requirement on the amount of data stored in the database.
- the device provided by the embodiment of the present application can also determine the device model of the target terminal device by combining the two parameters of the device name and the MAC address. Compared with determining the device model based on only the MAC address or only the device name, this can effectively improve the success rate and reliability of device model identification.
- the device model identification device can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), and the PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
- FIG. 17 is a schematic structural diagram of an apparatus for identifying a device model provided by an embodiment of the present application.
- the apparatus for identifying a device model may include: a processor 1701, a memory 1702, a network interface 1703, and a bus 1704.
- the bus 1704 is used to connect the processor 1701, the memory 1702, and the network interface 1703.
- the communication connection with other devices can be realized through the network interface 1703 (which can be wired or wireless).
- a computer program 17021 is stored in the memory 1702, and the computer program 17021 is used to implement various application functions.
- the processor 1701 may be a CPU, and the processor 1701 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a GPU or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor or any conventional processor.
- the memory 1702 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
- the volatile memory may be random access memory (RAM), which is used as an external cache.
- many forms of RAM can be used, for example static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM), synchronous connection dynamic random access memory (SLDRAM), and direct rambus RAM.
- bus 1704 may also include a power bus, a control bus, and a status signal bus. However, for clear description, various buses are marked as bus 1704 in the figure.
- the processor 1701 is configured to execute a computer program stored in the memory 1702, and the processor 1701 executes the computer program 17021 to implement the steps in the foregoing method embodiment.
- the embodiment of the present application also provides a computer-readable storage medium that stores instructions, and when the instructions run on a computer, the computer is caused to execute the steps in the above-mentioned method embodiment.
- the embodiments of the present application also provide a computer program product containing instructions, which when the computer program product runs on a computer, cause the computer to execute the steps in the foregoing method embodiments.
- the embodiment of the present application also provides a device model identification system.
- the system may include: a first server 011 and a second server 012.
- the first server 011 can be used to implement the steps in the method embodiments shown in Figures 3, 4, and 6; the second server 012 can be used to implement the steps in the method embodiments shown in Figures 9 to 11.
- the first server 011 may include the device shown in FIG. 14 or FIG. 16.
- the second server 012 may include modules 504 to 506 and modules 510 to 512 in the apparatus shown in FIG. 15.
- the system may also include: a gateway device 02.
- the gateway device 02 may be connected to the terminal device 04 and the first server 011 respectively, and the gateway device 02 may be used to obtain the MAC address of the terminal device 04 and send the obtained MAC address to the first server 011.
- the device model identification system provided in the embodiment of the present application may also include only the first server 011, and the first server 011 may be used to implement the steps in the method embodiments shown in FIG. 3, FIG. 4, FIG. 6, and FIG. 9 to FIG. 11.
- the first server 011 may include the device shown in FIG. 15.
- the disclosed system, device, and method can be implemented in other ways.
- the division of the modules is only a logical function division, and there may be other divisions in actual implementation.
- multiple modules or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or modules, and may also be electrical, mechanical or other forms of connection.
- modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
- each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
- the above-mentioned integrated modules can be implemented in the form of hardware or software function modules.
- if the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of the present application essentially, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a computer program product.
- the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer program product is stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions can be transmitted from a website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, wireless, or microwave).
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
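The present application provides a method, apparatus, and system for identifying a device model, and belongs to the field of data processing technologies. Based on an obtained target physical address, the method first determines, from a database, a first number of candidate physical addresses that have a relatively high similarity to the target physical address, and then determines the device model that occurs most often among the device models corresponding to the first number of candidate physical addresses as the device model of the target terminal device. In this way, even if the database does not store a device model corresponding to the target physical address, the device model of the target terminal device can be determined from the similarity of physical addresses, which effectively improves the success rate of device model identification and reduces the requirement on the amount of data stored in the database.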
Description
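This application claims priority to Chinese Patent Application No. 202010211207.9, filed on March 24, 2020 and entitled "设备型号的识别方法、装置及系统" (Method, apparatus, and system for identifying a device model), which is incorporated herein by reference in its entirety.
This application relates to the field of data processing, and in particular to a method, apparatus, and system for identifying a device model.
When maintenance or fault detection is performed on a terminal device in a network (for example, a mobile phone, a router, or a smart device), the specification parameters of the terminal device can be obtained from its device model, and maintenance or fault detection can then be carried out more efficiently based on those parameters.
In the related art, a gateway device can obtain the media access control (MAC) address (also called the physical address) of a terminal device connected to it, and can determine the device model of the terminal device by looking up a correspondence between MAC addresses and device models that is stored in a database in advance.
However, if the MAC address of the terminal device is not recorded in the database, the device model of the terminal device cannot be determined. The method in the related art therefore places a high requirement on the amount of data stored in the database, and its success rate of device model identification is low.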
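Summary
This application provides a method, apparatus, and system for identifying a device model, which can solve the problem that the method in the related art requires a large amount of data in the database and has a low identification success rate. The technical solutions are as follows:
In one aspect, a method for identifying a device model is provided. The method may obtain a target MAC address of a target terminal device, and determine a first number of candidate MAC addresses from a database. The database includes multiple MAC addresses and a device model corresponding to each MAC address. The similarity between each candidate MAC address and the target MAC address is greater than the similarity between any other MAC address in the database and the target MAC address, and the first number is an integer greater than 1. The device model that occurs most often among the device models corresponding to the first number of candidate MAC addresses may then be determined as the device model of the target terminal device.
With the solution provided in this application, even if the database does not store a device model corresponding to the target MAC address, the device model of the target terminal device can be determined from the similarity of MAC addresses, which effectively improves the success rate of device model identification and reduces the requirement on the amount of data stored in the database.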
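The idea of voting among the most similar MAC addresses can be illustrated with a small sketch. It is not the implementation of the application: `prefix_similarity` is a hypothetical stand-in for the similarity model, the database contents are made up, and `k` plays the role of the first number.

```python
import heapq
from collections import Counter

def prefix_similarity(mac_a, mac_b):
    # Hypothetical stand-in for the trained similarity model:
    # the fraction of leading hex digits that the two addresses share.
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(mac_a), len(mac_b), 1)

def identify_model(target_mac, database, k=3, similarity=prefix_similarity):
    """Return the most frequent model among the k most similar addresses."""
    candidates = heapq.nlargest(
        k, database, key=lambda pair: similarity(target_mac, pair[0]))
    votes = Counter(model for _, model in candidates)
    return votes.most_common(1)[0][0]

# Made-up database of (MAC address, device model) data pairs.
database = [
    ("001234AB56C1", "AA-a1"), ("001234AB56C2", "AA-a1"),
    ("001234AB5700", "BB-b2"),
    ("AABBCC000001", "CC-c1"), ("AABBCC000002", "CC-c1"),
]
# The target address itself is not stored in the database.
result = identify_model("001234AB56C9", database, k=3)
```

Because the decision is a vote over several neighbors, a single mislabeled or atypical entry near the target address has less influence than it would in an exact lookup.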
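Optionally, the database may include multiple data groups, each data group includes one or more data pairs, and each data pair includes a MAC address and the device model corresponding to that MAC address. The process of determining the first number of candidate MAC addresses from the database may include:
determining a second number of candidate data groups according to the similarity between the target MAC address and the MAC address of any data pair in each data group, where the similarity between the MAC address of the any data pair in each candidate data group and the target MAC address is greater than the similarity between the MAC address of the any data pair in the other data groups and the target MAC address, and the second number is an integer greater than 1; and determining the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups, where the similarity between each candidate MAC address and the target MAC address is greater than the similarity between the other MAC addresses in the second number of candidate data groups and the target MAC address.
By first determining the second number of candidate data groups, the search range of MAC addresses can be narrowed, which effectively improves the efficiency of searching for candidate MAC addresses.
Optionally, there is a central data pair in each data group, and the data pairs in each data group are arranged in descending order of similarity with the MAC address of the central data pair; the any data pair in each data group is the central data pair. Correspondingly, determining the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups includes: determining the first number of candidate MAC addresses from the MAC addresses included in the second number of candidate data groups according to the arrangement order of the data pairs in each candidate data group.
Because the data pairs in each data group are arranged in descending order of similarity, when a nearest-neighbor search algorithm is used to determine candidate MAC addresses, candidate MAC addresses that are close neighbors of the target MAC address (that is, have a relatively high similarity) can be determined from the candidate data groups relatively quickly, which effectively improves the efficiency of searching for candidate MAC addresses.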
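The two-stage search (candidate data groups first, candidate MAC addresses second) might look roughly like this. The sketch makes several assumptions: `prefix_similarity` is a hypothetical stand-in for the similarity model, each group is a list whose first element is its central data pair, the data are made up, and the second stage simply scans every pair in the candidate groups instead of exploiting their arrangement order.

```python
import heapq

def prefix_similarity(mac_a, mac_b):
    # Hypothetical stand-in for the trained similarity model.
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(mac_a), len(mac_b), 1)

def find_candidates(target_mac, groups, second_number=2, first_number=3,
                    similarity=prefix_similarity):
    """Pick candidate groups by their centers, then candidate pairs in them."""
    # Stage 1: compare the target only with each group's central data pair.
    candidate_groups = heapq.nlargest(
        second_number, groups,
        key=lambda group: similarity(target_mac, group[0][0]))
    # Stage 2: search only inside the candidate data groups.
    pool = [pair for group in candidate_groups for pair in group]
    return heapq.nlargest(
        first_number, pool,
        key=lambda pair: similarity(target_mac, pair[0]))

# Made-up data groups; the first pair of each list is the central data pair.
groups = [
    [("001234AB56C1", "AA-a1"), ("001234AB56C2", "AA-a1"),
     ("001234AB5700", "AA-a1")],
    [("001234EF56B2", "BB-b2"), ("001234EF56B3", "BB-b2")],
    [("AABBCC000001", "CC-c1"), ("AABBCC000002", "CC-c1")],
]
candidates = find_candidates("001234AB56C9", groups)
```

Only the central data pairs are compared with the target in the first stage, so the number of similarity evaluations grows with the number of groups rather than with the size of the whole database.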
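Optionally, the method may further include: obtaining multiple data pairs; grouping the multiple data pairs by using a clustering algorithm to obtain multiple data groups; and for each data group, sorting the data pairs included in the data group in descending order of similarity according to the similarity between the MAC address of the central data pair in the data group and the MAC address in each other data pair. A database constructed in this way makes it easy to search for candidate MAC addresses that are close neighbors of the target MAC address.
Optionally, before the first number of candidate MAC addresses is determined from the database, the method may further include: determining the similarity between a MAC address in the database and the target MAC address by using a similarity model, where the similarity model is trained based on multiple MAC address samples whose similarity has been determined.
The similarity model may be trained based on a deep metric learning algorithm. The similarity of two MAC addresses determined by the similarity model can reflect relatively accurately how similar the device models corresponding to the two MAC addresses are.
Optionally, the method may further include: obtaining a target device name of the target terminal device; determining a candidate device model of the target terminal device according to the target device name; and if the candidate device model is not an unknown model, that is, the device model of the target terminal device can be determined from the target device name, determining the candidate device model as the device model of the target terminal device. Correspondingly, the process of determining the first number of candidate MAC addresses from the database may include: if the candidate device model is the unknown model, that is, the device model of the target terminal device cannot be determined from the target device name, determining the first number of candidate MAC addresses from the database.
The solution provided in this application can also use the target device name to determine the device model. Compared with determining the device model based on only a single parameter, this effectively improves the success rate of device model identification.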
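The name-first, MAC-address-second decision flow can be sketched as follows. The two lookup functions are hypothetical placeholders for the model determination model and the MAC-address-based search, and the string `"unknown"` is an assumed way of representing the unknown model.

```python
UNKNOWN = "unknown"  # assumed representation of the "unknown model" result

def identify_device_model(device_name, mac, model_from_name, model_from_mac):
    """Use the device name first; fall back to the MAC address if needed."""
    candidate = model_from_name(device_name)
    if candidate != UNKNOWN:
        return candidate
    return model_from_mac(mac)

# Hypothetical stand-ins for the two identification methods.
KNOWN_MODELS = ["AA-a1", "CC-c1"]

def name_lookup(device_name):
    for model in KNOWN_MODELS:
        if model in device_name:
            return model
    return UNKNOWN

def mac_lookup(mac):
    return "BB-b2" if mac.startswith("001234EF") else UNKNOWN

from_name = identify_device_model(
    "小明的AA-a1-666", "001234AB56C1", name_lookup, mac_lookup)
from_mac = identify_device_model(
    "小明的电脑", "001234EF56B2", name_lookup, mac_lookup)
```

Passing the two methods in as functions keeps the fallback logic independent of how each method is implemented.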
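Optionally, the process of determining the candidate device model of the target terminal device according to the target device name may include: using a model determination model to determine the candidate device model of the target terminal device from the target device name, where the model determination model is trained based on multiple device name samples whose device models have been determined.
Optionally, the process of using the model determination model to determine the candidate device model of the target terminal device from the target device name may include: using the model determination model to determine whether each character in the target device name is a valid character; and determining the character string composed of the valid characters in the target device name as the candidate device model of the target terminal device.
Using a neural network model to determine the device model from the target device name can help ensure the success rate of device model identification and the reliability of the determined device model.
Optionally, the method may further include: obtaining a device name sample and a device model sample corresponding to the device name sample; in the device name sample, labeling each character in the character string that matches the device model sample as a valid character, and labeling all characters other than the character string as invalid characters; and performing model training on the labeled device name sample to obtain the model determination model.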
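The labeling of a device name sample can be sketched as below. The sketch uses the 0/1/2 identifier values given as an example in the detailed description (0 for the first character of the model, 1 for its other characters, 2 for invalid characters) and assumes, for simplicity, that the device model sample appears in the name as an exact substring.

```python
START, INSIDE, INVALID = 0, 1, 2  # identifier values from the description's example

def label_name(device_name, device_model):
    """Label each character of a device name sample as valid or invalid."""
    tags = [INVALID] * len(device_name)
    # Assumption: the model is matched as an exact substring of the name.
    start = device_name.find(device_model) if device_model else -1
    if start >= 0:
        tags[start] = START
        for i in range(start + 1, start + len(device_model)):
            tags[i] = INSIDE
    return tags

labeled = label_name("小明的AA-a1-666", "AA-a1")
unmatched = label_name("小明的电脑", "AA-a1")
```

Pairs of a device name and its tag sequence produced this way could then serve as training samples for a sequence labeling model.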
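Optionally, the process of determining the candidate device model of the target terminal device according to the target device name may include:
separately determining the matching degree between the target device name and each of multiple device model templates; and determining the device model template with the highest matching degree as the candidate device model of the target terminal device.
The template-matching-based method does not require a neural network model to be trained in advance, and its complexity is relatively low.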
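A possible form of template matching is sketched below. The text does not define how the matching degree is computed, so the measure used here (length of the longest common substring divided by the template length, case-insensitive) and the threshold below which the result is treated as an unknown model are both assumptions made for illustration.

```python
from difflib import SequenceMatcher

def matching_degree(device_name, template):
    # Assumed measure: share of the template covered by the longest
    # substring it has in common with the device name (case-insensitive).
    name, tmpl = device_name.lower(), template.lower()
    matcher = SequenceMatcher(None, name, tmpl, autojunk=False)
    match = matcher.find_longest_match(0, len(name), 0, len(tmpl))
    return match.size / len(tmpl)

def match_template(device_name, templates, threshold=0.8):
    """Return the best-matching template, or "unknown" below the threshold."""
    best = max(templates, key=lambda t: matching_degree(device_name, t))
    if matching_degree(device_name, best) >= threshold:
        return best
    return "unknown"

templates = ["AA-a1", "BB-b2", "CC-c1"]  # made-up, non-empty template list
matched = match_template("小明的AA-a1-666", templates)
no_match = match_template("小明的电脑", templates)
```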
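In another aspect, an apparatus for identifying a device model is provided. The apparatus may include at least one module, and the at least one module may be configured to implement the method for identifying a device model provided in the foregoing aspect.
In still another aspect, an apparatus for identifying a device model is provided. The apparatus may include a processor, a memory, and a computer program that is stored in the memory and can run on the processor. When executing the computer program, the processor implements the method for identifying a device model provided in the foregoing aspect.
In yet another aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is enabled to perform the method for identifying a device model provided in the foregoing aspect.
In yet another aspect, a system for identifying a device model is provided. The system may include a first server and a second server. The first server may be configured to perform the steps of determining the device model in the method for identifying a device model provided in the foregoing aspect; the second server may be configured to perform the steps of model training and/or database construction in that method.
In yet another aspect, a system for identifying a device model is provided. The system may include a first server, and the first server may be configured to perform the method for identifying a device model provided in the foregoing aspect.
Optionally, the system may further include a gateway device. The gateway device is connected to a terminal device and to the first server, and is configured to obtain the MAC address of the terminal device and send the obtained MAC address to the first server.
In summary, this application provides a method, apparatus, and system for identifying a device model. According to the similarity between the target MAC address and the MAC addresses stored in the database, the solution can determine a first number of candidate MAC addresses with relatively high similarity from the database, and then determine the device model that occurs most often among the device models corresponding to the first number of candidate MAC addresses as the device model of the target terminal device. In this way, even if the database does not store a device model corresponding to the target MAC address, the device model of the target terminal device can be determined from the similarity of MAC addresses, which effectively improves the success rate of device model identification and reduces the requirement on the amount of data stored in the database.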
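Fig. 1 is a schematic structural diagram of a device model identification system provided by an embodiment of the present application;
Fig. 2 is a flowchart of a device model identification method provided by an embodiment of the present application;
Fig. 3 is a flowchart of another device model identification method provided by an embodiment of the present application;
Fig. 4 is a flowchart of a method for determining a candidate device model of a target terminal device provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an identification vector of a target device name provided by an embodiment of the present application;
Fig. 6 is a flowchart of a method for determining a first number of candidate MAC addresses provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a database provided by an embodiment of the present application;
Fig. 8 is a schematic diagram of determining a device model by using a model recognition model provided by an embodiment of the present application;
Fig. 9 is a flowchart of a method for training a model determination model provided by an embodiment of the present application;
Fig. 10 is a flowchart of a method for training a similarity model provided by an embodiment of the present application;
Fig. 11 is a flowchart of a method for constructing a database provided by an embodiment of the present application;
Fig. 12 is a schematic diagram of dividing data groups provided by an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a VP-tree provided by an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a device model identification device provided by an embodiment of the present application;
Fig. 15 is a schematic structural diagram of another device model identification device provided by an embodiment of the present application;
Fig. 16 is a schematic structural diagram of still another device model identification device provided by an embodiment of the present application;
Fig. 17 is a schematic structural diagram of yet another device model identification device provided by an embodiment of the present application.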
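The method, apparatus, and system for identifying a device model provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Identifying the device model of terminal devices is of great significance to an operator's home network services. It is well known that the key factor determining user experience in a home network is the speed of wireless fidelity (WiFi). Analysis of live-network data shows that about 80% of the causes of poor WiFi speed are that the terminal devices themselves, such as mobile phones and routers, do not support the corresponding WiFi bandwidth, whereas factors such as WiFi signals passing through walls account for only 20%. For example, many gateway devices support both the 2.4 gigahertz (GHz) and 5 GHz bands, but the router or mobile phone used in the home may support only the 2.4 GHz band. As a result, the user cannot enjoy the high-speed experience of the 5 GHz band, which in turn leads to user complaints.
At present, operators usually handle quality complaints about home network services through on-site visits, which makes manual operation and maintenance costs too high. By identifying the device model of the terminal devices connected to a gateway device and determining their support for WiFi bands, it is possible not only to quickly locate the root cause of poor home network quality and thus reduce the number of on-site visits by maintenance personnel, but also to analyze a household's WiFi needs in advance and recommend a suitable network package. In short, device model identification can both reduce the operator's manual operation and maintenance costs and improve the user's broadband experience.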
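Fig. 1 is a schematic structural diagram of a device model identification system provided by an embodiment of the present application. As shown in Fig. 1, the system may include a model recognition server 01. The model recognition server 01 may be a single server, a server cluster composed of several servers, or a cloud computing service center. The system may further include a gateway device 02. The gateway device 02 is a device used to interconnect different networks, and may also be called an inter-network connector or a protocol converter. The system may further include a network management device 03, which may be a computer.
As can be seen from Fig. 1, the gateway device 02 may be connected to multiple terminal devices 04, and each terminal device 04 may be a smart terminal device such as a mobile phone, a computer, a router, a wearable device, or a household device. Household devices may include speakers, electronic scales, televisions, air conditioners, and the like.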
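Fig. 2 is a flowchart of a device model identification method provided by an embodiment of the present application. With reference to Fig. 1 and Fig. 2, the gateway device 02 may be used to implement step A: data collection. That is, the gateway device 02 may collect the MAC address of a terminal device 04 (or, as shown in Fig. 2, the MAC address and the device name) and send the collected data to the model recognition server 01. For example, referring to Fig. 2, the data collected and reported by the gateway device 02 may include device names N1 to N3 and the MAC address corresponding to each device name, namely MAC addresses M1 to M3.
The model recognition server 01 may be used to implement step B: device model identification. For example, referring to Fig. 2, the model recognition server 01 may identify the device model according to the MAC address (step B2); it may also identify the device model according to the device name (step B1); or it may perform both step B1 and step B2 and merge the identification results obtained through the two steps to obtain the final device model. For example, as shown in Fig. 2, based on device name N1 and MAC address M1, the device model finally identified by the model recognition server 01 may be X1. The model recognition server 01 may then send the finally determined device model to the network management device 03.
The network management device 03 may display the device model identified by the model recognition server 01 (step C). A network administrator who manages the network management device 03 (for example, the operator's maintenance personnel) can then determine the specification parameters of the terminal device 04 based on the identified device model, so as to perform network maintenance, fault detection, or demand analysis.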
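Optionally, as shown in Fig. 1, the model recognition server 01 may include a first server 011 and a second server 012. The second server 012 may be used to construct the database, train the neural network models used for device model identification based on labeled data, and send the constructed database and the trained neural network models to the first server 011. Labeled data refers to sample data whose device model has been determined. The first server 011 can then identify the device model of a terminal device 04 based on the database and the neural network models. Therefore, the first server 011 may also be called an online recognition server, and the second server 012 may also be called an offline training server.
In the embodiments of the present application, the network administrator may also verify the device model of a terminal device 04 provided by the network management device 03. If the device model is determined to be wrong, the network administrator may enter the correct device model into the network management device 03 and instruct the network management device 03 to send the correction data (including the correct device model, the MAC address, and the device name) to the second server 012. The second server 012 may periodically retrain the neural network model based on the accumulated correction data to obtain an updated neural network model, and send the updated neural network model to the first server 011. That is, the network management device 03 may also instruct the second server 012 to implement step D shown in Fig. 2.
It should be noted that the first server 011 and the second server 012 may be integrated into one server, or may be two independent servers. Alternatively, the first server 011 and the network management device 03 may be one integrated device. Alternatively, the system may not need the gateway device 02: a terminal device 04 may report its MAC address directly to the model recognition server 01, or the network management device 03 may collect the relevant data and report it to the model recognition server 01.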
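An embodiment of the present application provides a method for identifying a device model. The method may be applied to the model recognition server 01 shown in Fig. 1, for example to the first server 011. Referring to Fig. 3, the method may include:
Step 101 Obtain the target MAC address and target device name of the target terminal device.
The target terminal device is the terminal device whose device model is to be queried.
In an optional implementation, the model recognition server 01 may obtain the target MAC address and target device name of the target terminal device as reported by the gateway device 02.
Referring to Fig. 1, the gateway device 02 may periodically collect the MAC addresses and device names of the terminal devices connected to it and report them to the model recognition server 01. The period at which the gateway device 02 collects MAC addresses and device names can be flexibly adjusted as required, for example, one hour, one day, or one month.
Alternatively, the gateway device 02 may, in response to a collection instruction sent by the model recognition server 01 or the network management device 03, collect the target MAC address and target device name of the target terminal device and report them to the model recognition server 01. The collection instruction may carry an identifier that uniquely identifies the target terminal device, for example, an Internet Protocol (IP) address.
In another optional implementation, the model recognition server 01 may also directly obtain the target MAC address and target device name of the target terminal device.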
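For example, as shown in FIG. 1, assume that the gateway device 02 is connected to four terminal devices 04: a mobile phone, a computer, a television, and a router. The MAC addresses and device names of the four terminal devices 04 collected and reported by the gateway device 02 may be as shown in Table 1. Referring to Table 1, the gateway device 02 may report four pieces of device information to the model identification server 01, and each piece of device information may also be called a device record. Each piece of device information may include two fields: a MAC address and a device name. For example, the device information of the mobile phone that the model identification server 01 obtains from the gateway device 02 may include the target MAC address 001234AB56C1 and the target device name 小明的AA-a1-666.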
Table 1
MAC address | Device name |
---|---|
001234AB56C1 | 小明的AA-a1-666 |
001234EF56B2 | 小明的电脑 |
005678AB56C1 | 小明家的电视 |
008912AC56C1 | CC-c1 |
步骤102、根据该目标设备名称确定该目标终端设备的备选设备型号。
在本申请实施例中,型号识别服务器01获取到目标终端设备的目标MAC地址和目标设备名称后,可以先根据该目标设备名称确定该目标终端设备的备选设备型号。
作为一种可选的实现方式,该型号识别服务器01中可以存储有型号确定模型,该型号确定模型可以是基于自然语言处理(natural language processing,NLP)技术,采用已确定设备型号的多个设备名称样本训练得到的。型号识别服务器01获取到目标终端设备的目标设备名称后,可以采用该型号确定模型从该目标设备名称中确定该目标终端设备的备选设备型号。
如图4所示,采用该型号确定模型从该目标设备名称中确定该目标终端设备的备选设备型号的过程可以包括:
步骤1021、对目标设备名称进行编码,得到机器可识别的名称向量。
由于设备名称通常是由汉字、数字、字母以及标点符号等字符组成,而这些字符中的大部分字符(例如汉字、字母和标点符号等)是机器学习算法不能理解的,因此型号识别服务器01需要先对目标设备名称进行编码,得到机器可识别的名称向量。
可选的,可以采用独热(one-hot)编码算法对目标设备名称进行编码,从而将目标设备名称中的每个字符均转化成机器可以识别的字符,这些机器可识别的字符组成了名称向量。其中,采用one-hot编码算法对目标设备名称进行编码的过程相当于根据预先创建的映射表,将目标设备名称中的每个字符映射成一个整数。
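For example, the 26 English letters a-z may be mapped to the 26 integers 0-25; the 10 digits 0-9 may be mapped to the integers 26-35; punctuation marks that often appear in device names, such as "-", may be mapped to the 3 integers 36-38; and commonly used Chinese characters may be mapped to integers after 38. For instance, "小" may be mapped to 39, "明" to 40, and "的" to 41.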
示例的,假设型号识别服务器01获取到的目标设备名称为“小明的AA-a1-666”,则该型号识别服务器01对该目标设备名称进行one-hot编码后,可以得到名称向量[39,40,41,1,1,36,1,27,36,32,32,32],该名称向量为矢量。
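The character-to-integer mapping described above can be sketched as follows. This is a minimal illustration, not the patent's actual table: the helper names `build_char_table` and `encode_name`, the choice of the three punctuation marks, the case folding, and the `unknown_id` fallback are all assumptions. Because this sketch maps "a" to 0, the letter positions of its output differ from the example vector in the text, which uses 1 for the letters.

```python
import string

def build_char_table(extra_chars=""):
    """Build a character-to-integer table (illustrative layout only)."""
    table = {}
    for ch in string.ascii_lowercase:   # a-z -> 0..25
        table[ch] = len(table)
    for ch in string.digits:            # 0-9 -> 26..35
        table[ch] = len(table)
    for ch in "-_ ":                    # three punctuation marks -> 36..38 (assumed choice)
        table[ch] = len(table)
    for ch in extra_chars:              # further characters, e.g. Chinese -> 39, 40, ...
        if ch not in table:
            table[ch] = len(table)
    return table

def encode_name(name, table, unknown_id=-1):
    """Map every character of a device name to its integer id."""
    # Upper-case letters are folded to lower case; this is an assumption.
    return [table.get(ch.lower(), unknown_id) for ch in name]

table = build_char_table("小明的")
name_vector = encode_name("小明的AA-a1-666", table)
print(name_vector)
```

The result has one integer per character, so its length always equals the length of the device name, which is what the per-character identifier vector in the next step relies on.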
步骤1022、将名称向量输入至型号确定模型,得到型号确定模型输出的标识向量。
型号识别服务器01可以将编码得到的名称向量输入至型号确定模型,得到型号确定模型输出的标识向量。该标识向量可以包括多个标识符,每个标识符可以用于标识该目标设备名称中的一个字符是否为有效字符。也即是,型号识别服务器01可以采用该型号确定模型确定 该目标设备名称中的每个字符是否为有效字符。其中有效字符是指用于组成目标终端设备的设备型号的字符。在本申请实施例中,可以通过标识符的不同取值表示其所指示的字符是否为有效字符。例如,标识符的取值为第一整数时可以表示其所指示的字符为有效字符,标识符的取值为第二整数时可以表示其所指示的字符为无效字符。该第一整数与该第二整数不同。
示例的,该标识向量可以是一个长度与该名称向量的长度相等的整数序列,且每个标识符的取值范围可以为[0,1,2]。其中,标识符取值为0可以表示其指示的字符为有效字符,且该字符为设备型号的起始字符(即首个字符)。标识符取值为1可以表示其指示的字符为有效字符,且该字符为设备型号的中间字符。标识符取值为2则可以表示其指示的字符为无效字符。也即是,第一整数包括0和1,第二整数为2。
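Referring to FIG. 5, assume that the device model of the target terminal device is AA-a1 and the target device name is "小明的AA-a1-666". After the model identification server 01 inputs the name vector [39,40,41,1,1,36,1,27,36,32,32,32] into the model determination model, the identifier vector output by the model may be [2,2,2,0,1,1,1,1,2,2,2,2], in which the five characters of "AA-a1" (the fourth to eighth characters of the name) are marked as valid.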
步骤1023、根据该标识向量从目标设备名称中确定目标终端设备的备选设备型号。
型号识别服务器01可以根据该标识向量,将该目标设备名称中的有效字符组成的字符串确定为该目标终端设备的备选设备型号。
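For example, as shown in FIG. 5, the model identification server 01 may determine the character string "AA-a1", which is indicated by the identifier 0 and the identifiers 1 in the identifier vector [2,2,2,0,1,1,1,1,2,2,2,2], as the candidate device model of the target terminal device.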
需要说明的是,对于某些无意义的目标设备名称,型号确定模型输出的标识向量中的每个标识符指示的字符可能均为无效字符,对于目标设备名称中的字符均为无效字符的情况,型号识别服务器01可以确定该目标终端设备的设备型号未知(unknown),即确定该目标终端设备的备选设备型号为未知型号。比如对于目标设备名称“小明的电脑”,型号确定模型输出的标识向量可以是[2,2,2,2,2],则型号识别服务器01基于该标识向量可以确定该目标终端设备的备选设备型号为unknown。
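Step 1023 can be sketched as a small decoding function. The function name `extract_model` is hypothetical. The tag values follow the text: 0 marks the first valid character, 1 marks a later valid character, and 2 marks an invalid character. The first usage line passes a tag vector that marks only the five characters of AA-a1 (positions 3 to 7) as valid.

```python
def extract_model(name, tags, unknown="unknown"):
    """Join the characters tagged as valid (0 or 1) into a candidate device model."""
    if len(name) != len(tags):
        raise ValueError("name and tag vector must have the same length")
    valid_chars = [ch for ch, tag in zip(name, tags) if tag in (0, 1)]
    # No valid character at all means the device name carries no model information.
    return "".join(valid_chars) if valid_chars else unknown

print(extract_model("小明的AA-a1-666", [2, 2, 2, 0, 1, 1, 1, 1, 2, 2, 2, 2]))
print(extract_model("小明的电脑", [2, 2, 2, 2, 2]))
```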
作为另一种可选的实现方式,该型号识别服务器01中可以预先存储有多个设备型号模板。型号识别服务器01获取到目标终端设备的目标设备名称后,可以分别确定该目标设备名称与每个设备型号模板的匹配度,并可以将匹配度最高的设备型号模板确定为该目标终端设备的备选设备型号。
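Optionally, the model identification server 01 may use longest common substring matching: it computes the length of the longest common substring between the target device name and each device model template, and may determine the template with the greatest such length as the candidate device model of the target terminal device.
For example, assume that the target device name obtained by the model identification server 01 is "小明的AA-a1-666" and that the device model templates stored in the server include "AA-a1", "AA-a2", and "BB-b1". After computing the matching degree by longest common substring matching, the model identification server 01 may determine that the template with the highest matching degree for the target device name "小明的AA-a1-666" is "AA-a1" (the longest common substrings have 5, 4, and 1 characters respectively under case-sensitive matching), and may therefore determine the template "AA-a1" as the candidate device model of the target terminal device.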
基于模板匹配的方法无需预先训练神经网络模型,该方法的复杂度较低。
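A minimal sketch of the template-matching alternative, using the standard dynamic-programming computation of the longest common substring. The function names and the `min_len` cut-off (below which the result is treated as unknown) are assumptions; the text does not say how a name that matches no template well is handled.

```python
def longest_common_substring_len(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1   # extend the match ending at the previous characters
                best = max(best, cur[j])
        prev = cur
    return best

def match_template(name, templates, min_len=3, unknown="unknown"):
    """Return the template with the longest common substring, or unknown."""
    best_template, best_len = unknown, 0
    for template in templates:
        length = longest_common_substring_len(name, template)
        if length > best_len:
            best_template, best_len = template, length
    # min_len is a hypothetical guard against accidental one-character matches.
    return best_template if best_len >= min_len else unknown

templates = ["AA-a1", "AA-a2", "BB-b1"]
print(match_template("小明的AA-a1-666", templates))
```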
步骤103、检测该备选设备型号是否为未知型号。
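Because users can set the device name of a terminal device themselves, the model identification server 01 cannot always determine a reliable candidate device model from the target device name. For example, live network data indicates that about 55% of device names provide usable device model information, while the remaining roughly 45% do not.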
因此在本申请实施例中,型号识别服务器01在基于目标设备名称确定出备选设备型号之后,还可以进一步检测该备选设备型号是否为未知型号。若该备选设备型号不为未知型号,则型号识别服务器01可以执行步骤104;若该备选设备型号为未知型号,则型号识别服务器01可以执行步骤105。
示例的,型号识别服务器01可以检测该备选设备型号是否为unknown,若不为unknown,则可以执行步骤104;若为unknown,则可以执行步骤105。
步骤104、将该备选设备型号确定为该目标终端设备的设备型号。
若基于目标设备名称确定出的备选设备型号不为未知型号,则型号识别服务器01可以直接将该备选设备型号确定为该目标终端设备的设备型号。
步骤105、根据该目标MAC地址从数据库中确定第一数量个备选MAC地址。
若基于目标设备名称确定出的备选设备型号为未知型号,则型号识别服务器01可以继续根据该目标终端设备的目标MAC地址确定目标终端设备的设备型号,从而提升该设备型号识别的成功率。
在本申请实施例中,型号识别服务器01中可以存储有数据库,该数据库包括多个MAC地址,以及每个MAC地址对应的设备型号。型号识别服务器01获取到目标MAC地址后,可以计算该目标MAC地址与该数据库中每个MAC地址的相似度,并可以基于计算得到的相似度,从数据库中确定出第一数量个备选MAC地址。
其中,每个备选MAC地址与该目标MAC地址的相似度,大于该数据库中其他MAC地址与该目标MAC地址的相似度。也即是,该型号识别服务器01可以从数据库中确定出与该目标MAC地址相似度最高的第一数量个备选MAC地址。该第一数量可以为大于等于1的整数,例如可以为5或者10等。
可选的,型号识别服务器01可以采用相似度模型确定该数据库中的MAC地址与该目标MAC地址的相似度。该相似度模型可以是预先基于已确定相似度的多个MAC地址样本训练得到的。采用该相似度模型确定出的两个MAC地址的相似度,可以反映该两个MAC地址对应的设备型号相同的概率。即两个MAC地址的相似度越高,表明该两个MAC地址对应的设备型号相同的概率越高。
在本申请实施例中,为了提高查询备选MAC地址的效率,该数据库可以采用局部敏感哈希(locality-sensitive hashing,LSH)技术构建得到。采用该LSH技术构建的数据库可以包括多个数据组,每个数据组包括一个或多个数据对,每个数据对包括一个MAC地址,以及与该MAC地址对应的设备型号。
其中,该多个数据组可以是采用聚类算法划分得到的,因此每个数据组也可以称为一个数据簇(cluster)。每个数据对在数据库中的存储位置可以用两级索引(索引1,索引2)表示。其中,索引1(index1)为数据组的索引,索引2(index2)为数据对在数据组中的索引。由于基于该LSH技术构建的数据库可以将设备型号相同或相似的数据对划分至同一个数据组,因此设备型号相同或相似的数据对的索引1相同。
Table 2
Index 1 \ Index 2 | 001 | 002 | 003 |
---|---|---|---|
001 | Data pair 1 | Data pair 2 | Data pair 3 |
002 | Data pair 4 | Data pair 5 | Data pair 6 |
003 | Data pair 7 | Data pair 8 | Data pair 9 |
示例的,参考表2,假设该数据库中存储有9个数据对,该9个数据对被划分为3个数据组,该3个数据组的索引分别为001、002和003。则每个数据组中各个数据对中的设备型号可以均相同。例如,索引1为001的3个数据对中的设备型号可以均为AA-a1。
图6是本申请实施例提供的一种确定第一数量个备选MAC地址的方法流程图,如图6所示,上述步骤105可以包括:
步骤1051、根据目标MAC地址与每个数据组中任一数据对的MAC地址的相似度,确定第二数量个备选数据组。
在本申请实施例中,由于数据库中的多个数据对已经基于聚类算法被划分为了多个数据组,每个数据组中各个数据对的MAC地址的相似度较高,而属于不同数据组的两个数据对的MAC地址的相似度较低。因此为了提高查询的效率,型号识别服务器01可以先计算该目标MAC地址与每个数据组中任一数据对的MAC地址的相似度,得到与该数据库包括的数据组的个数相同数量的相似度。之后,型号识别服务器01即可基于计算得到的相似度,确定第二数量个备选数据组。
其中,每个备选数据组中该任一数据对的MAC地址与该目标MAC地址的相似度,大于其他数据组中该任一数据对的MAC地址与该目标MAC地址的相似度。也即是,该型号识别服务器01可以先确定出与该目标MAC地址最相似的第二数量个备选数据组。该第二数量可以为大于等于1的整数,且该第二数量与该第一数量可以相同,也可以不同。并且,各个数据组中所选取用于与该目标MAC地址进行相似度计算的任一数据对的索引2可以相同。例如,每个数据组中所选取的任一数据对的索引2可以均为001,即可以分别计算该目标MAC地址与每个数据组中的第一个数据对的MAC地址的相似度。
可选的,该数据库中的多个数据组可以是基于k中心(k-centers)聚类算法划分得到的,则每个数据组中存在一个中心数据对。并且,每个数据组中的中心数据对可以位于第一位,其他数据对可以按照与该中心数据对的MAC地址的相似度由高到低的顺序排列。相应的,型号识别服务器01在计算该目标MAC地址与每个数据组中任一数据对的MAC地址的相似度时,可以选择每个数据组中的中心数据对进行计算。
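For example, referring to Table 2, assume that the center data pair of the data group with index 1 of 001 is data pair 1, that of the data group with index 1 of 002 is data pair 4, and that of the data group with index 1 of 003 is data pair 7. Index 2 of all three center data pairs may then be 001, that is, each may be in the first position of its data group. After obtaining the target MAC address, the model identification server 01 may compute the similarity y1 between the target MAC address and the MAC address in data pair 1, the similarity y2 between the target MAC address and the MAC address in data pair 4, and the similarity y3 between the target MAC address and the MAC address in data pair 7. If the second quantity is 2 and the three computed similarities satisfy y1>y3>y2, the model identification server 01 may determine the data group with index 1 of 001 and the data group with index 1 of 003 as the candidate data groups.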
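Step 1051 amounts to a top-k selection over the group centers. In the sketch below, `prefix_similarity` is only a toy stand-in for the trained similarity model (it counts shared leading hex digits), and the center MAC addresses are made-up values chosen so that y1>y3>y2, as in the example.

```python
import heapq

def prefix_similarity(mac_a, mac_b):
    """Toy similarity: number of leading characters two MAC strings share."""
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared

def pick_candidate_groups(target_mac, centers, similarity, second_quantity):
    """Return the index 1 values of the groups whose centers are most similar to the target."""
    scores = {group: similarity(target_mac, mac) for group, mac in centers.items()}
    return heapq.nlargest(second_quantity, scores, key=scores.get)

# Hypothetical center MAC addresses for the three groups of Table 2.
centers = {"001": "001234AB0000", "002": "00FFFF000000", "003": "001234000000"}
print(pick_candidate_groups("001234AB56C1", centers, prefix_similarity, 2))
```

Only one similarity is computed per group here, so the cost of this stage grows with the number of groups rather than with the number of stored data pairs.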
步骤1052、从该第二数量个备选数据组包括的MAC地址中确定出第一数量个备选MAC地址。
对于任意一个备选数据组,其中的每个备选MAC地址与该目标MAC地址的相似度,可以大于该备选数据组中的其他MAC地址与该目标MAC地址的相似度。也即是,该型号识别 服务器01可以从该第二数量个备选数据组包括的MAC地址中确定出与该目标MAC地址最相似的第一数量个备选MAC地址。
在本申请实施例中,为了提高备选MAC地址的查询效率,该型号识别服务器01可以采用近邻搜索算法,从该第二数量个备选数据组包括的MAC地址中确定出第一数量个备选MAC地址。
例如,若每个数据组中的数据对均按照与该中心数据对的MAC地址的相似度由高到低的顺序排列,则型号识别服务器01可以根据每个备选数据组中的数据对的排列顺序,从该第二数量个备选数据组包括的MAC地址中确定出第一数量个备选MAC地址。
可选的,每个数据组包括的一个或多个数据对可以采用制高点树(vantage point tree,VP-tree)算法进行排序。相应的,型号识别服务器01可以采用该VP-tree算法从该第二数量个备选数据组包括的MAC地址中确定出第一数量个备选MAC地址,由此可以大大加速每个数据组内的数据对的近邻搜索效率。
示例的,假设该第一数量为10,第二数量为5,则型号识别服务器01计算目标MAC地址与每个数据组中的中心数据对(即索引2为001的数据对)的MAC地址的相似度后,可以从数据库中确定出5个备选数据组。假设该5个备选数据组分别为c1,c2,c3,c4和c5,则得益于数据库构建中采取的聚类算法,与目标MAC地址最相似的10个MAC地址较大概率在c1至c5这5个备选数据组中。
之后,对于c1至c5中的每个备选数据组,型号识别服务器01可以根据VP-tree查询该备选数据组中物理地址与该目标MAC地址最相似的若干个备选数据对。其中,不同备选数据组中确定出的备选数据对的个数可以相同,也可以不同。假设型号识别服务器01从每个备选数据组中均确定出5个备选数据对,则一共可以得到25个备选数据对,即可以确定出25个近邻数据对。最后,型号识别服务器01可以对该25个备选数据对按照与目标MAC地址的相似度由高到低的顺序进行排序,并确定出与目标MAC地址的相似度最高的10个备选数据对,该10个备选数据对中的10个MAC地址即为最终确定出的备选MAC地址,即与该目标MAC地址最近邻的10个MAC地址。
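The second stage (collect a few neighbors per candidate group, then keep the overall best) can be sketched as follows. The in-group query is done here by brute force as a stand-in for the VP-tree lookup, `prefix_similarity` is again a toy replacement for the similarity model, and all data values are invented for illustration.

```python
import heapq

def prefix_similarity(mac_a, mac_b):
    """Toy similarity: number of leading characters two MAC strings share."""
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared

def top_candidates(target_mac, candidate_groups, similarity, per_group, first_quantity):
    """Merge the best pairs of each candidate group and keep the overall best ones."""
    def score(pair):
        return similarity(target_mac, pair[0])   # pair = (mac, model)
    pool = []
    for group in candidate_groups:
        # Brute-force stand-in for the VP-tree query inside one data group.
        pool.extend(heapq.nlargest(per_group, group, key=score))
    return heapq.nlargest(first_quantity, pool, key=score)

groups = [
    [("001234AB56C0", "AA-a1"), ("001234AB5000", "AA-a1"), ("001234000000", "AA-a2")],
    [("00123F000000", "BB-b1"), ("FF0000000000", "BB-b1")],
]
best = top_candidates("001234AB56C1", groups, prefix_similarity, per_group=2, first_quantity=3)
print([model for _, model in best])
```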
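It should be understood that, compared with conventional database query methods, the VP-tree algorithm used in the embodiments of this application can greatly reduce the complexity of queries within a data group. For example, when querying the 2 data pairs in a data group that are most similar to the target MAC address, the computational complexity of a conventional multiway search tree (B-tree) algorithm is O(2*log(n)), where n is the number of data pairs in the data group. With the VP-tree algorithm, data pairs with similar MAC addresses are also close to each other in the data group (that is, their index 2 values are close). The model identification server 01 therefore quickly locates the nearest neighbor by computing the metric distance between the target MAC address and the VP-tree nodes (that is, the similarity with the MAC address in the center data pair), and can then quickly locate the remaining neighbors within a small range around the nearest neighbor's position in the VP-tree. The computational complexity with the VP-tree algorithm can thus be reduced to O(log(n)).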
示例的,如图7所示,假设型号识别服务器01通过计算目标MAC地址与索引1为003,索引2为001的中心数据对的MAC地址的相似度后,确定该索引1为003的数据组为备选数据组。该索引1为003的数据组的中心数据对的索引2为001,则型号识别服务器01可以在与该索引1为003,索引2为001的中心数据对的附近,快速定位到MAC地址与该目标MAC地址相似的其他数据对。例如图7中所示,由于构建VP-tree时,索引2为002的数据对,以及索引2为003的数据对与该中心数据对的相似度较高,因此该两个数据对在VP-tree中位于该中心数据对的附近,进而型号识别服务器01可以快速确定出该索引1为003的备选 数据组中,与目标MAC地址近邻的其他数据对为索引2为002的数据对,以及索引2为003的数据对。
步骤106、将该第一数量个备选MAC地址对应的设备型号中,出现次数最多的设备型号确定为该目标终端设备的设备型号。
型号识别服务器01确定出第一数量个备选MAC地址后,可以确定每个备选MAC地址对应的设备型号,并统计每个设备型号的出现次数,即重复次数。之后,型号识别服务器01可以将出现次数最多的设备型号确定为该目标终端设备的设备型号。
需要说明的是,为了确保最终确定出的设备型号的可靠性,型号识别服务器01中可以预先存储有相似度阈值,型号识别服务器01在确定出第一数量个备选MAC地址后,还可以先剔除与该目标MAC地址的相似度小于该相似度阈值的备选MAC地址,然后再从剩余的备选MAC地址对应的设备型号中,确定该目标终端设备的设备型号。若每个备选MAC地址与该目标MAC地址的相似度均小于该相似度阈值,则型号识别服务器01可以确定该目标终端设备的设备型号为未知型号,即设备型号为“unknown”。
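Step 106 together with the similarity threshold can be sketched as a filtered majority vote. The function name and the input layout (a list of `(model, similarity)` tuples) are assumptions. The text does not say how ties are broken; `Counter.most_common` returns tied entries in the order they were first encountered.

```python
from collections import Counter

def vote_model(candidates, threshold, unknown="unknown"):
    """Pick the most frequent model among candidates at or above the similarity threshold."""
    kept = [model for model, similarity in candidates if similarity >= threshold]
    if not kept:
        # Every candidate fell below the threshold: report an unknown model.
        return unknown
    return Counter(kept).most_common(1)[0][0]

candidates = [("AA-a1", 0.9), ("AA-a1", 0.8), ("BB-b1", 0.95), ("CC-c1", 0.2)]
print(vote_model(candidates, threshold=0.5))
```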
还需要说明的是,在上述步骤101中,若型号识别服务器01获取到了多个目标终端设备的目标MAC地址和目标设备名称,则对于每个目标终端设备,型号识别服务器01均可以基于上述步骤102至106所示的方法识别该目标终端设备的设备型号。例如参考图2,型号识别服务器01可以根据设备名称N1和MAC地址M1确定出设备型号X1,并可以根据设备名称N2和MAC地址M2确定出设备型号X2。
还需要说明的是,本申请实施例提供的设备型号的识别方法的步骤先后顺序可以进行适当调整,步骤也可以根据情况进行相应增减。例如,上述步骤101中可以仅获取目标终端设备的目标MAC地址,相应的,步骤102至步骤104可以根据情况删除。也即是,型号识别服务器01可以根据目标MAC地址与数据库中存储的MAC地址的相似度,识别目标终端设备的设备型号。由此,即使数据库中未存储该目标MAC地址对应的设备型号,也可以根据MAC地址的相似度确定出该目标终端设备的设备型号,从而有效提高了设备型号识别的成功率,降低了对数据库中存储的数据量的要求。任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化的方法,都应涵盖在本申请的保护范围之内,因此不再赘述。
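In summary, the embodiments of this application provide a device model identification method that can combine two parameters, the device name and the MAC address, to determine the device model of the target terminal device. Compared with determining the device model based only on the MAC address or only on the device name, this can effectively improve the success rate and reliability of device model identification.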
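The overall control flow of steps 102 to 106 is a name-first lookup with a MAC-based fallback. In this sketch the two lookups are passed in as callables, because their internals are covered elsewhere; `model_from_name` and `model_from_mac` are hypothetical parameter names.

```python
UNKNOWN = "unknown"

def identify_model(name, mac, model_from_name, model_from_mac):
    """Name-first identification with a MAC-based fallback."""
    candidate = model_from_name(name)      # step 102
    if candidate != UNKNOWN:               # step 103
        return candidate                   # step 104
    return model_from_mac(mac)             # steps 105 and 106

# Hypothetical stand-ins for the two lookups.
by_name = lambda name: "AA-a1" if "AA-a1" in name else UNKNOWN
by_mac = lambda mac: "EE-e1" if mac.startswith("00CDFE") else UNKNOWN

print(identify_model("小明的AA-a1-666", "001234AB56C1", by_name, by_mac))
print(identify_model("小明的电脑", "00CDFE8C1ACE", by_name, by_mac))
```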
在本申请实施例中,该型号识别服务器01中还可以存储有型号识别模型,该型号识别模型基于已确定设备型号的多个MAC地址样本训练得到。因此型号识别服务器01在基于目标MAC地址确定设备型号时,除了可以采用上述步骤105和步骤106所示的查询数据库的方法,还可以直接将该目标MAC地址输入至该型号识别模型,从而得到该型号识别模型输出的设备型号。
可选的,型号识别服务器01可以先对目标终端设备的目标MAC地址进行one-hot编码,得到该目标MAC地址的地址向量。然后可以将该地址向量输入至该型号识别模型,该型号识别模型即可输出该目标终端设备属于不同设备型号的概率。型号识别服务器01进而可以将概率最高的设备型号确定为该目标终端设备的设备型号。若该型号识别模型输出的该目标终端设备属于每个设备型号的概率均小于概率阈值(例如0.8),则型号识别服务器01可以确定 该目标终端设备的设备型号为未知型号,例如可以将该目标终端设备的设备型号标注为unknown。
示例的,如图8所示,假设型号识别服务器01获取到的目标终端设备的目标MAC地址为00CDFE8C1ACE,则型号识别服务器01对该目标MAC地址进行one-hot编码后,可以得到地址向量[0,0,12,13,15,14,8,12,1,10,12,14]。之后型号识别服务器01可以将该地址向量输入至型号识别模型。若该型号识别模型输出的目标终端设备属于设备型号EE-e1的概率最高,则型号识别服务器01可以确定该目标终端设备的设备型号为EE-e1。
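The address encoding in this example maps each hexadecimal digit of the MAC address to its integer value, and the classifier output is then thresholded. A minimal sketch, assuming the classifier returns a dictionary of per-model probabilities; the function names and the handling of ":" and "-" separators are assumptions, while the 0.8 threshold is the example value from the text.

```python
def encode_mac(mac):
    """Turn a 12-digit hexadecimal MAC address into a list of 12 integers (0-15)."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError("a MAC address must contain 12 hexadecimal digits")
    return [int(ch, 16) for ch in digits]

def pick_model(probabilities, threshold=0.8, unknown="unknown"):
    """Return the most probable model, or unknown if every probability is below the threshold."""
    model, probability = max(probabilities.items(), key=lambda item: item[1])
    return model if probability >= threshold else unknown

print(encode_mac("00CDFE8C1ACE"))
print(pick_model({"EE-e1": 0.9, "AA-a1": 0.1}))
```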
下文对该型号识别服务器01中预先存储的型号确定模型、相似度模型、数据库以及型号识别模型的构建过程进行介绍。
图9是本申请实施例提供的一种型号确定模型的训练方法的流程图,该方法可以应用于型号识别服务器01,例如可以应用于图1所示系统中的第二服务器012。下文以该训练方法应用于第二服务器012为例进行说明。如图9所示,该方法可以包括:
步骤201、获取设备名称样本以及该设备名称样本对应的设备型号样本。
在本申请实施例中,第二服务器012可以获取大量标签数据,每个标签数据可以包括一个设备名称样本以及该设备名称样本对应的设备型号样本。也即是,第二服务器012可以获取到大量已经确定了设备型号的设备名称样本。
步骤202、将该设备名称样本中,与该设备型号样本匹配的字符串中的每个字符均标注为有效字符,将除该字符串之外的其他字符均标注为无效字符。
第二服务器012可以根据设备型号样本对设备名称样本进行字符级的标注,得到标识向量样本。该标识向量样本包括多个标识符,每个标识符可以用于指示该设备名称样本中的一个字符是否为有效字符。例如标识符的取值为第一整数时可以表示其所指示的字符为有效字符,标识符的取值为第二整数时可以表示其所指示的字符为无效字符。通过对设备名称样本中的每个字符进行标注,也即是实现了对设备名称样本中的字符的分类。
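For example, referring to FIG. 5, assume that the device name sample obtained by the second server 012 is "小明的AA-a1-666", the corresponding device model sample is "AA-a1", the first integer includes 0 and 1 (0 marks the first valid character and 1 marks a subsequent valid character), and the second integer is 2. The second server 012 may determine that the character string in the device name sample matching the device model sample is "AA-a1" and may label each character in that string as a valid character, that is, set the identifier of each of these characters to the first integer. Correspondingly, all characters in the device name sample other than the string "AA-a1" may be labeled as invalid characters, that is, their identifiers are set to the second integer. As shown in FIG. 5, the identifier vector sample obtained after the second server 012 labels the device name sample "小明的AA-a1-666" may be [2,2,2,0,1,1,1,1,2,2,2,2].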
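The character-level labeling of step 202 can be sketched as follows. The function name `label_name` is hypothetical, and the sketch assumes an exact, case-sensitive match of the model string inside the name and labels only its first occurrence.

```python
def label_name(name, model):
    """Tag each character: 0 = first valid, 1 = later valid, 2 = invalid."""
    tags = [2] * len(name)
    start = name.find(model) if model else -1
    if start >= 0:
        tags[start] = 0
        for i in range(start + 1, start + len(model)):
            tags[i] = 1
    return tags

print(label_name("小明的AA-a1-666", "AA-a1"))
```

A name that does not contain the model string yields a vector consisting only of 2, which corresponds to the unknown case at inference time.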
步骤203、对标注后的设备名称样本进行模型训练,得到型号确定模型。
第二服务器012可以采用NLP技术对标注后的设备名称样本进行模型训练,得到型号确定模型。
可选的,第二服务器012还需对设备名称样本进行编码,得到机器可识别的名称向量样本。例如,第二服务器012可以采用one-hot编码算法对该设备名称样本进行编码,从而将该设备名称样本中的每个字符均转换为机器可识别的数值。之后,第二服务器012即可将该名称向量样本作为模型的输入,并将该标识向量样本作为模型的输出进行模型训练,得到该型号确定模型。
在本申请实施例中,该第二服务器012可以采用双向长短期记忆(Bi-directional long short term memory,Bi-LSTM)算法和条件随机场(conditional random fields,CRF)算法构建型号确定模型。该Bi-LSTM算法可以提取每个字符的上下文语义,CRF算法再对Bi-LSTM算法的处理结果选出合理的分类结果。通过上述方式,可以训练出较为优化的具有识别设备型号功能的神经网络模型,即型号确定模型。
其中,该模型训练过程还可以采用交叉验证机制,即第二服务器012可以将多个标签数据分成训练集和验证集。在神经网络模型的每一轮训练中,第二服务器012采用训练集中的标签数据集更新模型参数,再采用验证集中的标签数据验证更新结果。如此反复训练,直至模型最优。
图10是本申请实施例提供的一种相似度模型的训练方法的流程图,该方法可以应用于型号识别服务器01,例如可以应用于图1所示系统中的第二服务器012。下文以该训练方法应用于第二服务器012为例进行说明。如图10所示,该方法可以包括:
步骤301、对获取到的多个数据对进行预处理。
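In the embodiments of this application, the second server 012 may obtain multiple data pairs delivered by the network management device 03 for training the similarity model. Each data pair includes one MAC address and the device model corresponding to that MAC address. The device model in each data pair may have been obtained by processing a device name with the model determination model.
Optionally, to ensure the reliability of the obtained data pairs, the second server 012 may first preprocess them to clean out dirty data. Dirty data refers to data in which the device model clearly does not match the vendor information given by the MAC address. A MAC address usually consists of 12 hexadecimal digits (0-9, a-f), of which the first six form the organizationally unique identifier (OUI). The OUI usually identifies the manufacturer of the device's network interface card, so the second server 012 can identify vendor information from the OUI in the MAC address.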
其中,设备型号与厂商信息不吻合可能有两个原因:(1)设备名称有误:用户可能会随意修改设备名称,比如将AA品牌的手机的设备名称修改为BB品牌;(2)MAC地址有误:某些终端设备会在连接网关时修改MAC地址,从而防止跟踪。
在本申请实施例中,在清洗脏数据时,第二服务器012可以检测根据设备型号确定出的设备厂商,与根据MAC地址确定出的设备厂商是否一致。若一致则保留数据对,若不一致则可以确定该数据对为脏数据,因此可以删除。
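The vendor consistency check can be sketched as a filter over (MAC address, device model) pairs. The OUI table, the `vendor_of_model` helper (which simply takes the part of the model before the first hyphen), and the decision to drop pairs whose OUI is not in the table are all assumptions made for illustration.

```python
def oui(mac):
    """First six hexadecimal digits of a MAC address, upper-cased."""
    return mac.replace(":", "").replace("-", "").upper()[:6]

def vendor_of_model(model):
    """Hypothetical rule: the vendor is the part of the model before the first hyphen."""
    return model.split("-")[0]

def clean_pairs(pairs, oui_to_vendor):
    """Keep only pairs whose OUI vendor agrees with the vendor implied by the model."""
    kept = []
    for mac, model in pairs:
        mac_vendor = oui_to_vendor.get(oui(mac))
        # Pairs with an unknown OUI are dropped here; the text leaves this case open.
        if mac_vendor is not None and mac_vendor == vendor_of_model(model):
            kept.append((mac, model))
    return kept

oui_table = {"001234": "AA", "005678": "BB"}   # made-up OUI assignments
pairs = [("001234AB56C1", "AA-a1"), ("005678AB56C1", "AA-a1"), ("008912AC56C1", "CC-c1")]
print(clean_pairs(pairs, oui_table))
```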
步骤302、根据预处理后的数据对,确定多个训练样本。
第二服务器012在完成数据对的清洗后,可以从预处理后的数据对包括的MAC地址中确定出多个MAC地址组,每个MAC地址组可以包括至少两个MAC地址。并且,对于每个MAC地址组,第二服务器012可以获取该MAC地址组中任意两个MAC地址的相似度,该相似度可以是人工标注的,也可以是第二服务器012根据该两个MAC地址所对应的设备型号的相似程度自动标注的。
第二服务器012获取到每个MAC地址组中任意两个MAC地址的相似度后,即可基于该多个MAC地址组以及获取到的相似度生成多个训练样本。其中,每个训练样本包括至少两个MAC地址样本,以及该至少两个MAC地址样本中任意两个MAC地址样本的相似度。
可选的,对于每个MAC地址样本,第二服务器012可以对其进行编码,得到机器可识别的地址向量样本。例如,第二服务器012可以对每个MAC地址样本中的每个字符进行one-hot编码。或者,也可以采用其他编码方式,比如也可以对MAC地址的前六位和后六位分别 采用不同的方式编码。
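For the similarity of any two MAC address samples, an annotator or the second server 012 may determine the value of the label y from the device models corresponding to the two MAC address samples. The value of y may be negatively correlated with the similarity of the two samples, so a smaller y means a higher similarity (y behaves like a distance). For example, when the terminal devices to which the two MAC address samples belong are from different vendors, y may be 2, indicating that the similarity of the two samples is very low. When the terminal devices are from the same vendor but have different device models, y may be 1, indicating that the similarity is relatively high. When the terminal devices are from the same vendor and have the same device model, y may be 0, indicating that the similarity is very high.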
示例的,假设两个MAC地址样本所属的终端设备分别为:AA品牌的型号为a1的手机(即设备型号为AA-a1),以及AA品牌的型号为a2的手机(即设备型号为AA-a2),则第二服务器012可以确定该两个MAC地址样本的相似度为:y=1。
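The automatic labeling rule for a pair of samples can be written directly from the three cases above. The `vendor_of` argument is a hypothetical helper that derives the vendor from a device model; here it takes the part of the model string before the first hyphen.

```python
def pair_label(model_a, model_b, vendor_of):
    """Label y for two samples: 2 = different vendors, 1 = same vendor, 0 = same model."""
    if vendor_of(model_a) != vendor_of(model_b):
        return 2
    return 0 if model_a == model_b else 1

vendor_of = lambda model: model.split("-")[0]   # hypothetical vendor rule
print(pair_label("AA-a1", "AA-a2", vendor_of))
```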
步骤303、对该多个训练样本进行训练,得到相似度模型。
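The second server 012 may use the address vector samples of the MAC address samples in each training sample as the input and the similarity of any two address samples as the target output, and train a model with a deep metric learning algorithm to obtain the similarity model.
The similarity model learned with the deep metric learning algorithm can be understood as a function that measures the similarity between MAC addresses (which may also be called the distance or metric distance). With this model, the MAC addresses of terminal devices of the same device model have a relatively high similarity (a small metric distance), while those of terminal devices of different device models have a relatively low similarity (a large metric distance).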
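For example, assume that the network management device 03 provides 1000 data pairs. The second server 012 can generate 1000×999/2=499500 (about 500000) different MAC address groups from the 1000 data pairs, each group containing two MAC addresses. The second server 012 may then randomly select 100000 MAC address groups, together with the similarity of the two MAC addresses in each group, to obtain 100000 training samples. By training on these 100000 training samples, the second server 012 obtains the similarity model. For any two input MAC addresses, the similarity model outputs a similarity y, which measures whether the device models corresponding to the two input MAC addresses are similar.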
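The number of unordered pairs that can be formed from n addresses is n(n-1)/2, which for n = 1000 is 499500. The sketch below enumerates the pairs and draws a random subset; the function name and the fixed seed are assumptions, and `math.comb` requires Python 3.8 or later.

```python
import math
import random
from itertools import combinations

def sample_pairs(items, k, seed=0):
    """Enumerate all unordered pairs of items and draw up to k of them at random."""
    all_pairs = list(combinations(items, 2))
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    return rng.sample(all_pairs, min(k, len(all_pairs)))

print(math.comb(1000, 2))
print(len(sample_pairs(list(range(100)), 50)))
```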
图11是本申请实施例提供的一种数据库的构建方法的流程图,该方法可以应用于型号识别服务器01,例如可以应用于图1所示系统中的第二服务器012。下文以该构建方法应用于第二服务器012为例进行说明。如图11所示,该方法可以包括:
步骤401、获取多个数据对。
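In the embodiments of this application, the second server 012 may obtain multiple data pairs delivered by the network management device 03 for building the database. Each data pair includes one MAC address and the device model corresponding to that MAC address. The data pairs used to build the database may be the same as or different from the data pairs used in step 301 to train the similarity model; this is not limited in the embodiments of this application.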
为了确保获取到的数据对的可靠性,第二服务器012可以先对获取到的数据对进行预处理,以清洗掉一些脏数据。清洗脏数据的过程可以参考上述步骤301,此处不再赘述。
步骤402、采用聚类算法对该多个数据对进行分组,得到多个数据组。
第二服务器012可以根据相似度模型,构建基于LSH技术的数据库,使得海量高维的数据(MAC地址经历了one-hot编码以及深度度量学习会映射成高维的矢量)的快速近邻搜索成为了可能。
大数据表明,设备型号对应的MAC地址大致具有分段连续的特征,即厂家习惯把一段连续的MAC地址分配给同一设备型号的终端设备,即同一设备型号的终端设备的MAC地址通常占据一个MAC地址区间。然而由于终端设备的型号太多,这种分段连续规律又极其复杂,例如某一型号的手机可能会占据多个MAC地址区间。随着采集到的数据的增多,一方面这种区间连续的规律越来越难以用规则去描述,另一方面基于传统索引方式的数据库会使得查询效率变得很低。因此本申请实施例提供了基于相似度模型和LSH技术的数据库创建方案。
可选的,第二服务器012可以先通过相似度模型确定不同MAC地址之间的相似度,进而可以通过聚类算法将相似度较高的MAC地址聚类在一起,即划分至同一个数据组。由此,即使某个型号的手机的MAC地址占据了多个MAC地址区间,经过基于相似度的聚类之后,该型号的手机的所有MAC地址也可以被划分至同一个数据组。
在本申请实施例中,第二服务器012可以采用k-centers这一聚类算法(当然也可以采用其他基于距离的聚类算法),将获取到的数据对划分成N个数据组,即N个cluster,每个cluster可以包括一个或多个数据对。并且,不同数据组包括的数据对的个数可以相同,也可以不同。其中,N可以为预先设定的大于1的整数,例如N可以等于该多个数据对中包括的设备型号的数量,或者等于设备型号的数量的10倍。
采用k-centers算法进行聚类的过程如下:
步骤S11、从多个数据对中任意选取一个数据对作为第一个数据组的聚类中心,即中心数据对。
步骤S12、计算剩余的每个数据对中的MAC地址与该第一个数据组的中心数据对中的MAC地址的相似度,将相似度最小(即度量距离最大)的数据对作为第二个数据组的中心数据对。
步骤S13、继续计算剩余的每个数据对的MAC地址与该第一个数据组的中心数据对的MAC地址的相似度,以及与该第二个数据组的中心数据对的MAC地址的相似度,并将相似度最小(即度量距离最大)的数据对作为第三个数据组的中心数据对。
以此类推,直至确定出N个数据组的中心数据对。其中,该步骤S13中所述的“相似度最小”可以是指:与该两个中心数据对的MAC地址的相似度之和最小,或者相似度的均值最小。
步骤S14、对于除中心数据对之外的每个数据对,确定该数据对的MAC地址与每个中心数据对的MAC地址的相似度,并将该数据对划分至相似度最高的中心数据对所属的数据组。
例如,假设某数据对的MAC地址与第一个数据组的中心数据对的MAC地址的相似度最高,则可以将该数据对划分至第一个数据组。
通过上述方法,第二服务器012即可将获取到的数据对划分成N个数据组。并且,通过k-centers算法可以确保划分至同一个数据组的各个数据对的MAC地址较为相似,即MAC地址之间的度量距离较小。而属于不同数据组的两个数据对的MAC地址的相似度则较低,即两个数据对的MAC地址的度量距离较大。由此,对于待查询的目标MAC地址,可以通过匹配该目标MAC地址与各个中心数据对的MAC地址的相似度,快速锁定目标MAC地址的近邻(即与目标MAC地址相似度较高的MAC地址)所在的区域。
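Steps S11 to S14 can be sketched as follows. The sketch uses the "smallest mean similarity to the centers chosen so far" reading of step S13, picks the first item as the first center, assumes the items are distinct, and uses a toy prefix-based similarity in place of the trained similarity model; all of these are assumptions.

```python
def prefix_similarity(mac_a, mac_b):
    """Toy similarity: number of leading characters two MAC strings share."""
    shared = 0
    for x, y in zip(mac_a, mac_b):
        if x != y:
            break
        shared += 1
    return shared

def k_centers(items, n_groups, similarity):
    """Group distinct items around n_groups centers chosen farthest-first."""
    centers = [items[0]]                                   # S11: arbitrary first center
    while len(centers) < min(n_groups, len(items)):
        rest = [x for x in items if x not in centers]
        # S12/S13: the item least similar, on average, to the existing centers
        nxt = min(rest, key=lambda x: sum(similarity(x, c) for c in centers) / len(centers))
        centers.append(nxt)
    groups = {c: [c] for c in centers}                     # the center comes first in its group
    for x in items:
        if x not in groups:                                # S14: assign to the most similar center
            groups[max(centers, key=lambda c: similarity(x, c))].append(x)
    return groups

macs = ["001234AB0001", "001234AB0002", "AABBCC000001", "AABBCC000002", "FF0000000001"]
print(k_centers(macs, 3, prefix_similarity))
```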
示例的,假设第二服务器012获取到了如图12所示的多个数据对,则该第二服务器012采用相似度模型以及聚类算法对该多个数据对进行聚类后,可以得到c1至c7共7个数据组,且每个数据组中各个数据对的设备型号可以相同。
需要说明的是,每个数据对在数据库中的存储位置可以用两级索引(索引1,索引2)表示。其中,每个数据对的索引1可以为其所属的数据组的索引,即同一数据组中的各个数据对的索引1相同。例如,结合图7和图12,数据组c1中每个数据对的索引1可以均为001,数据组c2中每个数据对的索引1可以均为002。
步骤403、对于每个数据组,根据数据组中的中心数据对的MAC地址与其他每个数据对中的MAC地址的相似度,对该数据组包括的数据对按照相似度由高到低的顺序进行排序。
为了有效提高每个数据组内的MAC地址的近邻查询效率,第二服务器012还可以对每个数据组中的数据对按照与该中心数据对的MAC地址的相似度由高到低的顺序进行排序。之后,第二服务器012即可为该排序后的各个数据对分配索引2。
可选的,第二服务器012可以采用VP-tree算法对每个数据组中的各个数据对进行排序。其中,对任一数据组中的数据对的排序过程如下:
步骤S21、将该数据组中的中心数据对确定为VP-tree的根(root)节点。
步骤S22、计算数据组内其他每个数据对的MAC地址与该根节点的MAC地址的相似度,根据计算得到的相似度的中位数将除根节点之外的其他数据对划分为两个子集。
该两个子集中的一个子集可以包括:与该根节点的MAC地址的相似度大于或等于该中位数的数据对,该子集即为VP-tree的左子树。另一个子集可以包括:与该根节点的MAC地址的相似度小于该中位数的数据对,该子集即为VP-tree的右子树。
步骤S23、对于每个子集,从该子集中选取一个数据对作为该子集的新的子节点,计算该子集中的其他数据对的MAC地址与该子节点的MAC地址的相似度,进而根据计算得到的相似度的中位数将除子节点之外的其他数据再次拆分成两个子集。其中,该新的子节点可以为该子集中的任意一个数据对;或者也可以是该子集中各个数据对与该根节点的MAC地址的相似度的中位数所对应的一个数据对;又或者,还可以是该子集中与该根节点的MAC地址的相似度最高的一个数据对。
以此类推,直至每个子集均只剩下一个数据对,由此即可完成VP-tree的构建,即数据对的排序。该VP-tree中各个数据对可以按照从上至下(即从根节点至尾节点)以及从左至右(即从左子树至右子树)的顺序排列。
示例的,参考图13,假设某个数据组中包括D1至D7共7个数据对,其中数据对D1为中心数据对,则第二服务器012以该中心数据对D1为根节点构造的VP-tree可以如图13所示。该VP-tree中的7个数据对按照D1、D3、D2、D6、D5、D4和D7的顺序排列。
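The construction in steps S21 to S23 and the top-down, left-to-right ordering can be sketched as follows. This covers construction and ordering only, not the nearest-neighbor search. The choice of each new child (the item most similar to its parent) is one of the options named in step S23, and the integer items with a negative absolute difference as similarity are toy data, so the resulting order is not meant to reproduce FIG. 13.

```python
from collections import deque
from statistics import median

def build_vp_tree(root, others, similarity):
    """Build a tree node for root and split the other items by median similarity."""
    node = {"item": root, "threshold": None, "left": None, "right": None}
    if not others:
        return node
    sims = [similarity(root, x) for x in others]
    node["threshold"] = median(sims)
    left = [x for x, s in zip(others, sims) if s >= node["threshold"]]   # more similar half
    right = [x for x, s in zip(others, sims) if s < node["threshold"]]   # less similar half
    for side, subset in (("left", left), ("right", right)):
        if subset:
            child = max(subset, key=lambda x: similarity(root, x))       # assumed child choice
            rest = list(subset)
            rest.remove(child)
            node[side] = build_vp_tree(child, rest, similarity)
    return node

def level_order(tree):
    """List the items from top to bottom and from left to right."""
    order, queue = [], deque([tree])
    while queue:
        node = queue.popleft()
        order.append(node["item"])
        for side in ("left", "right"):
            if node[side] is not None:
                queue.append(node[side])
    return order

toy_similarity = lambda a, b: -abs(a - b)
tree = build_vp_tree(10, [11, 12, 20, 30], toy_similarity)
print(level_order(tree))
```

The position of an item in the level-order list is what the text calls index 2 within its data group.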
根据上文描述可知,基于LSH技术构建的数据库,可以使得相似设备型号的数据对在数据库中的位置也相近,从而便于近邻搜索。
本申请实施例还提供了一种型号识别模型的训练方法,该方法可以包括如下步骤:
步骤S31、第二服务器012获取多个MAC地址样本,以及每个MAC地址样本对应的设备型号。
该多个MAC地址样本,以及每个MAC地址样本对应的设备型号可以是网管设备03下发至第二服务器012的。
步骤S32、第二服务器对该多个MAC地址样本,以及每个MAC地址样本对应的设备型号进行训练,得到型号识别模型。
在本申请实施例中,第二服务器012可以先对获取到的数据进行预处理,例如去除脏数 据。然后对预处理后得到的每个MAC地址样本进行编码。之后,第二服务器012可以将该编码后的MAC地址样本作为模型的输入,将MAC地址样本对应的设备型号作为模型的目标输出,采用深度学习或随机森林等算法进行模型训练,直至损失函数(loss function)收敛即可得到型号识别模型。由于该型号识别模型可以输出终端设备属于多个设备型号中每个设备型号的概率,因此该型号识别模型也可以称为分类器(classifier)。
在本申请实施例中,型号识别服务器01在识别出目标终端设备的设备型号后,还可以将识别出的设备型号发送至网管设备03。网络管理员可以对该设备型号进行人工验证,若该设备型号错误,则网络管理员可以向网管设备03输入纠正的设备型号。并且,该网管设备03还可以定期将收集到的纠正数据发送至型号识别服务器01,以便该型号识别服务器01可以基于该纠正数据对模型(例如相似度模型、型号确定模型以及型号识别模型)进行重训练。其中,该纠正数据可以包括设备名称、MAC地址以及纠正的设备型号;或者可以仅包括MAC地址和纠正的设备型号。
示例的,网管设备03可以将纠正数据发送至第二服务器012,以触发该第二服务器012进行模型的重训练。该第二服务器012完成重训练后,可以将更新的模型发送至第一服务器011,以便该第一服务器011可以基于更新后的模型进行设备型号的识别。通过该重训练机制,可以确保模型的不断完善和优化,提高设备型号识别的准确率。
本申请实施例提供的方法,可以结合MAC地址和设备名称两个参数确定设备型号。对于能够提供有效的设备型号的设备名称,可以将基于该设备名称确定出的设备型号作为终端设备的设备型号,而对于无法提供有效设备型号的设备名称,则可以基于MAC地址来确定设备型号,由此有效提升了设备型号的识别率。例如,可以将识别率由55%提升至95%。其中,识别率是指识别出设备型号的终端设备的数量占全体通过该型号识别服务器识别设备型号的终端设备的比例。
其中,在根据设备名称识别设备型号时,采用了基于NLP的型号确定模型,该模型能够自动根据标签数据学习设备型号的提取规律,相对于传统的人工标注或者基于复杂的规则的正则表达式的标注方法,开发代价小,模型通用性高,便于泛化到不同地域或者不同语系,模型的开发和维护成本较低。
在根据MAC地址识别设备型号时,采用了深度度量学习以及LSH技术,实现了对设备型号在MAC地址上的分布规律的统计,从而从大数据和统计学的角度给出了关于设备型号的预测,使得高维的海量数据的快速近邻搜索成为了可能。
综上可知,本申请实施例提供的方案主要具有如下优势:(1)基于MAC地址确定设备型号的方法以及基于设备名称确定设备型号的方法均是基于大数据和统计学的原理,识别准确率可以得到保证。(2)两种方法都是数据驱动的,不依赖于规则,方法容易泛化到不同的地域和不同的人文环境;模型维护代价小,数据增多之后模型只需要简单地重训练即可。(3)两个方法所采用的模型都很小,识别速度快,识别效率较高。(4)两个方法的算法理论框架一致,可以复用,从而能够有效降低开发工作量。(5)具有重训练机制,能够有效增强模型预测的准确性和鲁棒性。
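FIG. 14 is a schematic structural diagram of a device model identification apparatus according to an embodiment of this application. As shown in FIG. 14, the apparatus may include: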
第一获取模块501,用于获取目标终端设备的目标物理地址。该第一获取模块501的功能实现可以参考上述步骤101的相关描述。
第一确定模块502,用于从数据库中确定第一数量个备选物理地址,该数据库包括多个物理地址,以及每个物理地址对应的设备型号,其中每个备选物理地址与该目标物理地址的相似度,大于该数据库中其他物理地址与该目标物理地址的相似度,该第一数量为大于1的整数。该第一确定模块502的功能实现可以参考上述步骤105的相关描述。
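A second determining module 503, configured to determine, among the device models corresponding to the first quantity of candidate physical addresses, the device model that occurs most often as the device model of the target terminal device. For the implementation of the second determining module 503, refer to the description of step 106 above.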
可选的,该数据库包括多个数据组,每个数据组包括一个或多个数据对,每个数据对包括一个物理地址,以及与该物理地址对应的设备型号;该第一确定模块502可以用于:
根据该目标物理地址与每个数据组中任一数据对的物理地址的相似度,确定第二数量个备选数据组,每个备选数据组中该任一数据对的物理地址与该目标物理地址的相似度,大于其他数据组中该任一数据对的物理地址与该目标物理地址的相似度,其中,该第二数量为大于1的整数;
从该第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址,每个备选物理地址与该目标物理地址的相似度,大于该第二数量个备选数据组中的其他物理地址与该目标物理地址的相似度。
可选的,每个数据组中存在一个中心数据对,且每个数据组中的数据对按照与该中心数据对的物理地址的相似度由高到低的顺序排列;每个数据组中的任一数据对为该中心数据对;
该第一确定模块502可以用于:根据每个备选数据组中的数据对的排列顺序,从该第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址。
该第一确定模块502的功能实现还可以参考上述步骤1051和步骤1052的相关描述。
可选的,如图15所示,该装置还可以包括:
第二获取模块504,用于获取该多个数据对。该第二获取模块504的功能实现可以参考上述步骤401的相关描述。
聚类模块505,用于采用聚类算法对该多个数据对进行分组,得到该多个数据组。该聚类模块505的功能实现可以参考上述步骤402的相关描述。
排序模块506,用于对于每个数据组,根据该数据组中的中心数据对的物理地址与其他每个数据对中的物理地址的相似度,对该数据组包括的数据对按照相似度由高到低的顺序进行排序。该排序模块506的功能实现可以参考上述步骤403的相关描述。
可选的,如图15和图16所示,该装置还可以包括:
第三确定模块507,用于在该从数据库中确定第一数量个备选物理地址之前,采用相似度模型确定该数据库中的物理地址与该目标物理地址的相似度;其中,该相似度模型基于已确定相似度的多个物理地址样本训练得到。
可选的,该第一获取模块501,还可以用于获取该目标终端设备的目标设备名称。继续参考图15和图16,该装置还可以包括:
第四确定模块508,用于根据该目标设备名称确定该目标终端设备的备选设备型号。该第四确定模块508的功能实现可以参考上述步骤102的相关描述。
第五确定模块509,用于若该备选设备型号不为未知型号,则将该备选设备型号确定为该目标终端设备的设备型号。该第五确定模块509的功能实现可以参考上述步骤104的相关描述。
相应的,该第一确定模块502,可以用于若该备选设备型号为该未知型号,则从数据库中确定第一数量个备选物理地址。
可选的,该第四确定模块508可以用于:
采用型号确定模型从该目标设备名称中确定该目标终端设备的备选设备型号;其中,该型号确定模型基于已确定设备型号的多个设备名称样本训练得到。
可选的,该第四确定模块508可以用于:
采用型号确定模型确定该目标设备名称中的每个字符是否为有效字符;将该目标设备名称中的有效字符组成的字符串确定为该目标终端设备的备选设备型号。该第四确定模块508的功能实现还可以参考上述步骤1021至步骤1023的相关描述。
可选的,如图15所示,该装置还可以包括:
第三获取模块510,用于获取设备名称样本以及该设备名称样本对应的设备型号样本。该第三获取模块510的功能实现可以参考上述步骤201的相关描述。
第六确定模块511,用于将该设备名称样本中,与该设备型号样本匹配的字符串中的每个字符均标注为有效字符,将除该字符串之外的其他字符均标注为无效字符。该第六确定模块511的功能实现可以参考上述步骤202的相关描述。
训练模块512,用于对标注后的该设备名称样本进行模型训练,得到该型号确定模型。该训练模块512的功能实现可以参考上述步骤203的相关描述。
可选的,该第四确定模块508可以用于:
分别确定该目标设备名称与多个设备型号模板中每个设备型号模板的匹配度;将匹配度最高的设备型号模板确定为该目标终端设备的备选设备型号。
综上所述,本申请实施例提供了一种设备型号的识别装置,该装置可以根据目标MAC地址与数据库中存储的MAC地址的相似度,从数据库中确定出相似度较高的第一数量个备选MAC地址,然后再将该第一数量个备选MAC地址对应的设备型号中,出现次数最多的设备型号确定为目标终端设备的设备型号。由此,即使数据库中未存储该目标MAC地址对应的设备型号,也可以根据MAC地址的相似度确定出该目标终端设备的设备型号,从而有效提高了设备型号识别的成功率,降低了对数据库中存储的数据量的要求。
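In addition, the apparatus provided in the embodiments of this application can combine two parameters, the device name and the MAC address, to determine the device model of the target terminal device. Compared with determining the device model based only on the MAC address or only on the device name, this can effectively improve the success rate and reliability of device model identification.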
It should be understood that the device model identification apparatus provided in the embodiments of this application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
图17是本申请实施例提供的一种设备型号的识别装置的结构示意图,参考图17,该设备型号的识别装置可以包括:处理器1701、存储器1702、网络接口1703和总线1704。其中, 总线1704用于连接处理器1701、存储器1702和网络接口1703。通过网络接口1703(可以是有线或者无线)可以实现与其他器件之间的通信连接。存储器1702中存储有计算机程序17021,该计算机程序17021用于实现各种应用功能。
应理解,在本申请实施例中,处理器1701可以是CPU,该处理器1701还可以是其他通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、GPU或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者是任何常规的处理器等。
The memory 1702 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
总线1704除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线1704。
处理器1701被配置为执行存储器1702中存储的计算机程序,处理器1701通过执行该计算机程序17021来实现上述方法实施例中的步骤。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当该计算机可读存储介质在计算机上运行时,使得计算机执行如上述方法实施例中的步骤。
本申请实施例还提供了一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述方法实施例中的步骤。
本申请实施例还提供了一种设备型号的识别系统,如图1所示,该系统可以包括:第一服务器011和第二服务器012。
该第一服务器011可以用于实现如图3、图4和图6所示的方法实施例中的步骤;该第二服务器012可以用于实现如图9至图11所示方法实施例中的步骤。
示例的,该第一服务器011可以包括如图14或图16所示的装置。该第二服务器012可以包括如图15所示的装置中的模块504至506,以及模块510至512。
如图1所示,该系统还可以包括:网关设备02。该网关设备02可以分别与终端设备04和该第一服务器011连接,该网关设备02可以用于获取该终端设备04的MAC地址,并将获取到的MAC地址发送至该第一服务器011。
可选的,本申请实施例提供的设备型号的识别系统也可以仅包括第一服务器011,该第一服务器011可以用于实现如图3、图4、图6以及图9至图11所示的方法实施例中的步骤。例如,该第一服务器011可以包括如图15所示的装置。
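It should be understood that "and/or" in this document indicates that three relationships may exist. For example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.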
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的模块及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本领域普通技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、装置和模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或模块的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本申请实施例方案的目的。
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以是两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分,或者该技术方案的全部或部分可以以计算机程序产品的形式体现出来,该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机程序产品存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。
Claims (25)
- 一种设备型号的识别方法,其特征在于,所述方法包括:获取目标终端设备的目标物理地址;从数据库中确定第一数量个备选物理地址,所述数据库包括多个物理地址,以及每个所述物理地址对应的设备型号,其中每个所述备选物理地址与所述目标物理地址的相似度,大于所述数据库中其他物理地址与所述目标物理地址的相似度,所述第一数量为大于1的整数;将所述第一数量个备选物理地址对应的设备型号中,出现次数最多的设备型号确定为所述目标终端设备的设备型号。
- 根据权利要求1所述的方法,其特征在于,所述数据库包括多个数据组,每个所述数据组包括一个或多个数据对,每个所述数据对包括一个物理地址,以及与所述物理地址对应的设备型号;所述从数据库中确定第一数量个备选物理地址,包括:根据所述目标物理地址与每个所述数据组中任一数据对的物理地址的相似度,确定第二数量个备选数据组,每个所述备选数据组中所述任一数据对的物理地址与所述目标物理地址的相似度,大于其他数据组中所述任一数据对的物理地址与所述目标物理地址的相似度,其中,所述第二数量为大于1的整数;从所述第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址,每个所述备选物理地址与所述目标物理地址的相似度,大于所述第二数量个备选数据组中的其他物理地址与所述目标物理地址的相似度。
- 根据权利要求2所述的方法,其特征在于,每个所述数据组中存在一个中心数据对,且每个所述数据组中的数据对按照与所述中心数据对的物理地址的相似度由高到低的顺序排列;每个所述数据组中的任一数据对为所述中心数据对;所述从所述第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址,包括:根据每个所述备选数据组中的数据对的排列顺序,从所述第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址。
- 根据权利要求3所述的方法,其特征在于,所述方法还包括:获取多个所述数据对;采用聚类算法对多个所述数据对进行分组,得到多个所述数据组;对于每个所述数据组,根据所述数据组中的中心数据对的物理地址与其他每个数据对中的物理地址的相似度,对所述数据组包括的数据对按照相似度由高到低的顺序进行排序。
- 根据权利要求2所述的方法,其特征在于,在所述从数据库中确定第一数量个备选物理地址之前,所述方法还包括:采用相似度模型确定所述数据库中的物理地址与所述目标物理地址的相似度;其中,所述相似度模型基于已确定相似度的多个物理地址样本训练得到。
- 根据权利要求1至5任一所述的方法,其特征在于,所述方法还包括:获取所述目标终端设备的目标设备名称;根据所述目标设备名称确定所述目标终端设备的备选设备型号;若所述备选设备型号不为未知型号,则将所述备选设备型号确定为所述目标终端设备的 设备型号;所述从数据库中确定第一数量个备选物理地址,包括:若所述备选设备型号为所述未知型号,则从数据库中确定第一数量个备选物理地址。
- 根据权利要求6所述的方法,其特征在于,所述根据所述目标设备名称确定所述目标终端设备的备选设备型号,包括:采用型号确定模型从所述目标设备名称中确定所述目标终端设备的备选设备型号;其中,所述型号确定模型基于已确定设备型号的多个设备名称样本训练得到。
- 根据权利要求7所述的方法,其特征在于,所述采用型号确定模型从所述目标设备名称中确定所述目标终端设备的备选设备型号,包括:采用型号确定模型确定所述目标设备名称中的每个字符是否为有效字符;将所述目标设备名称中的有效字符组成的字符串确定为所述目标终端设备的备选设备型号。
- 根据权利要求8所述的方法,其特征在于,所述方法还包括:获取设备名称样本以及所述设备名称样本对应的设备型号样本;将所述设备名称样本中,与所述设备型号样本匹配的字符串中的每个字符均标注为有效字符,将除所述字符串之外的其他字符均标注为无效字符;对标注后的所述设备名称样本进行模型训练,得到所述型号确定模型。
- 根据权利要求6所述的方法,其特征在于,所述根据所述目标设备名称确定所述目标终端设备的备选设备型号,包括:分别确定所述目标设备名称与多个设备型号模板中每个设备型号模板的匹配度;将匹配度最高的设备型号模板确定为所述目标终端设备的备选设备型号。
- 一种设备型号的识别装置,其特征在于,所述装置包括:第一获取模块,用于获取目标终端设备的目标物理地址;第一确定模块,用于从数据库中确定第一数量个备选物理地址,所述数据库包括多个物理地址,以及每个所述物理地址对应的设备型号,其中每个所述备选物理地址与所述目标物理地址的相似度,大于所述数据库中其他物理地址与所述目标物理地址的相似度,所述第一数量为大于1的整数;第二确定模块,用于将所述第一数量个备选物理地址对应的设备型号中,出现次数最多的设备型号确定为所述目标终端设备的设备型号。
- 根据权利要求11所述的装置,其特征在于,所述数据库包括多个数据组,每个所述数据组包括一个或多个数据对,每个所述数据对包括一个物理地址,以及与所述物理地址对应的设备型号;所述第一确定模块,用于:根据所述目标物理地址与每个所述数据组中任一数据对的物理地址的相似度,确定第二数量个备选数据组,每个所述备选数据组中所述任一数据对的物理地址与所述目标物理地址的相似度,大于其他数据组中所述任一数据对的物理地址与所述目标物理地址的相似度,其中,所述第二数量为大于1的整数;从所述第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址,每个所述备选物理地址与所述目标物理地址的相似度,大于所述第二数量个备选数据组中的其他物理地址与所述目标物理地址的相似度。
- 根据权利要求12所述的装置,其特征在于,每个所述数据组中存在一个中心数据对, 且每个所述数据组中的数据对按照与所述中心数据对的物理地址的相似度由高到低的顺序排列;每个所述数据组中的任一数据对为所述中心数据对;所述第一确定模块,用于:根据每个所述备选数据组中的数据对的排列顺序,从所述第二数量个备选数据组包括的物理地址中确定出第一数量个备选物理地址。
- 根据权利要求13所述的装置,其特征在于,所述装置还包括:第二获取模块,用于获取多个所述数据对;聚类模块,用于采用聚类算法对多个所述数据对进行分组,得到多个所述数据组;排序模块,用于对于每个所述数据组,根据所述数据组中的中心数据对的物理地址与其他每个数据对中的物理地址的相似度,对所述数据组包括的数据对按照相似度由高到低的顺序进行排序。
- 根据权利要求12所述的装置,其特征在于,所述装置还包括:第三确定模块,用于在所述从数据库中确定第一数量个备选物理地址之前,采用相似度模型确定所述数据库中的物理地址与所述目标物理地址的相似度;其中,所述相似度模型基于已确定相似度的多个物理地址样本训练得到。
- 根据权利要求11至15任一所述的装置,其特征在于,所述装置还包括:所述第一获取模块,还用于获取所述目标终端设备的目标设备名称;第四确定模块,用于根据所述目标设备名称确定所述目标终端设备的备选设备型号;第五确定模块,用于若所述备选设备型号不为未知型号,则将所述备选设备型号确定为所述目标终端设备的设备型号;所述第一确定模块,用于若所述备选设备型号为所述未知型号,则从数据库中确定第一数量个备选物理地址。
- 根据权利要求16所述的装置,其特征在于,所述第四确定模块,用于:采用型号确定模型从所述目标设备名称中确定所述目标终端设备的备选设备型号;其中,所述型号确定模型基于已确定设备型号的多个设备名称样本训练得到。
- 根据权利要求17所述的装置,其特征在于,所述第四确定模块,用于:采用型号确定模型确定所述目标设备名称中的每个字符是否为有效字符;将所述目标设备名称中的有效字符组成的字符串确定为所述目标终端设备的备选设备型号。
- 根据权利要求18所述的装置,其特征在于,所述装置还包括:第三获取模块,用于获取设备名称样本以及所述设备名称样本对应的设备型号样本;第六确定模块,用于将所述设备名称样本中,与所述设备型号样本匹配的字符串中的每个字符均标注为有效字符,将除所述字符串之外的其他字符均标注为无效字符;训练模块,用于对标注后的所述设备名称样本进行模型训练,得到所述型号确定模型。
- 根据权利要求16所述的装置,其特征在于,所述第四确定模块,用于:分别确定所述目标设备名称与多个设备型号模板中每个设备型号模板的匹配度;将匹配度最高的设备型号模板确定为所述目标终端设备的备选设备型号。
- 一种设备型号的识别装置,其特征在于,所述装置包括:处理器,存储器,以及存储在所述存储器上并能够在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如权利要求1至10任一所述的方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当 所述计算机可读存储介质在计算机上运行时,使得计算机执行如权利要求1至10任一所述的方法。
- 一种设备型号的识别系统,其特征在于,所述系统包括:第一服务器和第二服务器;所述第一服务器用于执行如权利要求1至3,权利要求5至8以及权利要求10中任一所述的方法;所述第二服务器用于执行如权利要求4或9所述的方法。
- 一种设备型号的识别系统,其特征在于,所述系统包括:第一服务器,所述第一服务器用于执行如权利要求1至10任一所述的方法。
- 根据权利要求23或24所述的系统,其特征在于,所述系统还包括:网关设备;所述网关设备分别与终端设备和所述第一服务器连接,所述网关设备用于获取所述终端设备的物理地址,并将获取到的所述物理地址发送至所述第一服务器。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010211207.9A CN113452802A (zh) | 2020-03-24 | 2020-03-24 | 设备型号的识别方法、装置及系统 |
CN202010211207.9 | 2020-03-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021190398A1 true WO2021190398A1 (zh) | 2021-09-30 |
Family
ID=77806313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/081615 WO2021190398A1 (zh) | 2020-03-24 | 2021-03-18 | 设备型号的识别方法、装置及系统 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113452802A (zh) |
WO (1) | WO2021190398A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114338602A (zh) * | 2021-12-06 | 2022-04-12 | 深圳市联洲国际技术有限公司 | 网络设备的识别方法及其装置、计算机可读存储介质 |
CN114390511A (zh) * | 2021-12-20 | 2022-04-22 | 苏州迈科网络安全技术股份有限公司 | 基于mac地址的终端型号动态识别方法、装置、终端及存储介质 |
CN114697295A (zh) * | 2022-03-28 | 2022-07-01 | 视联动力信息技术股份有限公司 | 一种终端入网方法和装置 |
CN117668581A (zh) * | 2023-12-13 | 2024-03-08 | 北京知其安科技有限公司 | 一种多源数据的实体识别方法、装置及电子设备 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113821242B (zh) * | 2021-09-29 | 2023-08-22 | 深圳威消保科技有限公司 | 一种固件智能匹配方法及系统 |
CN116582133B (zh) * | 2023-07-12 | 2024-02-23 | 东莞市联睿光电科技有限公司 | 一种变压器生产过程数据智能管理系统 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7590141B1 (en) * | 2005-10-28 | 2009-09-15 | Hewlett-Packard Development Company, L.P. | Method and apparatus for an automatic network boot procedure for a resource in a utility computing environment |
CN106712986A (zh) * | 2015-07-31 | 2017-05-24 | 深圳触云科技有限公司 | 一种识别智能终端的方法 |
CN107666662A (zh) * | 2016-07-28 | 2018-02-06 | 华为技术有限公司 | 一种终端识别方法和接入点 |
CN108319729A (zh) * | 2018-03-19 | 2018-07-24 | 深圳市中科新业信息科技发展有限公司 | 一种手机型号计算方法及手机型号查询方法 |
CN109347880A (zh) * | 2018-11-30 | 2019-02-15 | 北京神州绿盟信息安全科技股份有限公司 | 一种安全防护方法、装置及系统 |
-
2020
- 2020-03-24 CN CN202010211207.9A patent/CN113452802A/zh active Pending
-
2021
- 2021-03-18 WO PCT/CN2021/081615 patent/WO2021190398A1/zh active Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114338602A (zh) * | 2021-12-06 | 2022-04-12 | 深圳市联洲国际技术有限公司 | 网络设备的识别方法及其装置、计算机可读存储介质 |
CN114390511A (zh) * | 2021-12-20 | 2022-04-22 | 苏州迈科网络安全技术股份有限公司 | 基于mac地址的终端型号动态识别方法、装置、终端及存储介质 |
CN114390511B (zh) * | 2021-12-20 | 2024-05-17 | 苏州迈科网络安全技术股份有限公司 | 基于mac地址的终端型号动态识别方法、装置、终端及存储介质 |
CN114697295A (zh) * | 2022-03-28 | 2022-07-01 | 视联动力信息技术股份有限公司 | 一种终端入网方法和装置 |
CN117668581A (zh) * | 2023-12-13 | 2024-03-08 | 北京知其安科技有限公司 | 一种多源数据的实体识别方法、装置及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN113452802A (zh) | 2021-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21775794 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21775794 Country of ref document: EP Kind code of ref document: A1 |