CN114423011A

CN114423011A - Open set WIFI equipment identification method and device

Info

Publication number: CN114423011A
Application number: CN202111659944.6A
Authority: CN
Inventors: 陈立全; 陈招发; 焦江浩; 胡爱群; 李古月
Original assignee: Network Communication and Security Zijinshan Laboratory
Current assignee: Network Communication and Security Zijinshan Laboratory
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2022-04-29

Abstract

The invention discloses an open set WIFI equipment identification method and device. The method comprises the following steps: acquiring an output signal of authorized legal WIFI equipment, preprocessing the output signal, extracting a preamble, extracting artificial features and depth features based on the preamble, and splicing the two types of features to obtain a fusion feature vector; training a BP neural network model with a custom loss function as a discriminator; acquiring an output signal of the WIFI equipment to be identified and querying whether a corresponding discriminator exists; if so, extracting a fusion feature vector of the output signal and inputting it into the discriminator to identify the WIFI equipment; if not, treating the WIFI equipment as unknown equipment. The method fuses the artificial features and the depth features, which enriches the feature set and improves the identification accuracy of the WIFI equipment; meanwhile, a BP neural network with a single classification (one-class) function is introduced as the discriminator, so that any unknown WIFI equipment that is disguised as legal WIFI equipment and for which no prior information is available can be rejected.

Description

Open set WIFI equipment identification method and device

Technical Field

The invention relates to the field of artificial intelligence and information security, in particular to an open set WIFI equipment identification method and device.

Background

As the number of wirelessly connected devices grows, securing wireless communications becomes more challenging. Identity authentication is an important part of wireless security protection. Although there are many identity verification methods based on cryptography, many devices have limited computation and power budgets and are not suited to high-complexity security algorithms or additional security modules.

Physical layer identity authentication enhances wireless communication security by authenticating a device using channel state information combined with transmitter hardware fingerprints. Transmitter fingerprints arise from non-idealities of the radio frequency components, whose interaction causes signals from different transmitters to exhibit unique characteristics. A common radio frequency fingerprint is the carrier frequency offset.

Although many device identification methods based on radio frequency fingerprints can classify and identify different devices with good performance, they are limited in identifying unknown disguised devices. Most of the identification algorithms used are supervised learning algorithms, which require labeled samples with prior information to train the model. In practice, the fingerprint characteristics of unknown disguised devices cannot be acquired in advance; when an unknown disguised device without prior information is presented, its signals are wrongly classified into the closest known class, which creates a security hole.

Disclosure of Invention

The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides an open set WIFI equipment identification method and device, which can reject any unknown equipment disguised as legal WIFI equipment and realize open set identification on the basis of ensuring that authorized legal WIFI equipment is classified and identified with high accuracy; in addition, the fusion features based on the artificial features and the depth features are used for classification and identification, so that the identification accuracy is effectively improved, and the misjudgment rate caused by the fact that the artificial features or the depth features are independently used for identification is reduced.

The technical scheme is as follows: on one hand, the invention provides an open set WIFI equipment identification method, which comprises the following steps:

S1: acquiring an output signal of authorized legal WIFI equipment, preprocessing the output signal, extracting a preamble, extracting artificial features and depth features of the output signal based on the preamble, splicing the two types of features to obtain a fusion feature vector, and assembling the fusion feature vectors into a fusion feature vector set with a preset size;

S2: dividing the fusion feature vector set into a training set, a verification set and a test set, training a BP neural network model with a custom loss function based on the training set, adjusting parameters of the BP neural network model by using the verification set, selecting the BP neural network model meeting a first preset condition as a discriminator, and setting a label corresponding to the legal WIFI equipment for the discriminator;

S3: acquiring an output signal of the WIFI equipment to be identified and querying whether a corresponding discriminator exists; if so, extracting a fusion feature vector of the output signal and inputting it into the discriminator to identify the WIFI equipment; if not, treating the WIFI equipment as unknown equipment.

Further, the process of extracting the preamble from the output signal includes: presetting a standard preamble sequence, intercepting continuous sequences with the same length as the standard preamble sequence from the output signal, calculating the correlation between each intercepted signal and the standard preamble sequence through conjugate multiplication, and extracting the intercepted signal whose correlation meets a second preset condition as the preamble.

Further, the artificial features include: carrier frequency offset, channel estimation characteristics, and singular values based on frequency response.

Further, the specific process of extracting the depth features is as follows:

pre-training the autoencoder model on the preamble set: the autoencoder network consists of an encoder layer and a decoder layer; during training, the encoder layer learns a compressed representation of the input data by using a nonlinear activation function; the decoder layer reconstructs the original input data and a reconstruction error is calculated; based on a back propagation algorithm and an optimization method, the autoencoder network selects the features whose information quantity meets a third preset condition to form the compressed representation of the input data;

copying the encoder layer of the pre-trained autoencoder model for use as a depth feature extractor, and using the compressed representation of the input data as the depth feature.

Further, the specific process of S2 is as follows:

S2.1: dividing the fusion feature vector set of each legal WIFI device into a training set, a verification set and a test set in proportion by using a non-repeated sampling technique;

S2.2: drawing a preset number of samples from the training set by random sampling without replacement to form a training subset, and repeating the sampling a preset number of times to obtain a plurality of training subsets, wherein the number of training subsets equals the preset number of sampling repetitions;

S2.3: training BP neural network models with the custom loss function in parallel based on the training subsets, wherein the number of BP neural network models is the same as the number of training subsets;

S2.4: performing parameter adjustment and performance verification on the BP neural network models based on the verification set, saving the BP neural network model whose prediction error meets a first preset condition as the discriminator, and setting a label for the discriminator.

Further, the BP neural network model takes a single classification algorithm optimization target as a loss function.

Further, the BP neural network model is composed of an input layer, a hidden layer, and an output layer, and the specific process of training the BP neural network model includes: defining a custom loss function, in which the optimization objective of a single classification algorithm is used as the BP neural network loss function; calculating a prediction error based on the custom loss function, and iteratively updating the BP neural network parameters by using a back propagation algorithm until the convergence requirement is met; and marking the training sample points as normal or abnormal by using a decision function on the output layer, so that the BP neural network has single classification capability.

Further, the specific process of S3 is as follows:

S3.1: acquiring an output signal of the WIFI equipment to be identified;

S3.2: extracting tag information from the output signal of the WIFI equipment to be identified and selecting a corresponding discriminator according to the tag information; if no discriminator corresponds to the tag information, judging that the WIFI equipment to be identified is unknown equipment that is not disguised as authorized legal equipment;

S3.3: if a discriminator corresponds to the tag information, extracting a fusion feature vector from the output signal of the WIFI equipment to be identified as the input of the discriminator;

S3.4: outputting a discrimination result from the discriminator, wherein the first discrimination result is legal WIFI equipment, and the second discrimination result is unknown illegal WIFI equipment disguised as legal WIFI equipment.

On the other hand, the invention also provides an open set WIFI equipment identification device, which comprises the following units:

the signal processing unit is used for acquiring output signals of authorized legal WIFI equipment, preprocessing the output signals, extracting preambles, extracting artificial features and depth features of the output signals based on the preambles, splicing the two types of features to obtain fusion feature vectors, and assembling the fusion feature vectors into a fusion feature vector set with a preset size;

the model training unit is used for dividing the fusion feature vector set into a training set, a verification set and a test set, training a BP neural network model with a custom loss function based on the training set, adjusting parameters of the BP neural network model by using the verification set, selecting the BP neural network model meeting a first preset condition as a discriminator, and setting a label corresponding to the legal WIFI equipment for the discriminator;

and the device identification unit is used for acquiring an output signal of the WIFI device to be identified, inquiring whether a corresponding discriminator exists, if so, extracting a fusion characteristic vector of the output signal, inputting the fusion characteristic vector into the discriminator to identify the WIFI device, and if not, considering the WIFI device as an unknown device.

Compared with the prior art, which usually adopts a supervised learning algorithm and requires labeled samples with prior information to train a model, the invention fuses the artificial features and the depth features, which enriches the feature set and improves the identification accuracy of WIFI equipment; meanwhile, a BP neural network with a single classification function is introduced as a discriminator, so that, while authorized legal WIFI equipment is still classified and identified with high accuracy, any unknown WIFI equipment without prior information that is disguised as legal WIFI equipment can be rejected, and open set WIFI equipment identification is realized.

Drawings

Fig. 1 is a flowchart illustrating the steps of the open set WIFI device identification method in embodiment 1 of the present invention;

Fig. 2 is a working framework diagram of the open set WIFI device identification method in embodiment 1 of the present invention;

Fig. 3 is a schematic diagram of the topology of the BP neural network in embodiment 1 of the present invention;

Fig. 4 is a flowchart of the BP neural network training process in embodiment 1 of the present invention;

Fig. 5 is a schematic structural diagram of the open set WIFI device identification apparatus in embodiment 2 of the present invention.

Detailed Description

The technical scheme of the invention will be explained in detail below by combining with the attached drawings in the embodiment of the invention, and the scheme of the invention is better described through the embodiment.

Example 1

Referring to fig. 1, the embodiment provides an open set WIFI device identification method, which includes the following specific steps:

Specifically, in this embodiment, 10 WIFI devices are selected as target wireless devices and numbered. The WIFI devices numbered 1-8 are used as authorized legal WIFI devices for training the discriminators; the WIFI devices numbered 9-10 are used as unknown devices disguised as legal WIFI devices, for testing the rejection rate of the discriminators on unknown disguised devices.

A USRP is used to collect the output signals of the authorized legal WIFI devices, and 10000 frames of data are collected for each device for model training. Signals of different WIFI devices are collected in different time periods to avoid mutual interference among the devices.

The acquired output signals are preprocessed. The preprocessing comprises: down conversion, over sampling, signal detection and interception, energy normalization, frequency offset estimation and compensation. Direct down-conversion of the acquired wireless signal gives the baseband signal:

$$r(t) = x(t)\, e^{j(2\pi \Delta f\, t + \varphi)}$$

where x(t) represents the complex baseband representation of the transmitted signal, Δf represents the carrier frequency offset, and φ represents the phase deviation. After down-conversion, the complex baseband signal is sampled at a sampling rate of 20 MS/s, the beginning and the end of each frame are estimated using the change point principle, and the frame is intercepted; finally, the intercepted signal is energy-normalized and stored.

The preamble is extracted from the preprocessed signal using a preset standard preamble sequence and a sliding window method. In this embodiment, at a sampling rate of 20 MS/s the short preamble has a length of 160 samples. A sliding window intercepts a continuous sequence of length 160 from the preprocessed signal, and the sequence is conjugate-multiplied with the preset standard short preamble to calculate the correlation. The intercepted signal is taken as the preamble when the second preset condition is met; in this embodiment, the second preset condition is that the correlation is maximal, and the sliding window index at that point is the start of the preamble.

Three artificial features are extracted from the complex preamble signal: the carrier frequency offset, the channel estimation characteristics, and the singular values based on the frequency response. The specific calculation methods are as follows:
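The sliding-window search described above can be sketched with NumPy as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and a random unit-modulus sequence stands in for the standard short preamble, which is not reproduced in this text.

```python
import numpy as np

def find_preamble_start(signal, reference):
    """Return the window index whose correlation magnitude with the reference is largest."""
    if len(signal) < len(reference):
        raise ValueError("signal is shorter than the reference preamble")
    # np.correlate conjugates its second argument, so in 'valid' mode
    # corr[k] = sum_m signal[k + m] * conj(reference[m]) for every window start k.
    corr = np.correlate(signal, reference, mode="valid")
    return int(np.argmax(np.abs(corr)))

def extract_preamble(signal, reference):
    """Cut out the window that best matches the reference preamble."""
    start = find_preamble_start(signal, reference)
    return start, signal[start:start + len(reference)]

# Synthetic check: bury a 160-sample reference in weak noise at offset 200.
rng = np.random.default_rng(0)
reference = np.exp(2j * np.pi * rng.random(160))  # hypothetical stand-in sequence
signal = 0.05 * (rng.standard_normal(600) + 1j * rng.standard_normal(600))
signal[200:360] += reference
start, preamble = extract_preamble(signal, reference)
```

Taking the magnitude of the correlation makes the search insensitive to the unknown phase deviation of the received frame.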

Calculating the carrier frequency offset: the PLCP preamble contains a short preamble consisting of 10 repeated short training sequences and a long preamble consisting of 2 repeated long training sequences. The frequency offset is:

where N is the length of the preamble sequence, D is the distance between two repeated sequences, arg() is the angle calculation function, and R is the time-delay correlation of the repeated sequences:

where L is the repeat length and r represents the preamble sequence. In this embodiment, the short preamble is first used for coarse frequency offset estimation: the estimates from the 10 repeated segments are averaged to obtain the coarse frequency offset estimate f₁. The long preamble is then compensated as y(t) = x(t)·e^(−j2πf₁t), and a fine frequency offset estimate f₂ is obtained from the repeated sequences of the compensated long preamble. The total frequency offset estimate is f₁ + f₂.
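A coarse-then-fine estimate of this kind can be sketched as below. The estimator used here is the common delay-correlation form, R = Σ r*(n)·r(n+D) and f = arg(R)·fs/(2πD); that form is an assumption, since the exact expressions of this embodiment are not legible in this text. The repeat distances of 16 and 64 samples, the reading of the sampling rate as 20 MS/s, and all function names are likewise assumptions for illustration.

```python
import numpy as np

FS = 20e6  # samples per second (the text's sampling rate, read here as 20 MS/s)

def delay_correlation_cfo(r, delay, fs=FS):
    """Estimate the frequency offset of a sequence that repeats every `delay` samples."""
    # R = sum_n conj(r[n]) * r[n + delay]; for a pure offset f its angle is 2*pi*f*delay/fs.
    # Summing over every available n pools all adjacent repeated segments.
    R = np.vdot(r[:-delay], r[delay:])
    return np.angle(R) * fs / (2 * np.pi * delay)

def estimate_cfo(short_preamble, long_preamble, fs=FS):
    """Coarse estimate f1 from the short preamble, fine estimate f2 from the long one."""
    f1 = delay_correlation_cfo(short_preamble, 16, fs)
    t = np.arange(len(long_preamble)) / fs
    compensated = long_preamble * np.exp(-2j * np.pi * f1 * t)  # remove the coarse offset
    f2 = delay_correlation_cfo(compensated, 64, fs)
    return f1, f2, f1 + f2

# Noise-free synthetic preamble: 10 x 16-sample repeats followed by 2 x 64-sample repeats.
rng = np.random.default_rng(1)
short_symbol = np.exp(2j * np.pi * rng.random(16))
long_symbol = np.exp(2j * np.pi * rng.random(64))
tx = np.concatenate([np.tile(short_symbol, 10), np.tile(long_symbol, 2)])
true_cfo = 50e3  # Hz, chosen arbitrarily for the check
rx = tx * np.exp(2j * np.pi * true_cfo * np.arange(len(tx)) / FS)
f1, f2, total = estimate_cfo(rx[:160], rx[160:])
```

The short repeat distance gives a wide unambiguous range (about ±fs/(2·16)) but a noisier estimate, while the longer distance gives a finer estimate over a narrower range, which is why the two stages are chained.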

Calculating the channel estimation characteristics: at a sampling rate of 20 MS/s, the long preamble contains two repeated sequences of length 64. The frequency response H1 of the preset standard long preamble repeated sequence and the frequency responses H2 and H3 of the two actually received long preamble repeated sequences are calculated respectively, and the channel estimation characteristics are obtained:

Calculating the singular values based on the frequency response: the frequency response H of the long preamble is calculated, and the singular values are computed as S = svd(H), where svd() is the singular value calculation function in Matlab.

Extracting depth features using an encoder layer of a pre-trained self-encoder network based on the preamble complex signal:

Pre-training an autoencoder model for each legal WIFI device on its preamble set: in this embodiment, a complex preamble signal with a length of 320 is used as the input for pre-training the autoencoder network. The encoder layer of the autoencoder network consists of 3 hidden layers with sigmoid activation functions and dimensions of 160, 80 and 10 respectively, and is used to learn the compressed representation of the input data. Based on a back propagation algorithm and an optimization method, the autoencoder network is forced to select the features meeting a third preset condition as the compressed representation of the input data; in this embodiment, the third preset condition is to select the features with the largest information quantity. The number of encoder layers and their dimensions may be chosen according to actual needs and are not limited to these values.

The encoder layer of the pre-trained autoencoder network is copied as the depth feature extractor; in this embodiment, a complex preamble signal with a length of 320 is compressed into a real signal with a length of 10 as the depth feature.

The three extracted artificial features and the depth features are spliced into a one-dimensional feature vector to form a fused feature vector, and the fused feature vector forms a fused feature vector set, wherein the size of the fused feature vector set of each legal WIFI device is 10000.
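The shape of the depth feature and of the spliced vector can be illustrated with a forward pass through an encoder of the stated widths. Only the layer widths (160, 80, 10) and the sigmoid activation come from the embodiment. Everything else is an assumption made for this sketch: the 320 complex samples are fed as 640 real values, random weights stand in for weights copied from a pre-trained autoencoder, and the handcrafted feature sizes (1, 64 and 1 values) are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Encoder:
    """Three sigmoid layers with widths 160, 80 and 10."""

    def __init__(self, input_dim, widths=(160, 80, 10), seed=0):
        rng = np.random.default_rng(seed)
        dims = (input_dim, *widths)
        # Placeholder weights; a real extractor would copy them from the trained autoencoder.
        self.weights = [rng.normal(0.0, 0.1, (dims[i], dims[i + 1]))
                        for i in range(len(widths))]
        self.biases = [np.zeros(width) for width in widths]

    def __call__(self, x):
        h = x
        for W, b in zip(self.weights, self.biases):
            h = sigmoid(h @ W + b)
        return h

def complex_to_real(z):
    """Stack real and imaginary parts (an assumed input encoding)."""
    return np.concatenate([z.real, z.imag])

def fuse(handcrafted, deep):
    """Splice handcrafted features and the depth feature into one 1-D vector."""
    parts = [np.ravel(np.asarray(f, dtype=float)) for f in handcrafted]
    return np.concatenate(parts + [deep])

rng = np.random.default_rng(2)
preamble = rng.standard_normal(320) + 1j * rng.standard_normal(320)
deep = Encoder(input_dim=640)(complex_to_real(preamble))
handcrafted = [50e3, rng.random(64), 8.0]  # placeholder CFO, channel feature, singular value
fused = fuse(handcrafted, deep)
```

In practice the handcrafted features have very different scales from the sigmoid outputs, so some normalization before splicing would usually be needed; the text does not say how this is handled.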

S2.1: Using a non-repeated random sampling technique, the fusion feature vector set of size 10000 of each legal WIFI device is divided into a training set, a validation set and a test set in the ratio 7:2:1; the non-repeated random sampling is implemented with the train_test_split helper function in scikit-learn.

S2.2: The training set size is 7000. 5000 samples are drawn from the training set by random sampling without replacement to form a training subset, and the sampling is repeated 10 times to obtain 10 training subsets. The number of samples drawn and the number of training subsets may be chosen according to actual needs and are not limited to these values.
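Steps S2.1 and S2.2 can be sketched as follows. The embodiment names scikit-learn's train_test_split for the split; this sketch uses plain NumPy index shuffling as a stand-in so that it has no extra dependency, and the function names are hypothetical.

```python
import numpy as np

def split_indices(n_samples, rng):
    """Shuffle the sample indices once and cut them 7:2:1 into disjoint sets."""
    idx = rng.permutation(n_samples)
    n_train = n_samples * 7 // 10
    n_val = n_samples * 2 // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def draw_training_subsets(train_idx, subset_size, n_subsets, rng):
    """Draw each subset without replacement; different subsets may share samples."""
    return [rng.choice(train_idx, size=subset_size, replace=False)
            for _ in range(n_subsets)]

rng = np.random.default_rng(42)
train_idx, val_idx, test_idx = split_indices(10000, rng)
subsets = draw_training_subsets(train_idx, 5000, 10, rng)
```

Because 10 subsets of 5000 are drawn from only 7000 training samples, the subsets necessarily overlap; each one is free of duplicates internally, and the models trained on them differ through the sampling.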

S2.3: training 10 BP neural network models of custom loss functions in parallel based on 10 training subsets: the BP neural network topology structure is shown in fig. 3, and is composed of an input layer, a hidden layer, and an output layer, the number of neurons in the output layer is 1, and a training sample point is marked as normal or abnormal by using a decision function.

Referring to fig. 4, the BP neural network training process is as follows:

s2.3.1: inputting a fusion feature vector training subset;

s2.3.2: network initialization: determining a BP neural network topological structure, and initializing weights among layers and thresholds of neurons;

s2.3.3: calculating hidden layer output based on the connection weight and the input fusion feature vector;

s2.3.4: calculating an output layer prediction output based on the connection weight and the hidden layer output;

S2.3.5: calculating the error between the target output and the predicted output by using the custom loss function, wherein the custom loss function is designed based on the optimization objective of the OC-SVM single classification algorithm;

Specifically, the OC-SVM (one-class SVM) algorithm is similar in principle to the SVM algorithm: it trains a support vector machine by taking the origin as the negative sample point and all other data as positive sample points. The strategy is to map the data into the feature space corresponding to the kernel and to construct a hyperplane between the data and the origin that has the maximum distance from the origin.

Assume the hyperplane is w^TΦ(X) − ρ = 0, where w is the normal vector perpendicular to the hyperplane, Φ(X) is the RKHS mapping function from the input space to the feature space F, and ρ is the offset of the hyperplane. The objective is to maximize the distance between the hyperplane and the origin on the basis of correct classification. After the slack variables ξ_i are added, the optimization objective of the OC-SVM algorithm is:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_i - \rho$$

$$\text{s.t.}\quad w^{T}\Phi(X_i) \ge \rho - \xi_i,\quad \xi_i \ge 0$$

where ν ∈ (0,1) adjusts the degree of relaxation and N is the number of training samples.

The OC-SVM objective is applied to the BP neural network to drive the training of the neural network. The key step is to replace w^TΦ(X_n) in the OC-SVM algorithm with w^Tg(VX_n), where w is the scalar output obtained from the hidden layer to the output layer, g() represents the activation function, and V represents the weight matrix from the network inputs to the hidden units, so that the optimization objective can be realized in the network. The resulting loss function of the network is:

$$\text{s.t.}\quad w^{T}g(VX_i) \ge \rho - \xi_i,\quad \xi_i \ge 0$$

s2.3.6: and calculating output errors of each layer by using a back propagation algorithm, and iteratively updating the connection weight between each layer until the convergence requirement is met, so that the network has single classification capability.
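The custom loss and the decision function can be sketched as below. The sketch assumes the standard OC-SVM objective ½‖w‖² + (1/(νN))·Σξ_i − ρ with w^TΦ(X_i) replaced by w^Tg(VX_i), as the text describes, and eliminates the slack variables by setting ξ_i = max(0, ρ − w^Tg(VX_i)). Whether the patented loss contains further terms (for example a penalty on V) cannot be read from this text, so none is included. The sigmoid activation and all function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def network_scores(X, V, w):
    """Return w^T g(V x) for every row x of X (V: hidden x input, w: hidden)."""
    return sigmoid(X @ V.T) @ w

def one_class_loss(X, V, w, rho, nu):
    """0.5 * ||w||^2 + mean(max(0, rho - score)) / nu - rho."""
    slack = np.maximum(0.0, rho - network_scores(X, V, w))  # xi_i at its optimum
    return 0.5 * float(w @ w) + float(slack.mean()) / nu - rho

def decide(X, V, w, rho):
    """1 marks a sample as normal, 0 as abnormal."""
    return (network_scores(X, V, w) >= rho).astype(int)

# Hand-checkable case: zero input weights make every hidden unit output 0.5,
# so each sample scores 0.5 * 1 + 0.5 * 1 = 1.0.
X = np.zeros((4, 3))
V = np.zeros((2, 3))
w = np.array([1.0, 1.0])
loss_inside = one_class_loss(X, V, w, rho=0.5, nu=0.25)   # all samples above rho
loss_outside = one_class_loss(X, V, w, rho=1.5, nu=0.25)  # all samples below rho
```

During training, the gradient of this loss with respect to w, V and ρ would be propagated back through the network in the usual way; a smaller ν penalizes samples that fall below ρ more heavily.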

S2.4: Parameter adjustment and performance verification are performed on the 10 trained BP neural network models based on the validation set, and the BP neural network model meeting the first preset condition is saved as the discriminator. In this embodiment, the first preset condition is that the model has the smallest prediction error. The MAC address of the corresponding WIFI device is set as the label of the discriminator.

S3: acquiring an output signal of WIFI equipment to be identified, inquiring whether a corresponding discriminator exists, if so, extracting a fusion feature vector of the output signal, inputting the fusion feature vector into the discriminator to identify the WIFI equipment, and if not, considering the WIFI equipment as unknown equipment;

s3.1: acquiring an output signal of the WIFI equipment to be identified;

s3.2: extracting an MAC address from an output signal of the WIFI equipment to be identified, selecting a corresponding discriminator according to the MAC address, and if the MAC address does not have the corresponding discriminator, judging that the WIFI equipment to be identified belongs to unknown equipment and is not disguised as authorized legal equipment;

s3.3: if the MAC address has a corresponding discriminator, extracting a fusion characteristic vector from an output signal of the WIFI equipment to be identified as the input of the discriminator;

S3.4: The discriminator outputs a discrimination result. An output of 1 indicates that the WIFI device to be identified is a legal WIFI device, namely the legal WIFI device to which the discriminator corresponds; an output of 0 indicates that the WIFI device to be identified is an unknown illegal WIFI device disguised as a legal WIFI device. The test is repeated with samples from different legal WIFI devices (authorized devices numbered 1-8) and from unknown disguised WIFI devices (unknown devices numbered 9-10) to verify the classification accuracy of the model on legal WIFI devices and its rejection rate on unknown disguised WIFI devices.
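The control flow of steps S3.1 to S3.4 amounts to a lookup followed by a one-class decision. The sketch below shows that flow only; the registry layout, the function names, the result strings and the toy discriminator are hypothetical and not taken from the source.

```python
LEGAL = "legal device"
DISGUISED = "unknown device disguised as a legal device"
UNKNOWN = "unknown device, no discriminator registered"

def identify(mac, signal, discriminators, extract_features):
    """Look up the discriminator labelled with this MAC address and apply it."""
    discriminator = discriminators.get(mac)
    if discriminator is None:
        return UNKNOWN  # S3.2: no discriminator carries this label
    features = extract_features(signal)  # S3.3: only computed when a discriminator exists
    return LEGAL if discriminator(features) == 1 else DISGUISED  # S3.4

# Toy stand-ins: the "fused feature" is a single number and the
# discriminator accepts values close to 1.0.
registry = {"aa:bb:cc:dd:ee:01": lambda feature: 1 if abs(feature - 1.0) < 0.1 else 0}
extract = lambda signal: sum(signal) / len(signal)

genuine = identify("aa:bb:cc:dd:ee:01", [1.0, 1.02, 0.98], registry, extract)
spoofed = identify("aa:bb:cc:dd:ee:01", [2.0, 2.1, 1.9], registry, extract)
stranger = identify("ff:ff:ff:ff:ff:ff", [1.0, 1.0, 1.0], registry, extract)
```

Keying the registry by MAC address means a spoofed address only selects which discriminator is consulted; acceptance still depends on the radio frequency fingerprint.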

The method realizes the fusion of the artificial features and the depth features, enriches the feature set and further improves the identification accuracy of the WIFI equipment; in addition, the BP neural network with a single classification function is introduced as a discriminator, on the basis of ensuring classification and identification of authorized legal WIFI equipment with high accuracy, rejection of any unknown equipment without prior information disguised as the legal WIFI equipment can be realized, and open set WIFI equipment identification is realized.

Example 2

Referring to fig. 5, the present embodiment provides an open set WIFI device identification apparatus, which includes the following units:

the signal processing unit 101 is configured to acquire an output signal of an authorized legal WIFI device, preprocess the output signal, extract a preamble, extract artificial features and depth features of the output signal based on the preamble, splice the two types of features to obtain a fusion feature vector, and make the fusion feature vector into a fusion feature vector set of a preset size;

the model training unit 102 is configured to divide the fusion feature vector set into a training set, a verification set and a test set, train a BP neural network model with a custom loss function based on the training set, adjust parameters of the BP neural network model by using the verification set, select the BP neural network model meeting a first preset condition as a discriminator, and set a tag corresponding to the legal WIFI device for the discriminator;

the device identification unit 103 is configured to acquire an output signal of the WIFI device to be identified, query whether a corresponding discriminator exists, if so, extract a fusion feature vector of the output signal, input the fusion feature vector to the discriminator to identify the WIFI device, and if not, consider the WIFI device as an unknown device.

In addition, an embodiment of the present invention further provides a computer-readable storage medium that may store a program; when executed, the program performs some or all of the steps of any open set WIFI device identification method described in the above method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned memory comprises: a U-disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, Read-only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

It should be noted that the above mentioned embodiments are only specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive equivalent changes or substitutions within the technical scope of the present invention.

Claims

1. An open set WIFI equipment identification method is characterized by comprising the following steps:

2. The open-set WIFI device identification method of claim 1, wherein said pre-processing of output signals comprises: down conversion, over sampling, signal detection and interception, energy normalization, frequency offset estimation and compensation.

3. The open-set WIFI device identification method of claim 1, wherein said extracting a preamble from an output signal comprises: presetting a standard preamble sequence, intercepting continuous sequences with the same length as the standard preamble sequence from the output signal, calculating the correlation between each intercepted signal and the standard preamble sequence through conjugate multiplication, and extracting the intercepted signal whose correlation meets a second preset condition as the preamble.

4. The open-set WIFI device identification method of claim 1, wherein said artificial features include: carrier frequency offset, channel estimation characteristics, and singular values based on frequency response.

5. The open-set WIFI apparatus identification method of claim 1, wherein the specific process of extracting the depth features is:

pre-training the autoencoder model on the preamble set: the autoencoder network consists of an encoder layer and a decoder layer; during pre-training, the encoder layer learns a compressed representation of the input data by using a nonlinear activation function; the decoder layer reconstructs the original input data and a reconstruction error is calculated; based on a back propagation algorithm and an optimization method, the autoencoder network selects the features whose information quantity meets a third preset condition to form the compressed representation of the input data;

6. The open-set WIFI apparatus identification method of claim 1, wherein the specific process of S2 is as follows:

7. The open-set WIFI device identification method of claim 6 wherein said BP neural network model uses a single classification algorithm optimization objective as a loss function.

8. The open-set WIFI equipment identification method of claim 7, wherein the BP neural network model is composed of an input layer, a hidden layer and an output layer, and the specific process of training the BP neural network model comprises: defining a custom loss function, in which the optimization objective of a single classification algorithm is used as the BP neural network loss function; calculating a prediction error based on the custom loss function, and iteratively updating the BP neural network parameters by using a back propagation algorithm until the convergence requirement is met; and marking the training sample points as normal or abnormal by using a decision function on the output layer, so that the BP neural network has single classification capability.

9. The open-set WIFI apparatus identification method of claim 1, wherein the specific process of S3 is as follows:

s3.1: acquiring an output signal of the WIFI equipment to be identified;

10. An open set WIFI device identification apparatus, comprising:

11. A computer-readable storage medium containing the open-set WIFI device identification method of any one of claims 1 to 9.