CN117272166A - Distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation

Info

Publication number: CN117272166A
Application number: CN202311058305.3A
Authority: CN
Inventors: 陈帅; 厉小润; 王晶; 李东明
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2023-08-22
Filing date: 2023-08-22
Publication date: 2023-12-22

Abstract

The invention provides a distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation, belonging to the field of distributed optical fiber sensing applications. The method comprises the following steps: collecting signals of various intrusion events and various weather events with a distributed optical fiber sensing system to obtain signal data sets, then screening the signal data sets to construct a typical event signal data set; extracting the time-frequency features of the signals and preprocessing them to obtain a time-frequency feature data set; constructing an HTS-AT model, training it offline, and taking the best model as the teacher model; constructing a student model BC-ResNet and performing cross-model knowledge distillation with the teacher model during training to obtain the final optimal model; and identifying the time-frequency feature data set to be detected with that model to obtain a classification result. By using cross-model knowledge distillation, the invention retains the excellent performance of the teacher model's Transformer architecture without increasing model complexity, and greatly improves recognition accuracy.

Description

Distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation

Technical Field

The invention belongs to the field of distributed optical fiber sensing application, and particularly relates to a distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation.

Background

A distributed optical fiber sensing system integrates optical, electrical and signal processing technologies and has been widely applied to safety monitoring in recent years. Among such systems, the phase-sensitive optical time domain reflectometer (Φ-OTDR) is widely used in fields such as perimeter security and pipeline transportation safety because of its high precision, simple structure and strong anti-interference capability. Because the environment of a security perimeter is changeable, rapid and accurate identification of intrusion events has become a research hotspot for Φ-OTDR technology in the perimeter security field.

For Φ-OTDR technology, the prevailing pattern recognition methods fall into two categories. The first manually selects features of the vibration signal and classifies them with a machine learning algorithm; mainstream algorithms include the support vector machine (SVM), the relevance vector machine, and the Gaussian mixture model. Although traditional machine learning algorithms need little data and have low algorithmic complexity, they suffer from low recognition accuracy, a limited number of distinguishable classes and weak generalization, and they rely too heavily on manually designed signal features. The second category uses a deep learning model to extract signal features automatically for classification; compared with traditional machine learning, deep learning achieves much higher classification accuracy and better robustness. It has been proposed to transform the vibration signal into a spectrogram by short-time Fourier transform and then classify it with a two-dimensional convolutional neural network (2D-CNN). Other deep learning methods widely used in the audio field, such as one-dimensional convolutional neural networks (1D-CNN) and long short-term memory networks (LSTM), also show good results. However, as Φ-OTDR technology is applied more widely, complex scenarios of many kinds challenge the accuracy of pattern recognition.

In recent years, the Transformer model based on the self-attention mechanism has become a research hotspot in natural language processing and image processing. Subsequently, researchers took the mel spectrogram of audio samples as input and used an improved ViT (Vision Transformer) algorithm for classification, greatly improving the accuracy of audio classification. The Transformer model and the CNN model each have their own advantages. Although a Transformer can generally outperform a CNN when training samples are sufficient, its high model complexity lowers its computational efficiency, which limits its further application in downstream tasks.

Thus, in the prior art, Φ-OTDR identification has low accuracy when applied to complex scenes, and when a model with a large number of parameters such as a Transformer is used, the recognition speed cannot meet the requirements of engineering scenarios.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation to solve the problems. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation comprises the following steps:

1) Laying sensing optical fiber on the fence of the perimeter, collecting signals of different types of events with a distributed optical fiber sensing system to obtain signal data sets of the different types of events, screening the signal data sets, and constructing a typical event signal data set; the different types of events comprise various intrusion events and various weather events;

2) Extracting time-frequency characteristics of signals in a typical event signal data set and preprocessing to obtain a time-frequency characteristic data set;

3) Constructing an HTS-AT model, training the HTS-AT model offline on the time-frequency feature data set obtained in step 2), and taking the best model as the teacher model; constructing a student model BC-ResNet, training the BC-ResNet on the time-frequency feature data set obtained in step 2), and performing cross-model knowledge distillation with the teacher model during BC-ResNet training to obtain the final optimal model;

4) And identifying the time-frequency characteristic data set to be detected by using the final optimal model to obtain a classification result.

As a preferred embodiment of the present invention, the step 1) is: a phase-sensitive optical time domain reflectometer based on coherent detection is used as the distributed optical fiber acquisition system to acquire distributed optical fiber sensing signals in a perimeter security scene; the sampling rate of the acquisition system is sr, so that within n seconds the system acquires, at any spatial point, an original signal waveform consisting of n × sr points; original signal waveforms generated when different types of events occur at a given spatial point are collected and cut into sample fragments of fixed duration to construct sample data sets of the different types of events; the sample fragments in the sample data set are inspected, and fragments in which the event response waveform accounts for less than 50% are removed to obtain the typical event signal data set.

As a preferred embodiment of the present invention, the different types of events include natural environment, climbing the protective net, striking the protective net, and strong wind.

As a preferred embodiment of the present invention, the step 2) is:

signals in a typical event signal data set are subjected to short-time Fourier transformation to obtain a frequency spectrum, and the obtained frequency spectrum is subjected to a set of Mel filters to obtain a Mel spectrogram, wherein the Mel spectrogram is the extracted time-frequency characteristic.

As a preferred scheme of the invention, the specific implementation mode of obtaining the Mel spectrogram by passing the obtained frequency spectrum through a group of Mel filters is as follows:

the spectrum after short-time Fourier transform is $X_t(k)$, where t = 1, 2, … is the index in the time dimension, l is the window overlap length of the short-time Fourier transform, and k = 1, 2, …, N is the index in the frequency dimension; the Mel spectrogram $Y_t(m)$ is then calculated by formula (1), where $H_m(k)$ is the frequency response of the Mel filter, m is the index on the Mel scale, and N is the number of Mel filters;

$H_m(k)$ is given by formula (2), where f(m), the conversion between the Mel scale and the actual frequency, is given by formula (3).

As a preferred scheme of the invention, the specific implementation of constructing the HTS-AT model in step 3) is as follows:

the HTS-AT model adds two modules to the Swin Transformer model to adapt it to the audio classification task;

the first module adds a window division operation in the time dimension when the input Mel spectrogram is divided into patches: before the patches are encoded into a one-dimensional sequence, the patches under the same window are arranged together, so that the resulting one-dimensional encoded sequence carries two-dimensional structural information; an appropriate window length, patch-splitting length and patch size are set, which determine the encoded length of a single patch;

the second module generates the predicted classification result: to exploit the structural information in the time and frequency dimensions learned by the encoded sequence, the HTS-AT model reshapes the one-dimensional encoded sequence into a two-dimensional image according to the aspect ratio of the input spectrogram, extracts feature information with a two-dimensional convolution layer, and outputs the predicted classification result after an average pooling operation.

In summary, compared with the prior art, the technical proposal of the invention has the following advantages:

(1) The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation provided by the invention uses the Mel frequency spectrum to extract the characteristics of the signals, is simpler and more convenient than a method for extracting the characteristics by utilizing machine learning, and greatly reduces the complexity of preprocessing.

(2) The invention selects BC-ResNet as the student model. Because BC-ResNet adds a dimension-reduction operation, it has fewer parameters than a conventional ResNet, which greatly improves the recognition speed.

(3) The invention uses cross-model knowledge distillation so that BC-ResNet acquires the excellent recognition performance of the HTS-AT model, which uses a Transformer structure. Compared with training without cross-model knowledge distillation, the method improves recognition accuracy; compared with using the HTS-AT model alone, it greatly improves recognition speed.

Drawings

Fig. 1 is a flowchart of a distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation provided by an embodiment of the invention.

Fig. 2 is an experimental scene graph for data acquisition provided by an embodiment of the present invention.

Fig. 3 is a waveform diagram of four kinds of original signals in a data set according to an embodiment of the present invention.

Fig. 4 is a spectrum diagram of four types of signals in a data set after preprocessing according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a model structure of an HTS-AT according to an embodiment of the present invention.

Fig. 6 is a schematic structural diagram of two modules in the BC-ResNet model according to an embodiment of the present invention.

Fig. 7 is a schematic diagram of cross-model knowledge distillation provided by an embodiment of the invention.

Detailed Description

In order to make the purposes, technical schemes and advantages of the invention more clear, the distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation provided by the invention is described in detail below with reference to the accompanying drawings and the embodiment. It should be noted that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The embodiment of the invention discloses a distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation, which comprises the following steps as shown in a flow chart in fig. 1:

Step 1: collecting signals of different types of events with a distributed optical fiber sensing system to obtain signal data sets of the different types of events, screening the signal data sets, and constructing a typical event signal data set; the different types of events comprise various intrusion events and various weather events;

Step 2: extracting the time-frequency features of the signals in the typical event signal data set and preprocessing them to obtain a time-frequency feature data set;

Step 3: constructing an HTS-AT (Hierarchical Token-Semantic Audio Transformer) model, training the HTS-AT model offline on the time-frequency feature data set obtained in step 2, and taking the best model as the teacher model; constructing a student model BC-ResNet (Broadcasted Residual Network), training the BC-ResNet on the time-frequency feature data set obtained in step 2, and performing cross-model knowledge distillation with the teacher model during BC-ResNet training to obtain the final optimal model;

Step 4: identifying the time-frequency feature data set to be detected with the final optimal model to obtain a classification result.

Wherein step 1 comprises:

Step 1-1: a phase-sensitive optical time domain reflectometer based on coherent detection is used as the distributed optical fiber acquisition system to acquire distributed optical fiber sensing signals in a perimeter security scene. The sampling rate of the acquisition system is sr, so that within n seconds the system acquires, at any spatial point, an original signal waveform consisting of (n × sr) points. Original signal waveforms generated when different types of events occur at a given spatial point are collected and cut into 3-second sample fragments to construct a sample data set of the different types of events.

Step 1-2: the sample fragments in the sample data set are inspected, and samples in which the event response waveform accounts for less than 50% are removed; the remaining samples form the typical event signal data set.

In the embodiment, the application scenario is perimeter security, as shown in Fig. 2. The four event types are natural environment, climbing the protective net, striking the protective net, and strong wind. The number of samples of each class in the data set is shown in Table 1, and the data set is split between training and validation at a ratio of 8:2.

Table 1. Number of samples for each class

Fig. 3 shows waveforms of the four types of typical signals. The sampling rate of the acquisition system is 1600, each selected sample is 3 seconds long, and every sample shown has an event response waveform proportion of 100%. When no event occurs, i.e. for the natural environment class, the overall signal strength is low, so the section of a waveform that is the event response can be identified from changes in waveform strength.
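The cutting and screening of step 1 can be sketched as follows. This is only an illustrative sketch: the source describes the screening as an inspection of the fragments, so the energy-threshold rule, the function names, and the `threshold` and `frame_len` parameters are all assumptions introduced here, not the applicant's procedure.

```python
def cut_into_segments(waveform, sr, seconds=3):
    """Cut a 1-D waveform into non-overlapping fragments of sr * seconds points."""
    seg_len = sr * seconds
    n_segments = len(waveform) // seg_len  # an incomplete tail is dropped
    return [waveform[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def event_fraction(segment, threshold, frame_len):
    """Fraction of short frames whose mean absolute amplitude exceeds threshold.

    Assumption: the event response is approximated by signal strength, because
    the natural-environment signal is weak overall.
    """
    frames = [segment[i:i + frame_len]
              for i in range(0, len(segment) - frame_len + 1, frame_len)]
    active = sum(1 for f in frames if sum(abs(v) for v in f) / len(f) > threshold)
    return active / len(frames)


def screen_segments(segments, threshold, frame_len, min_fraction=0.5):
    """Remove fragments whose event response accounts for less than min_fraction."""
    return [s for s in segments
            if event_fraction(s, threshold, frame_len) >= min_fraction]


# Toy usage: 6 s of synthetic data at sr = 1600, cut into two 3 s fragments.
sr = 1600
quiet = [0.01] * sr   # 1 s of weak background signal
loud = [1.0] * sr     # 1 s of strong event response
waveform = loud * 2 + quiet + quiet * 2 + loud
segments = cut_into_segments(waveform, sr)
kept = screen_segments(segments, threshold=0.1, frame_len=160)
```

In this toy input the first fragment is two-thirds event response and is kept, while the second is one-third event response and is removed.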

The step 2 comprises the following steps:

Step 2-1: the signals in the typical event signal data set are subjected to short-time Fourier transform to obtain a spectrum, and the spectrum is passed through a set of Mel filters to obtain a Mel spectrogram; the Mel spectrogram is the extracted time-frequency feature.

Specifically, let the spectrum after short-time Fourier transform be $X_t(k)$, where t is the index in the time dimension, l is the window overlap length of the short-time Fourier transform, and k = 1, 2, …, N is the index in the frequency dimension. The Mel spectrogram is calculated by formula (1), where $H_m(k)$ is the frequency response of the Mel filter, m is the index on the Mel scale, and N is the number of Mel filters.

$H_m(k)$ is calculated by formula (2), where f(m), the conversion between the Mel scale and the actual frequency, is given by formula (3).
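For orientation, the textbook forms of a triangular Mel filter bank that match the symbols defined above are given below. They are stated here as an assumption about the usual formulation, not as the formulas (1) to (3) of this publication; in particular, whether the power spectrum or the magnitude spectrum is filtered, and the constants 2595 and 700 of the Mel conversion, are conventional choices that the text does not confirm. Here f(m) denotes the frequency index at the centre of the m-th filter, obtained by mapping equally spaced Mel values back to frequency.

```latex
% Mel spectrogram: filter-bank-weighted sum over frequency indices
Y_t(m) = \sum_{k=1}^{N} H_m(k)\,\lvert X_t(k)\rvert^{2}

% Triangular frequency response of the m-th Mel filter
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\[4pt]
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\[8pt]
0, & k > f(m+1)
\end{cases}

% Conversion between actual frequency (Hz) and the Mel scale, and its inverse
\mathrm{Mel}(f_{\mathrm{Hz}}) = 2595 \log_{10}\!\left(1 + \frac{f_{\mathrm{Hz}}}{700}\right),
\qquad
f_{\mathrm{Hz}} = 700\left(10^{\mathrm{Mel}/2595} - 1\right)
```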

In the embodiment, the window used in the short-time Fourier transform is a Hann (Hanning) window with a window length of 25 and an overlap of 40%, and 32 Mel filters are used. These parameters can be adjusted according to the actual classification results so that the spectrum captures the features of the original signal as fully as possible.
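A minimal pure-Python sketch of this feature extraction is shown below, using the window length of 25, the 40% overlap and the 32 Mel filters of the embodiment. Several details are assumptions made only for illustration: the zero-padded transform size `n_fft=128`, the use of the power spectrum, the constants 2595 and 700 in the Mel conversion, and the synthetic 200 Hz test tone. A direct DFT is used for readability, not speed.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann (Hanning) window of n points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_power(signal, win_len=25, overlap=0.4, n_fft=128):
    """Power spectrum per frame; zero-padding to n_fft is an assumption."""
    hop = win_len - round(win_len * overlap)  # 25 - 10 = 15 samples
    window = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(win_len)]
        frames.append([
            abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i in range(win_len))) ** 2
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, centre, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sr / n_fft  # frequency of bin k in Hz
            rising = (f - lo) / (centre - lo)
            falling = (hi - f) / (hi - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mel_spectrogram(signal, sr, n_mels=32, n_fft=128):
    """Apply the Mel filter bank to every frame of the power spectrum."""
    bank = mel_filterbank(n_mels, n_fft, sr)
    return [[sum(h * p for h, p in zip(row, frame)) for row in bank]
            for frame in stft_power(signal, n_fft=n_fft)]


# Toy usage: 0.25 s of a 200 Hz tone sampled at 1600 points per second.
sr = 1600
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
mel = mel_spectrogram(tone, sr)  # list of frames, each with 32 Mel values
```

Computing the filter weights from continuous frequencies rather than rounded bin indices avoids division by zero when neighbouring filter centres fall into the same bin, which can happen with many filters and a short window.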

Step 2-2: to accelerate the training of the subsequent neural network, the data set is normalized per training batch before training. The normalization is given by formula (4), where $x_i$ is the i-th sample of a training batch before normalization, mean(x) is the mean of the batch data, var(x) is the variance of the batch data, and $y_i$ is the normalized data. Fig. 4 shows the preprocessed Mel spectrograms of the four types of typical signals.
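The per-batch normalization can be sketched as below, assuming the usual form y_i = (x_i - mean(x)) / sqrt(var(x) + eps). The small constant `eps` is an assumption added for numerical safety; the text names only the mean and the variance of the batch.

```python
import math


def normalize_batch(batch, eps=1e-5):
    """Normalize a batch of 2-D Mel spectrograms with batch-wide mean and variance.

    eps is a hypothetical stabilizing constant, not taken from the source.
    """
    values = [v for sample in batch for row in sample for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(var + eps)
    return [[[(v - mean) / scale for v in row] for row in sample]
            for sample in batch]


# Toy usage: a batch of two 2 x 2 "spectrograms".
batch = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
normalized = normalize_batch(batch)
```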

The step 3 comprises the following steps:

Step 3-1: an HTS-AT model is constructed and trained offline on the time-frequency feature data set obtained in step 2, and the best model is taken as the teacher model.

The model structure of the HTS-AT is shown in Fig. 5. The HTS-AT model adds two main modules to the Swin Transformer model to adapt it to the audio classification task. The first module adds a windowing operation in the time dimension when the input Mel spectrogram is divided into patches: before the patches are encoded into a one-dimensional sequence, the patches under the same window are arranged together, so that the resulting one-dimensional encoded sequence carries some two-dimensional structural information. An appropriate window length, patch-splitting length and patch size are set, which determine the encoded length of a single patch. The second module generates the predicted classification result: to exploit the structural information in the time and frequency dimensions learned by the encoded sequence, the HTS-AT model reshapes the one-dimensional encoded sequence into a two-dimensional image according to the aspect ratio of the input spectrogram, extracts feature information with a two-dimensional convolution layer, and outputs the predicted classification result after an average pooling operation.
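The effect of the time-window grouping on the patch sequence can be illustrated with a small sketch. The text states only that patches under the same window are arranged together; the order used here inside a window (time index first, then frequency index) and both function names are assumptions for illustration. The second function shows the inverse step used by the second module, placing the one-dimensional sequence back onto a two-dimensional grid; the convolution and pooling that follow are omitted.

```python
def windowed_patch_order(n_time, n_freq, window):
    """List (t, f) patch coordinates so that each time window is contiguous."""
    order = []
    for w_start in range(0, n_time, window):          # one time window at a time
        for f in range(n_freq):                       # every frequency row in it
            for t in range(w_start, min(w_start + window, n_time)):
                order.append((t, f))
    return order


def sequence_to_grid(tokens, n_time, n_freq, window):
    """Place a 1-D token sequence back onto an n_freq x n_time grid."""
    grid = [[None] * n_time for _ in range(n_freq)]
    for token, (t, f) in zip(tokens, windowed_patch_order(n_time, n_freq, window)):
        grid[f][t] = token
    return grid


# Toy usage: 4 time patches, 2 frequency patches, windows of 2 time patches.
order = windowed_patch_order(n_time=4, n_freq=2, window=2)
grid = sequence_to_grid(list(range(8)), n_time=4, n_freq=2, window=2)
```

With these toy sizes the first four sequence positions all belong to the first time window, so neighbouring positions in the sequence remain neighbours in the spectrogram.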

Step 3-2: a student model BC-ResNet is constructed and trained on the time-frequency feature data set obtained in step 2, and cross-model knowledge distillation is performed with the teacher model during BC-ResNet training to obtain the final optimal model.

The student model BC-ResNet designs two residual network modules, namely a Transition Block and a Normal Block, the module structures of which are shown in figure 6 respectively, and the specific operations of which are shown in formula (5) and formula (6).

$$y = f_2(x) + f_{BC}\big(f_1(\mathrm{avgpool}(f_2(x)))\big) \tag{5}$$

$$y = x + f_2(x) + f_{BC}\big(f_1(\mathrm{avgpool}(f_2(x)))\big) \tag{6}$$

Here x is the input of the module and y is the output of the module. $f_2$ is a two-dimensional convolution operation consisting of two-dimensional convolution layers, a normalization layer and an activation function. As shown in Fig. 6, avgpool is an average pooling operation applied column by column to the module input after the two-dimensional convolution, reducing the size of the convolved feature in the frequency dimension to 1, i.e. downsampling. $f_1$ is a two-dimensional convolution operation applied to the downsampled feature. $f_{BC}$ is the broadcast residual operation, i.e. the convolved downsampled feature is added row by row to the two-dimensional feature. The convolution layer parameters are set so that the Transition Block downsamples its input while the Normal Block keeps the size of its input unchanged.
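Formulas (5) and (6) can be traced on a single-channel feature map with the sketch below. The feature is assumed to be stored as F rows (frequency) by T columns (time), and `f2` and `f1` are passed in as plain functions; the scaling functions used in the toy usage are stand-ins for the real convolutions and are not part of the source.

```python
def avgpool_freq(feature):
    """Average each column over frequency: F x T feature -> length-T vector."""
    n_freq = len(feature)
    return [sum(row[t] for row in feature) / n_freq
            for t in range(len(feature[0]))]


def broadcast(temporal, n_freq):
    """f_BC: repeat the length-T vector on every one of the n_freq rows."""
    return [list(temporal) for _ in range(n_freq)]


def add(a, b):
    """Element-wise sum of two feature maps of equal size."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def transition_block(x, f2, f1):
    """Formula (5): y = f2(x) + f_BC(f1(avgpool(f2(x))))."""
    h = f2(x)
    return add(h, broadcast(f1(avgpool_freq(h)), len(h)))


def normal_block(x, f2, f1):
    """Formula (6): formula (5) plus the identity shortcut x."""
    return add(x, transition_block(x, f2, f1))


# Toy usage with stand-in operations (hypothetical, size-preserving).
def f2(feature):
    return [[2.0 * v for v in row] for row in feature]


def f1(vector):
    return [0.5 * v for v in vector]


x = [[1.0, 2.0], [3.0, 4.0]]           # F = 2 frequency rows, T = 2 time columns
y_transition = transition_block(x, f2, f1)
y_normal = normal_block(x, f2, f1)
```

The identity shortcut in formula (6) requires the output of `f2` to have the same size as x, which is why it appears only in the Normal Block; a real Transition Block changes the feature size, so its stand-in here is size-preserving only for brevity.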

The BC-ResNet consists of a downsampling layer, a main body and an output layer. The downsampling layer is a two-dimensional convolution layer. The main body comprises four parts: the first two parts each consist of a Transition Block and a Normal Block, the third part consists of a Transition Block and five Normal Blocks, and the fourth part consists of a Transition Block and three Normal Blocks. The output layer consists of two-dimensional convolution layers and an average pooling layer.

Specifically, in the embodiment, the structure of the constructed BC-ResNet is shown in Table 2, where F and T denote the frequency dimension size and the time dimension size of the input features.

Table 2. BC-ResNet structure

Further, the specific implementation manner of using the teacher model to perform cross-model knowledge distillation in the BC-ResNet training in the step 3-2 is as follows: when using cross-model knowledge distillation training, the training loss is calculated as shown in equation (7):

$$\mathrm{Loss}_{Total} = \lambda\,\mathrm{Loss}_d\big(\psi(Z_s/T),\ \psi(Z_t/T)\big) + (1-\lambda)\,\mathrm{Loss}_g\big(\psi(Z_s),\ y\big) \tag{7}$$

where λ is an adjustable balance coefficient; $\mathrm{Loss}_d$ and $\mathrm{Loss}_g$ are the loss functions used to compute the student model's loss against the soft target and the real target respectively; $Z_s$ and $Z_t$ are the logits output by the student model and the teacher model respectively; ψ is an activation function; T is the temperature coefficient; and y is the real label annotated in the data set, i.e. the real target. $\psi(Z_t/T)$ is the teacher model's logits divided by the temperature coefficient and passed through the activation function, i.e. the soft target; $\psi(Z_s/T)$ is the student model's logits divided by the temperature coefficient and passed through the activation function. The cross-entropy loss function is selected as the loss function, and the optimizer uses the Adam algorithm. A schematic diagram of cross-model knowledge distillation is shown in Fig. 7.

Further, the soft target is obtained as follows: the Softmax function is selected as the activation function, so that $\psi(Z_t/T) = \{p_1^T, p_2^T, \ldots, p_M^T\}$, where M is the total number of classes. With the temperature coefficient equal to T, the soft target of the i-th class is calculated by formula (8), where $v_i$ is the teacher model's logits output for class i and $v_k$ is the teacher model's logits output for class k.

$$p_i^T = \frac{\exp(v_i/T)}{\sum_{k=1}^{M}\exp(v_k/T)} \tag{8}$$
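The training loss of formula (7) with Softmax soft targets can be sketched as follows. The logits, the balance coefficient 0.7 and the temperature 4.0 in the toy usage are hypothetical values chosen for illustration; the source does not state the values used. In training, this quantity would be minimized with respect to the student parameters only, with the teacher held fixed.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by the temperature coefficient."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(pred, target):
    """-sum_i target_i * log(pred_i); target may be soft or one-hot."""
    return -sum(t * math.log(p) for p, t in zip(pred, target) if t > 0)


def distillation_loss(z_s, z_t, label, lam, temperature):
    """Formula (7): lam * Loss_d(soft) + (1 - lam) * Loss_g(real)."""
    soft_target = softmax(z_t, temperature)                 # psi(Z_t / T)
    loss_d = cross_entropy(softmax(z_s, temperature), soft_target)
    one_hot = [1.0 if i == label else 0.0 for i in range(len(z_s))]
    loss_g = cross_entropy(softmax(z_s), one_hot)           # psi(Z_s) against y
    return lam * loss_d + (1 - lam) * loss_g


# Toy usage with M = 4 classes (hypothetical logits, lam and temperature).
student_logits = [2.0, 1.0, 0.1, 0.0]
teacher_logits = [3.0, 0.5, 0.2, 0.1]
loss = distillation_loss(student_logits, teacher_logits, label=0,
                         lam=0.7, temperature=4.0)
```

A temperature above 1 flattens the teacher distribution, so the soft target carries more information about how the teacher ranks the non-target classes.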

Specifically, in the embodiment, the recognition results obtained with and without cross-model knowledge distillation training were compared with those of a conventional convolutional network; the comparison is shown in Table 3.

Table 3. Recognition results

Compared with the prior art, the method provided by the embodiment of the invention has a simple feature extraction procedure, is suitable for different complex scenes, and uses a student model with better recognition performance; compared with a student model 1D-CNN using the traditional training mode, the accuracy on complex types is further improved; compared with a method using a Transformer model with a large number of parameters, the recognition speed is higher and better meets actual engineering requirements.

The purpose, technical proposal and advantages of the invention can be more clearly understood by the description of the drawings shown in the embodiments of the invention. It should be noted that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All equivalent substitutions, modifications and the like within the spirit and principles of the method provided by the present invention should be included in the scope of the present invention.

Claims

1. A distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation is characterized by comprising the following steps:

2. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation as claimed in claim 1, wherein the step 1) is as follows:

a phase-sensitive optical time domain reflectometer based on coherent detection is used as the distributed optical fiber acquisition system to acquire distributed optical fiber sensing signals in a perimeter security scene; the sampling rate of the acquisition system is sr, so that within n seconds the system acquires, at any spatial point, an original signal waveform consisting of n × sr points; original signal waveforms generated when different types of events occur at a given spatial point are collected and cut into sample fragments of fixed duration to construct sample data sets of the different types of events; the sample fragments in the sample data set are inspected, and fragments in which the event response waveform accounts for less than 50% are removed to obtain the typical event signal data set.

3. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation according to claim 1, wherein the different types of events comprise natural environment, climbing the protective net, striking the protective net, and strong wind.

4. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation as claimed in claim 1, wherein the step 2) is as follows:

5. The method for identifying distributed optical fiber perimeter security intrusion signals based on cross-model knowledge distillation according to claim 4, wherein the specific implementation manner of obtaining a mel spectrogram from the obtained frequency spectrum through a set of mel filters is as follows:

the spectrum after short-time Fourier transform is $X_t(k)$, where t is the index in the time dimension, l is the window overlap length of the short-time Fourier transform, and k = 1, 2, …, N is the index in the frequency dimension; the Mel spectrogram $Y_t(m)$ is then calculated by formula (1), where $H_m(k)$ is the frequency response of the Mel filter, m is the index on the Mel scale, and N is the number of Mel filters.

6. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation as claimed in claim 1, wherein the specific implementation manner of constructing the HTS-AT model in the step 3) is as follows:

7. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation as claimed in claim 1, wherein the specific implementation manner of constructing the student model BC-ResNet is as follows:

two residual network modules, namely a Transition Block and a Normal Block, are designed as shown in a formula (5) and a formula (6) respectively:

$$y = f_2(x) + f_{BC}\big(f_1(\mathrm{avgpool}(f_2(x)))\big) \tag{5}$$

$$y = x + f_2(x) + f_{BC}\big(f_1(\mathrm{avgpool}(f_2(x)))\big) \tag{6}$$

where x is the input of the module and y is the output of the module; $f_2$ is a two-dimensional convolution operation consisting of two-dimensional convolution layers, a normalization layer and an activation function; avgpool is an average pooling operation applied column by column to the module input after the two-dimensional convolution, reducing the size of the convolved feature in the frequency dimension to 1, i.e. downsampling; $f_1$ is a two-dimensional convolution operation applied to the downsampled feature, and $f_{BC}$ is the broadcast residual operation; the convolution layer parameters are set so that the Transition Block downsamples its input while the Normal Block keeps the size of its input unchanged;

the BC-ResNet consists of a downsampling layer, a main body structure and an output layer; the downsampling layer is a two-dimensional convolution layer, the main structure comprises four parts, wherein the first two parts are composed of a Transition Block and a Normal Block, the third part is composed of the Transition Block and five Normal blocks, and the fourth part is composed of the Transition Block and three Normal blocks; the output layer consists of two-dimensional convolution layers and an average pooling layer.

8. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation as claimed in claim 1, wherein the specific implementation mode of using a teacher model for cross-model knowledge distillation during BC-ResNet training is as follows:

when using cross-model knowledge distillation training, the training loss is calculated as shown in formula (7):

$$\mathrm{Loss}_{Total} = \lambda\,\mathrm{Loss}_d\big(\psi(Z_s/T),\ \psi(Z_t/T)\big) + (1-\lambda)\,\mathrm{Loss}_g\big(\psi(Z_s),\ y\big) \tag{7}$$

where λ is an adjustable balance coefficient; $\mathrm{Loss}_d$ and $\mathrm{Loss}_g$ are the loss functions used to compute the student model's loss against the soft target and the real target respectively; $Z_s$ and $Z_t$ are the logits output by the student model and the teacher model respectively; ψ is an activation function; T is the temperature coefficient; y is the real label annotated in the data set, i.e. the real target; $\psi(Z_t/T)$ is the teacher model's logits divided by the temperature coefficient and passed through the activation function, i.e. the soft target; $\psi(Z_s/T)$ is the student model's logits divided by the temperature coefficient and passed through the activation function; the cross-entropy loss function is selected as the loss function, and the Adam algorithm is selected as the optimizer.

9. The distributed optical fiber perimeter security intrusion signal identification method based on cross-model knowledge distillation as claimed in claim 8, wherein the soft target is calculated by the following steps:

the Softmax function is selected as the activation function ψ, so that $\psi(Z_t/T) = \{p_1^T, p_2^T, \ldots, p_M^T\}$, where M is the total number of classes; with the temperature coefficient equal to T, the soft target of the i-th class is calculated as shown in formula (8):

$$p_i^T = \frac{\exp(v_i/T)}{\sum_{k=1}^{M}\exp(v_k/T)} \tag{8}$$

where $v_i$ is the teacher model's logits output for class i and $v_k$ is the teacher model's logits output for class k.