Disclosure of Invention
The invention provides a Vision Transformer-based distributed optical fiber sensor pattern recognition method that remedies the defects of the prior art, and introduces an attention-mechanism algorithm into the field of distributed optical fiber sensing.
A method for pattern recognition of a distributed optical fiber vibration sensor is characterized by comprising the following steps:
Step 1: prepare the distributed optical fiber sensor system.
Step 2: acquire data signals and construct data sets of different events.
Step 3: perform noise reduction processing on the signal data.
Step 4: convert the signal data into a time domain graph and a time-frequency domain graph of the corresponding event.
Step 5: construct a deep learning network based on the Vision Transformer.
Step 6: perform identification and classification.
Further, the step 1 specifically includes the following steps:
Step 1.1: select a scheme based on the optical time domain reflectometer as the technical scheme of the distributed optical fiber sensor.
Step 1.2: prepare a narrow-linewidth laser, a coupler, an acousto-optic modulator, a first erbium-doped amplifier, a band-pass filter, a circulator, a second erbium-doped amplifier, a tunable optical attenuator, a photoelectric detector, a data acquisition card, a personal computer, and a single-mode optical fiber.
Step 1.3: assemble the distributed optical fiber sensor system in preparation for event data acquisition in various application scenarios.
Further, the step 2 specifically includes the following steps:
Step 2.1: deploy the distributed optical fiber sensing system in the scene to be detected, and acquire the event data corresponding to each scene. When collecting data, set the relevant parameters of the distributed optical fiber sensing system, such as the sampling frequency and the pulse width.
Step 2.2: while collecting the data set, record how the data of the distributed optical fiber sensing channels change under the action of an event.
Step 2.3: store and back up the channel data whose signal intensity changed under the action of the events in the previous step.
Further, the step 3 specifically includes the following steps:
Step 3.1: extract the event data acquired in step 2 from the corresponding channels.
Step 3.2: filter the collected data of the different events through a filter.
Step 3.3: perform noise reduction on the data of step 3.2 through wavelet denoising. A specific wavelet basis function is set and the signal is decomposed into a plurality of scales by wavelet transform; according to the difference between the noise and signal values on these scales, some scale components are removed or corrected and the signal is reconstructed. Noise is judged from the wavelet coefficients obtained after decomposition: the wavelet coefficients of noise are usually small, so the noise can be removed by setting a threshold. A coefficient smaller than the threshold is judged to be noise; otherwise it is judged to be valid signal.
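As an illustration of step 3.3, the following minimal sketch performs wavelet denoising with the PyWavelets library; the wavelet basis ('db4'), the decomposition level, and the universal-threshold rule are illustrative assumptions rather than parameters fixed by the invention.

```python
# Minimal wavelet-denoising sketch, assuming a 1-D NumPy signal as input.
import numpy as np
import pywt

def wavelet_denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    # Decompose the signal into approximation and detail coefficients (multiple scales).
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Estimate the noise sigma from the finest detail scale (median absolute deviation),
    # then form the universal threshold.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(len(signal)))
    # Coefficients below the threshold are treated as noise and shrunk to zero;
    # larger coefficients are kept as valid signal (soft thresholding).
    denoised = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    # Reconstruct the signal from the corrected coefficients.
    return pywt.waverec(denoised, wavelet)
```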
Further, the step 4 specifically includes the following steps:
Step 4.1: convert the data denoised in step 3 into time domain graphs of the various events in batches, according to the sampling frequency and duration set during event data acquisition. The time domain graph mainly shows how the signal intensity changes over time; the features of the time domain signal are intuitive and obvious, and intrusion events can be distinguished by the regular changes of the signal in the time domain within a certain period.
Step 4.2: convert the data denoised in step 3 into time-frequency domain graphs of the various events through the short-time Fourier transform, according to the sampling frequency and duration set during event data acquisition. The time-frequency domain graph contains not only the spectral features within a certain time but also the variation of each frequency band over time.
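The conversions of steps 4.1 and 4.2 can be sketched as follows; the 6 kHz sampling frequency matches the embodiment below, while the file names, the STFT window length, and the plotting details are illustrative assumptions.

```python
# Sketch: convert one denoised event segment into a time-domain image
# and an STFT time-frequency image.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import stft

fs = 6000                        # sampling frequency in Hz (per the embodiment)
signal = np.load("event.npy")    # hypothetical denoised event segment

# Time domain graph: signal intensity versus time.
t = np.arange(len(signal)) / fs
plt.plot(t, signal)
plt.xlabel("Time (s)"); plt.ylabel("Amplitude")
plt.savefig("time_domain.png"); plt.clf()

# Time-frequency domain graph via short-time Fourier transform.
f, tt, Zxx = stft(signal, fs=fs, nperseg=256)
plt.pcolormesh(tt, f, np.abs(Zxx))
plt.xlabel("Time (s)"); plt.ylabel("Frequency (Hz)")
plt.savefig("time_frequency.png")
```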
Further, the step 5 specifically includes the following steps:
Step 5.1: label the event time domain graphs processed in step 4, and divide them into a training set, a validation set, and a test set in the ratio 8:1:1.
Step 5.2: label the event time-frequency domain graphs processed in step 4, and divide them into a training set, a validation set, and a test set in the ratio 8:1:1.
Step 5.3: construct a Vision Transformer deep learning image classification model based on the time domain graph and time-frequency domain graph data sets of the event data, and set the Vision Transformer network model parameters.
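A minimal sketch of the labeling and 8:1:1 split of steps 5.1 and 5.2 in PyTorch is given below; the directory layout (one subfolder per event class, as ImageFolder expects) and the image size are assumptions.

```python
# Sketch: build a labeled dataset from image folders and split it 8:1:1.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor()])
# Each subfolder of this hypothetical directory holds one event class.
dataset = datasets.ImageFolder("time_domain_images/", transform=transform)

n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))  # reproducible split
```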
Further, the network model in step 5.3 specifically includes: an Embedding layer, a Transformer Encoder layer, and an MLP Head layer.
The Embedding layer includes: a convolutional layer, a linear mapping layer, a Class token layer, a Position Embedding layer, and a Dropout layer.
The Transformer Encoder layer includes: a Layer Norm layer, a multi-head attention layer, a DropPath layer, and an MLP Block layer, wherein the MLP Block layer comprises a fully connected layer, a GELU activation function layer, and a Dropout layer.
The MLP Head layer includes: a fully connected layer and a tanh activation function layer.
Further, the step 5.3 specifically comprises the following steps:
Step 5.3.1: initialize the Vision Transformer network model parameters.
Step 5.3.2: use a convolutional layer to flatten the three-dimensional RGB image into a two-dimensional matrix, i.e., a sequence of vectors.
Step 5.3.3: flatten the two-dimensional matrix of step 5.3.2; this corresponds to a Flatten layer. The flattening process does not affect the batch size; its purpose is to turn the high-dimensional array into a one-dimensional vector sequence.
Step 5.3.4: so that the input picture of the neural network can be labeled for supervised learning, a Class token layer is introduced. The Class token is a trainable parameter with the same vector format as the sequence of step 5.3.3, and it is concatenated with the vector sequence of step 5.3.3.
Step 5.3.5: to encode the positional relationships within the vector sequence, a Position Embedding layer is used; it is a trainable parameter and is superimposed on the vector sequence of step 5.3.4.
Step 5.3.6: pass the final vector sequence of step 5.3.5 through a Dropout layer. The Dropout layer reduces the coupling between neurons so that each neuron extracts useful features by itself; it also acts as an ensemble of networks, since the effective network differs in each training pass, which prevents overfitting.
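Steps 5.3.1 through 5.3.6 can be summarized in the following PyTorch sketch of the Embedding layer; the 224x224 input size, patch size 16, and embedding dimension 768 are common ViT-Base defaults assumed here, not values prescribed by the invention.

```python
# Sketch of the Embedding stage: convolutional patch projection, flattening,
# class token, trainable position embedding, and Dropout.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768, drop=0.1):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Convolution with stride = kernel = patch size: one token per image patch.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                # trainable class token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))  # trainable positions
        self.dropout = nn.Dropout(drop)

    def forward(self, x):                        # x: (B, 3, H, W)
        x = self.proj(x)                         # (B, dim, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)         # Flatten: (B, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)           # prepend the class token
        return self.dropout(x + self.pos_embed)  # add positions, apply Dropout
```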
Step 5.3.7: pass through the Transformer Encoder layer. This network layer is the core backbone of the algorithm.
Further, step 5.3.7 specifically comprises the following steps:
Step 5.3.7.1: first pass through a Layer Normalization layer. Batch Normalization performs normalization on each channel of a batch of data, whereas Layer Normalization normalizes a specified dimension of a single sample, independent of the batch. The Layer Normalization layer normalizes the data at this level to zero mean and unit variance. Layer Normalization stabilizes the gradients and avoids the vanishing-gradient problem; it shortens the training time of the neural network, tolerates a higher learning rate and a wider range of initial weights, and supports more loss functions.
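The distinction drawn above can be checked with a short snippet: LayerNorm normalizes each token over its embedding dimension, independent of the batch (the shapes are illustrative).

```python
# LayerNorm acts per token over the last dimension, not across the batch.
import torch
import torch.nn as nn

x = torch.randn(8, 197, 768)   # (batch, tokens, embedding dim)
ln = nn.LayerNorm(768)         # normalizes over the last dim of each token
y = ln(x)
print(y.mean(-1).abs().max(), y.std(-1).mean())  # per-token mean ~0, std ~1
```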
Step 5.3.7.2: pass through the Multi-Head Attention layer. The multi-head attention layer is developed from the self-attention layer; the multi-head attention mechanism can combine the information learned by the different heads. The input vector ai is passed through Wq, Wk, and Wv to obtain the corresponding qi, ki, and vi, which are then each divided into h parts according to the number of heads h, giving the parameters Qi, Ki, and Vi of each head.
Each head obtains its result by applying the self-attention mechanism to its Qi, Ki, and Vi, where the corresponding formula of the self-attention mechanism is:
Attention(Q, K, V) = softmax(Q K^T / sqrt(dk)) V
The results obtained by the heads are spliced together, and the spliced result is fused through Wo (a learnable parameter) to obtain the final result. The corresponding formula is:
MultiHead(Q, K, V) = Concat(head1, ..., headh) Wo, where headi = Attention(Qi, Ki, Vi)
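A sketch of this multi-head attention computation in PyTorch, following the Wq/Wk/Wv/Wo description above, is given below; the embedding dimension and head count are assumptions.

```python
# Multi-head self-attention sketch: project to q/k/v, split into heads,
# apply scaled dot-product attention per head, concatenate, fuse with Wo.
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.wq, self.wk, self.wv = (nn.Linear(dim, dim) for _ in range(3))
        self.wo = nn.Linear(dim, dim)   # learnable fusion matrix Wo

    def forward(self, x):               # x: (B, N, dim)
        B, N, _ = x.shape
        # Project the input and split each of q, k, v into h heads.
        q, k, v = (w(x).view(B, N, self.heads, self.dk).transpose(1, 2)
                   for w in (self.wq, self.wk, self.wv))
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(dk)) V, per head.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)  # concatenate heads
        return self.wo(out)             # fuse the spliced result with Wo
```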
step 5.3.7.3: the same Dropout layer is passed as in step 5.36.
Step 5.3.7.4: connect the data from before step 5.3.7.1 and after step 5.3.7.3 through a residual shortcut. Simply increasing the depth of the network does not by itself improve its performance and may even harm the model through gradient divergence; introducing the shortcut solves this degradation problem in deep models.
Step 5.3.7.5: pass through a Layer Normalization layer, the same as in step 5.3.7.1, to stabilize the gradients, avoid the vanishing-gradient problem, and shorten the training time of the neural network.
Step 5.3.7.6: pass through the MLP Block layer. The MLP Block layer mainly comprises a fully connected layer, a GELU activation function layer, and a Dropout layer.
Further, step 5.3.7.6 includes the following steps:
Step 5.3.7.6.1: integrate the features extracted in the previous steps through a fully connected layer, mapping the feature representation learned by the multi-head attention mechanism into the sample label space.
Step 5.3.7.6.2: pass through the GELU activation function layer. The role of the activation function is to add a nonlinear factor where the expressive power of a linear model is insufficient. The formula of the GELU activation function is:
GELU(x) = x × P(X ≤ x) = x × Φ(x), X ~ N(0, 1)
where x is the input value and X is a Gaussian random variable with zero mean and unit variance; P(X ≤ x), the probability that X is less than or equal to the given value x, is the cumulative distribution function Φ of the standard normal distribution evaluated at x.
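The formula can be verified numerically: x × Φ(x), with Φ computed from the error function, matches the exact GELU provided by PyTorch.

```python
# Numerical check of GELU(x) = x * Phi(x), Phi being the standard normal CDF.
import math
import torch

x = torch.linspace(-3, 3, 7)
phi = 0.5 * (1 + torch.erf(x / math.sqrt(2)))   # standard normal CDF
print(torch.allclose(x * phi, torch.nn.functional.gelu(x), atol=1e-6))  # True
```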
Step 5.3.7.6.3: pass through the Dropout layer, which reduces the coupling between neurons and prevents overfitting.
Step 5.3.7.6.4: pass through a fully connected layer, as in step 5.3.7.6.1.
Step 5.3.7.6.5: pass through a Dropout layer, as in step 5.3.7.6.3, to prevent overfitting.
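Steps 5.3.7.1 through 5.3.7.6 assemble into the following Transformer Encoder block sketch, reusing the MultiHeadAttention module sketched after step 5.3.7.2; since DropPath (stochastic depth) is not part of the PyTorch core, a minimal version is written out by hand, and the MLP expansion ratio of 4 is an assumption.

```python
# Encoder block sketch: LayerNorm -> multi-head attention -> DropPath -> residual
# shortcut, then LayerNorm -> MLP Block (Linear -> GELU -> Dropout -> Linear ->
# Dropout) -> DropPath -> second residual shortcut.
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly zero the whole residual branch per sample."""
    def __init__(self, p=0.1):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        keep = 1 - self.p
        mask = (torch.rand(x.shape[0], 1, 1, device=x.device) < keep).float()
        return x * mask / keep

class EncoderBlock(nn.Module):
    def __init__(self, dim=768, heads=12, mlp_ratio=4, drop=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = MultiHeadAttention(dim, heads)  # from the sketch above
        self.drop_path = DropPath(drop)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Dropout(drop),
            nn.Linear(dim * mlp_ratio, dim), nn.Dropout(drop))

    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))  # residual shortcut 1
        x = x + self.drop_path(self.mlp(self.norm2(x)))   # residual shortcut 2
        return x
```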
Step 5.3.8: pass through a Layer Normalization layer, the same as in step 5.3.7.1, to avoid vanishing gradients in the features extracted in the previous steps and shorten the training time of the neural network.
Step 5.3.9: extract the Class token introduced in step 5.3.4 to obtain the labeled sample information.
Step 5.3.10: integrate the features extracted in the previous steps through a fully connected layer, mapping the learned feature representation into the sample label space.
Step 5.3.11: pass through the tanh activation function layer. The tanh activation function is centered at the origin and converges faster; the corresponding formula is:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Step 5.3.12: integrate the features of the preceding steps through a fully connected layer in preparation for the output of the next step.
Step 5.3.13: output the result.
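Steps 5.3.8 through 5.3.13 correspond to the following output-stage sketch; the six-class output matches the embodiment below, while the hidden dimension of the MLP Head is an assumption.

```python
# Output stage sketch: final LayerNorm, extraction of the class token,
# then the MLP Head (Linear -> Tanh -> Linear) producing per-class scores.
import torch.nn as nn

class MLPHead(nn.Module):
    def __init__(self, dim=768, num_classes=6):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                  nn.Linear(dim, num_classes))

    def forward(self, x):           # x: (B, N+1, dim) from the encoder stack
        cls = self.norm(x)[:, 0]    # take the class token
        return self.head(cls)       # classification result
```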
Step 5.4: train the Vision Transformer network model obtained in step 5.3.
Step 5.5: perform network tuning on the trained Vision Transformer network model. If the optimal parameters are found, save the model with the best result as the model for final event recognition; otherwise, return to step 5.4 and continue training the network model.
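Steps 5.4 and 5.5 can be sketched as the following training loop, assembling the modules sketched above into a 12-block model (the ViT-Base depth, an assumption); the optimizer, learning rate, batch size, epoch count, and checkpoint file name are likewise illustrative.

```python
# Training/validation sketch: train, evaluate on the validation set each epoch,
# and keep the best-performing weights as the final recognition model.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

model = nn.Sequential(PatchEmbedding(),
                      *[EncoderBlock() for _ in range(12)],
                      MLPHead())
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
best_acc = 0.0

for epoch in range(100):
    model.train()
    for images, labels in DataLoader(train_set, batch_size=32, shuffle=True):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    # Validation pass: keep the model with the best result.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in DataLoader(val_set, batch_size=32):
            correct += (model(images).argmax(1) == labels).sum().item()
            total += labels.numel()
    if correct / total > best_acc:
        best_acc = correct / total
        torch.save(model.state_dict(), "best_vit.pth")  # model for final recognition
```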
The invention has the beneficial effects that:
the invention has simple preprocessing steps and does not carry out a large amount of data extraction operation on the data. The originality of the data is guaranteed and the loss of the data is prevented. The invention combines a Vision Transformer algorithm and a distributed optical fiber sensor event image for the first time and is used for event classification under scenes. The Vision Transformer algorithm is an attention-based algorithm, and has the advantages that the Vision Transformer algorithm is different from the conventional convolutional neural network, the identification effect on a large-scale data set is higher, the global features can be extracted at one time, and the like. In the invention, the time domain graph and the time domain graph in the distributed optical fiber sensing are respectively used as data sets of a Vision Transformer algorithm, so that the effect of comparison can be achieved.
Detailed description of the invention
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "middle", "upper", "lower", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on orientations or positional relationships shown in the drawings or orientations or positional relationships conventionally laid out when products of the present invention are used, and are only used for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Furthermore, the terms "horizontal", "vertical" and the like do not imply that the components are required to be absolutely horizontal or vertical; they may be slightly inclined. For example, "horizontal" merely means that the direction is more nearly horizontal than "vertical"; the structure need not be perfectly horizontal and may be slightly inclined.
In the description of the present invention, it should be noted that unless otherwise explicitly stated or limited, the terms "disposed", "connected", and "coupled" are to be construed broadly: the connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or internal between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Figs. 1-5 show an embodiment of the present invention, which takes intrusion detection on a highway bridge section as an example of the Vision Transformer-based distributed optical fiber sensor pattern recognition method; the overall flow is shown in Fig. 1, and the specific signal processing and deep learning network structure is shown in Fig. 2.
Step 1: prepare the distributed optical fiber sensing system. In this example, a distributed optical fiber sensing system based on the phase-sensitive optical time domain reflectometry (Φ-OTDR) technique is selected. The main devices used in the system are: a narrow-linewidth laser, a coupler, an acousto-optic modulator, a first erbium-doped amplifier, a band-pass filter, a circulator, a second erbium-doped amplifier, a tunable optical attenuator, a balanced detector, a data acquisition card, a personal computer, and a single-mode optical fiber. The mechanism and working principle of the system are shown in Fig. 3. A narrow-linewidth laser with a linewidth of 5 kHz is used as the light source; its output is split by a coupler into two branches at a ratio of 95:5. On the upper branch, the continuous wave is modulated by the acousto-optic modulator to generate an optical pulse train with a frequency shift of 60 MHz. The pulses are then amplified by the first erbium-doped amplifier to compensate for loss during transmission, denoised by a band-pass filter with a 0.8 nm passband, and launched into the single-mode fiber through the circulator. When events act on the single-mode fiber, the Rayleigh backscattering trace is further amplified by the second erbium-doped amplifier and denoised by another 0.8 nm band-pass filter; the upper and lower branch light paths are combined at the second coupler, the optical signal is converted into an electrical signal by the photoelectric detector, and the event data is then acquired by the data acquisition card. Finally, the event signals are stored on a personal computer or a storage device.
Step 2: collect field data on the highway bridge section. The distributed optical fiber sensing system is deployed on the highway bridge section, with a 2 km single-mode optical fiber laid around the bridge, and six types of events are collected on site, including automobile horn, automobile impact (simulated), spade excavation, pedestrian walking, and rain. The sampling frequency of the distributed optical fiber sensing system is 6 kHz, and the acquisition time for each event is 5-8 min.
Step 3: according to the signal parameters acquired on the highway bridge section and the channels of the single-mode optical fiber in which the events of step 2 occurred, classify the collected signals by event type, extract the time-series data of the corresponding channels of the single-mode optical fiber, and denoise them with the wavelet denoising method.
Step 4: according to the parameters set during data collection, convert the data processed in step 3 in batches into 3-second time domain graphs of the events collected on site; the time domain graphs of the six events are shown in Fig. 4. In the same way, convert the data processed in step 3 in batches into the corresponding 3-second time-frequency domain graphs; the time-frequency domain graphs of the six events are shown in Fig. 5.
Step 5: construct the network according to the Vision Transformer algorithm structure of Fig. 2, and train and iterate it on the collected time domain graph and time-frequency domain graph data sets of the highway bridge section, divided into a training set, a validation set, and a test set in the ratio 8:1:1. The parameters of the best of the multiple iterations are saved as the pre-trained weights for step 6.
Step 6: event identification and classification. After the pre-trained network weights of step 5 are saved, a new batch of data sets is input into the network algorithm to obtain the final event recognition result.
Finally, the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them; any other modifications or equivalent substitutions made to the technical solutions of the present invention by those of ordinary skill in the art, provided that they do not depart from the spirit and scope of the technical solutions of the present invention, shall be covered by the scope of the claims of the present invention.