CN111572809A - Remote helicopter rotor sound detection method based on time-frequency analysis and deep learning - Google Patents

Remote helicopter rotor sound detection method based on time-frequency analysis and deep learning

Info

Publication number
CN111572809A
CN111572809A (application CN202010240367.6A)
Authority
CN
China
Prior art keywords
time
sound
detection
neural network
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010240367.6A
Other languages
Chinese (zh)
Inventor
王秋然
郭磊
林啸宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202010240367.6A priority Critical patent/CN111572809A/en
Publication of CN111572809A publication Critical patent/CN111572809A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64FGROUND OR AIRCRAFT-CARRIER-DECK INSTALLATIONS SPECIALLY ADAPTED FOR USE IN CONNECTION WITH AIRCRAFT; DESIGNING, MANUFACTURING, ASSEMBLING, CLEANING, MAINTAINING OR REPAIRING AIRCRAFT, NOT OTHERWISE PROVIDED FOR; HANDLING, TRANSPORTING, TESTING OR INSPECTING AIRCRAFT COMPONENTS, NOT OTHERWISE PROVIDED FOR
    • B64F5/00Designing, manufacturing, assembling, cleaning, maintaining or repairing aircraft, not otherwise provided for; Handling, transporting, testing or inspecting aircraft components, not otherwise provided for
    • B64F5/60Testing or inspecting aircraft components or systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a helicopter rotor sound detection method based on time-frequency analysis and deep learning, aimed at the blind zones that radar detection leaves when monitoring helicopters. The method comprises four steps: first, the sound signal is preprocessed; second, time-frequency analysis is applied to the preprocessed signal to obtain a time-frequency image; third, a deep-learning detection network determines whether the target sound is present; finally, the result is stored in a database. The detection network first extracts features with a convolutional neural network and then feeds the features to a recurrent neural network for detection. After each detection, the algorithm checks whether it should terminate; if not, it continues running and keeps detecting external sound. The invention helps supervision departments automatically detect and manage remote helicopters flying illegally, and maintain order and safety.

Description

Remote helicopter rotor sound detection method based on time-frequency analysis and deep learning
Technical Field
The invention relates to a method for detecting the rotor sound of a helicopter in long-distance, stable flight.
Background
Helicopters are now widely used in many fields, but supervising them has become a problem. Especially in field environments, a helicopter flying without approval may threaten airspace flight safety and even homeland security. Radar detection fails because such helicopters fly at low altitude. Helicopter rotor sound, by contrast, is loud, propagates over a long range, and is not blocked by obstacles, so it can be used to monitor and detect helicopters.
The aerodynamic sound of a helicopter originates mainly from the rotor system. Generally, in smooth flight, the rotational sound dominates the low-frequency part of the spectrum. This rotational sound is periodic harmonic noise whose spectrum consists of harmonics of the rotor passing frequency.
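As an illustration of this harmonic structure, the blade-passing (rotor passing) frequency and its harmonics can be computed directly; the blade count and rotation speed below are hypothetical values, not taken from the patent:

```python
# Hypothetical example: blade-passing frequency (BPF) of a rotor.
# BPF = number of blades * rotations per second; the rotating-sound
# spectrum consists of harmonics k * BPF (k = 1, 2, 3, ...).
def blade_passing_frequency(n_blades, rpm):
    return n_blades * rpm / 60.0

# e.g. an assumed 4-blade rotor at 300 RPM -> 20 Hz fundamental
harmonics = [k * blade_passing_frequency(4, 300) for k in (1, 2, 3)]
```

The low fundamental (tens of Hz for typical main rotors) is why the rotational sound sits in the low-frequency part of the spectrum.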
At present, sound detection algorithms are mainly applied to speech detection and to sound detection in living environments, both characterized by high signal-to-noise ratio and rich frequency-domain components. The rotor sound of a helicopter flying smoothly in the field, however, has a low signal-to-noise ratio and relatively few frequency-domain components, and existing methods detect it poorly.
Therefore, researching a detection algorithm for the rotor sound of helicopters flying smoothly in the field is an urgent problem to be solved in order to monitor and detect illegally flying helicopters.
Disclosure of Invention
In order to solve the above technical problems in the prior art, an embodiment of the invention provides a rotor sound detection algorithm based on time-frequency analysis and deep learning, which can detect the rotor sound of a helicopter flying smoothly in the field so that relevant supervision departments can monitor it.
The invention provides a sound detection algorithm for the rotor sound of a helicopter flying smoothly in the field, which comprises three parts: preprocessing, time-frequency analysis and a detection network;
the preprocessing comprises, in sequence, sampling, filtering and slicing the environmental sound;
the environmental sound is sound in a field environment; it always contains environmental noise and may contain the sound of a helicopter rotor in smooth flight;
generally, when the environmental sound contains helicopter rotor sound, the rotor sound energy is lower than the environmental noise energy;
the sampling digitally samples the signal collected by a microphone, at a sampling frequency more than 4 times the frequency range to be measured;
the filtering is band-pass filtering, with the minimum rotor speed required for helicopter takeoff as the lower boundary of the filter and the maximum rotor speed in smooth flight as the upper boundary;
the slicing divides the indefinitely long sound signal into mutually overlapping sound signal segments, where the overlap in time is not less than half the length of a single segment.
The time-frequency analysis is to perform time-frequency analysis processing on the sound signal segment to form a time-frequency image;
the time-frequency image is a floating-point matrix after time-frequency transformation or a single-channel image converted after processing.
Optionally, the time-frequency analysis processing includes:
short-time Fourier transform, Wigner-Ville transform and Choi-Williams transform;
but not Mel frequency cepstral transform.
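A short-time Fourier transform producing such a time-frequency image can be sketched as follows; the window length, hop and test tone are illustrative assumptions, not parameters stated in the patent:

```python
import numpy as np

def stft_image(seg, win_len=256, hop=64):
    # Magnitude STFT: rows = frequency bins, columns = time frames.
    win = np.hanning(win_len)
    frames = np.stack([seg[i:i + win_len] * win
                       for i in range(0, len(seg) - win_len + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

seg = np.sin(2 * np.pi * 20 * np.arange(2048) / 2000)  # 20 Hz tone at fs=2000
img = stft_image(seg)   # floating-point matrix: freq bins x time frames
```

The resulting matrix is exactly the "frequency-domain vectors arranged in time order" that the detection network consumes.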
The detection network is a deep neural network that combines a convolutional neural network and a recurrent neural network;
the input of the convolutional neural network is the time-frequency image, and its output is the time-frequency features;
the input of the recurrent neural network is the time-frequency features, and its output is the detection result, presented as one-hot-coded probabilities;
the time-frequency image consists of sound frequency-domain vectors arranged in time order;
the time-frequency features consist of feature vectors arranged in time order, consistent with the time order of the time-frequency image;
consistent time order means the feature vectors keep the same temporal sequence as the corresponding positions in the time-frequency image, although feature vectors may be absent at some time points at which frequency-domain vectors exist in the time-frequency image.
In conclusion, the invention provides a sound detection algorithm based on time-frequency analysis and deep learning, used to detect the rotor sound of helicopters flying illegally in the field. Instead of the Mel-frequency cepstral transform, other common time-frequency transforms may be used. The deep learning part first uses a convolutional neural network to extract a feature vector for each moment and then uses a recurrent neural network for detection.
The technical scheme of the invention can detect the rotor sound of the helicopter flying illegally in the field, improve the working efficiency of relevant supervision and management departments and reduce the supervision cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly introduced below.
FIG. 1 is a flowchart of an embodiment of a rotor sound detection algorithm for field smooth flight according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of step S11 shown in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S13 shown in FIG. 1;
FIG. 4 is a flowchart illustrating an embodiment of step S1301 shown in FIG. 3;
fig. 5 is a flowchart illustrating an embodiment of step S1303 illustrated in fig. 3;
Detailed Description of the Preferred Embodiments
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments of the present invention, fall within the scope of the present invention.
The method is mainly applied to illegal detection of the field helicopter, and is convenient for relevant departments to supervise and manage.
The algorithm flow of rotor sound detection for field flight according to an embodiment of the present invention is described with reference to fig. 1. The algorithm is divided into four parts. Step S11 preprocesses the collected environmental sound; step S12 analyzes the preprocessed data to obtain a time-frequency diagram; step S13 detects the time-frequency diagram with the detection network and stores the detection result in the database 10; step S14 judges whether the algorithm is finished: if so, the algorithm terminates, otherwise it returns to step S11.
Specifically, referring to fig. 2, a detailed description is given of an example flow of step S11. The preprocessing includes three steps, step S1101 is to digitally sample the sound, step S1102 is to band-pass filter the sampled data, and step S1103 is to slice the filtered data.
Specifically, for step S1101, at the time of sampling, the sampling frequency is higher than 4 times the expected target sound frequency.
For step S1102, during filtering, the minimum rotor speed required for takeoff of the helicopter is used as a lower boundary, and the maximum rotor speed during stable flight of the helicopter is used as an upper boundary;
for step S1103, the slicing is to divide the infinitely long sound signal into the sound signal segments overlapping each other, and the length of the overlapping is not less than the length of a single sound signal segment in time
Figure BDA0002432322260000031
Preferably, in slicing, the length of the overlap is the length of a single sound signal segment
Figure BDA0002432322260000032
In step S12, the preprocessed data is subjected to time-frequency analysis to obtain a time-frequency image. The time-frequency analysis method can be selected in various ways, but Mel frequency cepstrum transformation is not selected.
Preferably, the time-frequency analysis method selects short-time Fourier transform;
less preferably, the time-frequency analysis method selects the Choi-Williams transform.
Specifically, the time-frequency image is a floating-point matrix after time-frequency transformation, or a single-channel JPEG image converted after the floating-point matrix is processed.
An embodiment conversion method is
v' = int( 255 × (v − v_min) / (v_max − v_min) )
where v is an entry of the floating-point matrix, v_max is the largest value in the matrix, v_min is the smallest value in the matrix, and int(·) rounds to an integer.
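Assuming this conversion maps the matrix onto the 8-bit range of a single-channel image, it can be sketched as:

```python
import numpy as np

def to_uint8(M):
    # Min-max normalize a floating-point time-frequency matrix to [0, 255]
    # so it can be stored as a single-channel 8-bit image.
    v_min, v_max = M.min(), M.max()
    return (255.0 * (M - v_min) / (v_max - v_min)).astype(np.uint8)

M = np.array([[0.0, 0.5], [1.0, 2.0]])
U = to_uint8(M)
```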
For the time-frequency image obtained in step S12, detection processing is performed in step S13, and the detection result is written into the database 10.
Referring to fig. 3, an example flow of step S13 is described in detail. The detection network comprises three parts: step S1301 extracts time-frequency features with the convolutional neural network, step S1302 rearranges the time-frequency features extracted by the convolutional neural network, and step S1303 sends the rearranged time-frequency features to the recurrent neural network to obtain a detection result, which is written into the database 10.
Specifically, for step S1301, referring to fig. 4, an embodiment of extracting a time-frequency feature from a convolutional neural network is described. The convolutional neural network processing flow is alternately constituted by a plurality of convolutional blocks S130101 and pooling S130102.
In one aspect, the convolution block S130101 is comprised of a variety of different convolution kernels.
On the other hand, the number of channels of the convolution block S130101 increases multiplicatively as the processing depth increases.
One embodiment of the convolution block S130101 is a convolution kernel of size 3 × 3 with a step size of 1 and a channel number of 30;
another implementation manner of the convolution block S130101 is formed by successively concatenating convolution kernels having a size of 3 × 1 and convolution kernels having a size of 1 × 3, where the step size of each convolution kernel is 1, and the number of channels is 60;
the pooling S130102 pools only the frequency domain dimension, the time dimension being unchanged. Or the pooling step of the time dimension is smaller than the frequency domain dimension.
Preferably, the pooling size is 3 × 3, the frequency-domain pooling step is 2, and the time-dimension pooling step is 1.
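A sketch of pooling under these preferred parameters (a 3 × 3 window with frequency stride 2 and time stride 1); max pooling is assumed here, since the patent does not state the pooling operator:

```python
import numpy as np

def pool_freq_only(X, size=3, f_stride=2, t_stride=1):
    # Max-pool a (freq, time) map with a 3x3 window, striding 2 in
    # frequency and 1 in time, so time resolution is nearly preserved.
    f, t = X.shape
    return np.array([[X[i:i + size, j:j + size].max()
                      for j in range(0, t - size + 1, t_stride)]
                     for i in range(0, f - size + 1, f_stride)])

X = np.arange(63, dtype=float).reshape(9, 7)  # 9 freq bins x 7 time frames
Y = pool_freq_only(X)
```

The frequency dimension shrinks roughly by half while the time dimension keeps the stride-1 frame sequence the recurrent network needs.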
The time-frequency features obtained after convolution have three dimensions, f × t × c, where f is the frequency-domain dimension, t is the time dimension, and c is the channel dimension.
Next, in step S1302, the time-frequency features are rearranged: the two-dimensional f × c matrix at each time point is converted into a one-dimensional vector of dimension c × f. The time-frequency features thus become a (c × f) × t matrix, where c × f is the feature dimension and t is the time dimension.
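The rearrangement of step S1302 is a pure reshape and can be sketched as:

```python
import numpy as np

def rearrange(F):
    # (f, t, c) feature map -> (c*f, t): one column vector per time step,
    # ready to be fed to the recurrent network in time order.
    f, t, c = F.shape
    return F.transpose(2, 0, 1).reshape(c * f, t)

F = np.arange(24).reshape(2, 3, 4)   # f=2, t=3, c=4
V = rearrange(F)                     # (8, 3): 3 time steps of 8-dim vectors
```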
The time-frequency features rearranged in step S1302 are sent to step S1303, where the recurrent neural network detects the target sound.
Specifically, referring to fig. 5, an embodiment flow of detection with the recurrent neural network is described. The recurrent neural network processing flow consists of a plurality of recurrent units S130301 and an output layer S130302.
The recurrent unit S130301 produces an output at each time instant of the input; the size of the output vector is between 1/4 and 1/2 of the input dimension.
One embodiment of the recurrent unit S130301 is a gated recurrent unit with an output at each time instant, the size of the output vector being 1/2 of the input dimension.
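A single gated-recurrent-unit step can be sketched as follows; the weight shapes and random initialization are illustrative only, not the patent's trained parameters:

```python
import numpy as np

def gru_step(x, h, W, U, b):
    # One gated-recurrent-unit step; W, U, b each hold the update (z),
    # reset (r) and candidate (h) parameters.
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(W["z"] @ x + U["z"] @ h + b["z"])
    r = sig(W["r"] @ x + U["r"] @ h + b["r"])
    h_cand = np.tanh(W["h"] @ x + U["h"] @ (r * h) + b["h"])
    return (1 - z) * h + z * h_cand

d_in, d_out = 8, 4                    # output dimension = 1/2 of the input
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((d_out, d_in)) * 0.1 for k in "zrh"}
U = {k: rng.standard_normal((d_out, d_out)) * 0.1 for k in "zrh"}
b = {k: np.zeros(d_out) for k in "zrh"}
h = np.zeros(d_out)
for x in rng.standard_normal((5, d_in)):  # five illustrative time steps
    h = gru_step(x, h, W, U, b)           # one output per time instant
```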
The output layer S130302 is a fully connected layer whose output is a two-dimensional one-hot-coded vector representing the probability that the target is present and the probability that it is absent; the two probabilities sum to 1. When the presence probability exceeds the threshold, a flying helicopter is judged present; otherwise it is judged absent.
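The thresholded two-way output can be sketched as a softmax over two logits; the logit values and the 0.5 threshold below are illustrative assumptions:

```python
import numpy as np

def detect(logits, threshold=0.5):
    # Softmax over a 2-way output [presence, absence]; the two
    # probabilities sum to 1 by construction.
    e = np.exp(logits - np.max(logits))
    p = e / e.sum()
    return p[0] > threshold, p

present, p = detect(np.array([2.0, 0.0]))  # hypothetical layer output
```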
When the detection network is trained, an ordinary stochastic gradient descent method trains the convolutional neural network and the recurrent neural network simultaneously.
The above examples are intended to illustrate but not to limit the technical solutions of the present invention. Any modification and replacement without departing from the spirit and scope of the present invention should be covered in the claims of the present invention.

Claims (6)

1. A sound detection method, characterized in that it is applied to detecting the sound signal of a helicopter rotor in smooth flight and comprises the following steps: preprocessing, time-frequency analysis and a detection network;
the preprocessing comprises sampling continuous sound signals, carrying out band-pass filtering, and decomposing the sound signals into mutually overlapped sound signal segments;
the time-frequency analysis is to analyze the sound signal by using a time-frequency analysis method and convert the sound signal into a time-frequency image;
the detection network detects whether the target sound exists in the time-frequency image by combining a convolutional neural network and a cyclic neural network, wherein the convolutional neural network extracts features, and the cyclic neural network detects according to the extracted features.
2. The sound detection algorithm of claim 1,
in the helicopter rotor sound signal during steady flight, the periodic rotational noise is the target sound, the ambient noise added along the propagation path is the interference noise, and the energy of the target sound is lower than that of the interference sound.
3. The sound detection algorithm of claim 1,
the sound signal is sampled at a frequency not less than 4 times the frequency range of the periodic target to be detected;
the band-pass filtering takes the minimum rotor speed required for helicopter takeoff as the lower boundary and the maximum rotor speed in smooth flight as the upper boundary;
the mutually overlapping sound signal segments overlap in time by not less than half the length of a single segment.
4. The sound detection algorithm of claim 1,
the time-frequency analysis method does not use the Mel-frequency cepstral transform; the short-time Fourier transform, the Wigner-Ville transform or the Choi-Williams transform may be used;
the time-frequency image is the floating-point matrix after time-frequency transformation, or a single-channel JPEG image obtained by processing the floating-point matrix.
5. The sound detection algorithm of claim 1,
the detection method first extracts features with the convolutional neural network, then inputs the features into the recurrent neural network for detection processing, and outputs the final detection result.
6. Sound detection algorithm according to claims 1 and 5,
in the detection method, during training, the convolutional neural network and the recurrent neural network are trained synchronously;
in the detection method, the features extracted by the convolutional neural network form vectors arranged in time order, and all the vectors are fed to the recurrent neural network in time order;
and the detection result is output as one-hot-coded probabilities.
CN202010240367.6A 2020-03-31 2020-03-31 Remote helicopter rotor sound detection method based on time-frequency analysis and deep learning Pending CN111572809A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010240367.6A CN111572809A (en) 2020-03-31 2020-03-31 Remote helicopter rotor sound detection method based on time-frequency analysis and deep learning


Publications (1)

Publication Number Publication Date
CN111572809A true CN111572809A (en) 2020-08-25

Family

ID=72113585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010240367.6A Pending CN111572809A (en) 2020-03-31 2020-03-31 Remote helicopter rotor sound detection method based on time-frequency analysis and deep learning

Country Status (1)

Country Link
CN (1) CN111572809A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419258A (en) * 2020-11-18 2021-02-26 西北工业大学 Robust environmental sound identification method based on time-frequency segmentation and convolutional neural network
CN112419258B (en) * 2020-11-18 2024-05-14 西北工业大学 Robust environment sound identification method based on time-frequency segmentation and convolutional neural network
WO2023042528A1 (en) * 2021-09-17 2023-03-23 日本電信電話株式会社 Learning device, conversion device, learning method, and program
WO2023042377A1 (en) * 2021-09-17 2023-03-23 日本電信電話株式会社 Learning device, conversion device, learning method, and program


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200825