CN113782054A - Method and system for automatically identifying lightning whistle sound waves based on intelligent voice technology - Google Patents
Method and system for automatically identifying lightning whistle sound waves based on intelligent voice technology
- Publication number
- CN113782054A (application CN202111109574.9A)
- Authority
- CN
- China
- Prior art keywords
- lightning
- whistle sound
- sound wave
- audio data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks
- G10L25/21 — Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
- G10L25/24 — Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/51 — Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
- G06N3/044 — Computing arrangements based on biological models; neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention provides a method and a system for automatically identifying lightning whistle sound waves based on an intelligent voice technology. The method comprises the following steps: intercepting audio data from raw waveform data of the SCM payload VLF band to form an audio data set; detrending the audio data set to obtain a detrended audio data set; extracting MFCCs audio features of the lightning whistle sound waves from the detrended audio data set; training an LSTM neural network classifier with the MFCCs audio features; and identifying lightning whistle sound waves with the trained LSTM neural network classifier. The method and system make the automatic lightning whistle sound wave identification algorithm suitable for satellite-borne application.
Description
Technical Field
The invention relates to the technical field of remote sensing and telemetry, and in particular to a method and a system for automatically identifying lightning whistle sound waves based on an intelligent voice technology.
Background
Lightning is a frequent natural hazard, occurring on average about 44 times per second worldwide, or roughly 1.4 billion times a year (Christian and Hugh, 2003). A lightning discharge produces broadband electromagnetic pulses that can propagate into the ionosphere and excite electromagnetic whistler waves. Because the high-frequency and low-frequency components of the wave travel at different phase velocities, the high-frequency components generally reach the satellite altitude earlier and the low-frequency components later, so the whistle sound wave appears in the satellite-recorded electromagnetic time-frequency diagram as an L-shaped dispersive trace whose frequency decreases with time (Barkhausen, 1930; Storey, 1953; Helliwell, 1965). A typical lightning whistle sound wave recorded by Zhangheng-1, China's first electromagnetic monitoring test satellite, is shown in figure 1. Dispersion increases when the lightning whistle sound wave travels a longer path, or encounters a higher electron density or a stronger magnetic field (Carpenter and Anderson, 1992). The morphology of lightning whistle sound waves carries a great deal of space-environment information; it is widely used in space-environment monitoring and is an important means of studying coupling mechanisms between the geospheres (Chen et al., 2017; Carpenter and Anderson, 1992; Singh et al., 2018; Oike et al., 2014; Bayupati et al., 2012; Clilverd and Mark, 2002; Kishore et al., 2016; Záhlava et al., 2018; Clilverd et al., 2002; Parrot et al., 2019; Horne et al., 2013; Rodong, 2017).
In February 2018, China's first electromagnetic satellite, Zhangheng-1 (ZH-1), was successfully launched, providing a space-based capability for observing lightning whistle sound waves. The ZH-1 satellite covers latitudes between 65°S and 65°N; it observes in detailed-survey mode over mainland China, the area within 1000 km around it, and the world's two major seismic belts (the circum-Pacific belt and the Eurasian belt), and in survey mode elsewhere. The ZH-1 orbital altitude is about 507 km, near the top of the ionosphere and the plasmasphere boundary, a region rich in ELF/VLF wave events such as lightning whistle sound waves and quasi-periodic emissions (Zhima et al., 2020). The orbital inclination is 97.4°, a sun-synchronous orbit with a descending-node local time of 2:00 p.m.; the orbital revisit cycle is 5 days, i.e. the sub-satellite tracks repeat every 5 days, so global observations with a spatial resolution of about 500 km are obtained within one revisit cycle (Yuan Shinn et al., 2018). The satellite circles the Earth in about 94 minutes; most payloads operate within ±65° latitude, the observation data are stored separately for ascending (night-side) and descending (day-side) orbits, and each half orbit (ascending/descending) lasts about 34 minutes; the spatial separation between adjacent ascending (or descending) orbits on the same day is about 2000 km. The on-board induction magnetometer (Search Coil Magnetometer, SCM) obtains induced magnetic-field data of the ionosphere via Faraday's law of electromagnetic induction and can capture lightning whistle sound wave signals globally; in survey mode it records only power-spectrum data. To date, ZH-1 has operated in orbit for more than 3 years and has collected a large volume of waveform and power-spectrum data of the global electromagnetic field. The three SCM components X/Y/Z cover the three bands ULF/ELF/VLF, with frequency ranges of ULF: 10 Hz-200 Hz, ELF: 200 Hz-2.2 kHz and VLF: 1.8 kHz-20 kHz; the sampling rate of the raw waveform data is 51.2 kHz, and the frequency intervals of the power-spectrum data are ULF: 0.25 Hz, ELF: 2.5 Hz and VLF: 12.5 Hz. In detailed-survey mode, each 80 ms of VLF waveform data contains 4096 points (Wang et al., 2018), and the satellite produces a data volume of approximately 10 GB per day.
At present, space-physics research based on lightning whistle sound waves mainly performs in-depth analysis and inversion of space-environment parameters for individual lightning whistle sound wave events recorded by satellites. However, such events are usually buried in the massive electromagnetic-field data observed by the satellite; relying entirely on manual identification is inefficient and laborious, so research on the global spatio-temporal distribution of lightning whistle sound waves and the associated parameters remains very limited.
Since 2008, researchers at home and abroad have begun to address this difficulty with artificial-intelligence techniques, and automatic image-recognition algorithms for lightning whistle sound waves have gradually been developed. The current workflow of lightning whistle sound wave recognition algorithms is to band-pass filter the observed waveform data, convert the waveform into a time-frequency diagram using the fast Fourier transform, and then automatically recognize the L-shaped dispersion pattern in the time-frequency diagram by means of machine learning or computer-vision techniques. For example, Lichtenberger et al. (2008) noted that the reliance on manual processing of large volumes of ground-based Very Low Frequency (VLF) observation data creates a serious bottleneck for electron-density inversion studies, and proposed an automatic lightning whistle sound wave detection method based on sliding template matching, the template being constructed to match the whistler shape proposed by Bernard (1973). The algorithm has been applied at scale to data from the Marion and SANAE VLF ground stations; its drawback is that interference caused by lightning pulses, power-line harmonics, artificial transmitters and the like must be removed from the time-frequency diagram in advance, and the algorithm suffers from high false-alarm and missed-detection rates (Lichtenberger et al., 2008). Zhou et al. (2020) proposed a simple and fast automatic identification algorithm for the tweek phenomenon hidden in Wuhan VLF ground-observation data by setting energy-spectrum and time-width thresholds. However, this method is not suitable for identifying lightning whistle sound waves in Zhangheng-1 satellite data: because the satellite's electromagnetic-field observation payloads are highly sensitive, the energy-spectrum intensity of the background noise in the time-frequency diagram differs little from that of the whistler trace, so coarse localization of the lightning whistle sound wave by an energy-spectrum threshold is difficult. The Stanford University VLF Group (2009) first attempted automatic identification of lightning whistle sound waves from electromagnetic satellites: a time-frequency image of fixed time width is cropped, feature extraction is performed with computer-vision operations such as denoising, grid division and mean-amplitude calculation, and a template-matching classification strategy finally provides coarse localization of the whistler. Dhama et al. (2014) argued that the performance of such features is limited by the number of grid divisions and, exploiting the facts that the colour of the lightning whistle sound wave region varies little and the region is clearly connected, proposed a coarse-localization method based on connected-component analysis. Its drawbacks are low feature robustness and a strong dependence of the algorithm's performance on background noise.
Oike et al. (2014) and Fiser et al. (2010) constructed daytime and night-time lightning whistle sound wave templates from observation data based on the Eckersley formula (Eckersley, 1935), and then completed whistler identification and coarse localization with a template-matching strategy based on cross-correlation entropy. Ahmad et al. (2008) argued that such whistler templates do not match the actual situation, proposed a feature-extraction method that expresses a variety of dispersion forms with the help of computer-vision techniques such as edge extraction, and finally completed recognition with a classification algorithm based on decision-tree rules.
Given the breakthroughs achieved by deep neural networks in extracting image features and fitting non-linear functions (LeCun et al., 2015; Liu et al., 2018), Konan et al. (2020) proposed two deep-neural-network-based coarse-localization algorithms for lightning whistle sound waves: one based on a Sliding Deep Convolutional Neural Network (SDNN) and one based on the YOLOv3 (You Only Look Once, version 3) network. The SDNN essentially consists of two parts: 3 convolutional layers and 2 classification layers. The algorithm crops a time-frequency diagram of fixed time width, extracts image features with the convolutional layers, and performs recognition with the classification layers, thereby achieving coarse localization of lightning whistle sound waves (Konan et al., 2020). The algorithm extracts robust features and classifies with strong generalization ability, but its missed-detection rate is high because the fixed-time-width localization strategy easily misses whistlers of other time widths (Konan et al., 2020; Yuanjing et al., 2021). The YOLOv3-based detection algorithm comprises two main components: the YOLOv3 backbone network and the Non-Maximum Suppression (NMS) algorithm. The YOLOv3 backbone consists mainly of 75 convolutional layers: it has no fully connected layers and therefore accepts input images of any size; it has no pooling layers, so scale-invariant features can be passed to the next layer; a residual structure greatly reduces the difficulty of learning robust features; and a mathematical model of target localization is built from contextual image information. The backbone outputs multiple candidate bounding boxes for coarse localization, which are finally filtered and refined by the NMS algorithm, achieving coarse localization of the lightning whistle sound wave region. These advantages give the YOLOv3 deep neural network higher precision, speed and efficiency than other coarse-localization algorithms (Konan et al., 2020; Yuanjing et al., 2021), but it requires high-performance GPU hardware and consumes up to 233 MB of memory.
In short, current mainstream lightning whistle sound wave recognition algorithms must convert the raw waveform data into time-frequency diagrams, which places heavy demands on computing power and storage, suits offline processing, and cannot be applied directly on board the satellite.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a system for automatically identifying lightning whistle sound waves based on an intelligent voice technology, so that the automatic lightning whistle sound wave identification algorithm is suitable for satellite-borne application.
In order to solve the technical problem, the invention provides a method for automatically identifying lightning whistle sound waves based on an intelligent voice technology, which comprises the following steps: intercepting audio data from raw waveform data of the SCM payload VLF band to form an audio data set; extracting MFCCs audio features of the lightning whistle sound waves from the audio data set; training an LSTM neural network classifier with the MFCCs audio features; and identifying lightning whistle sound waves with the trained LSTM neural network classifier.
In some embodiments, the method further comprises: after the audio data are intercepted from the raw waveform data of the SCM payload VLF band to form the audio data set, and before the MFCCs audio features of the lightning whistle sound waves are extracted from the audio data set, detrending the audio data set to obtain a detrended audio data set.
In some embodiments, intercepting audio data from the raw waveform data of the SCM payload VLF band to form an audio data set comprises: intercepting data from the raw waveform data with a 0.16 s sliding time window, each window containing 8192 points, and converting the data into an audio clip; performing a short-time Fourier transform on the intercepted data to obtain its time-frequency diagram; manually labelling the clip according to whether the time-frequency diagram shows the L-shaped dispersion morphology; and finally obtaining a 10200-segment audio data set.
In some embodiments, the 10200-segment audio data set comprises 5100 segments of lightning whistle sound wave data and 5100 segments of non-lightning whistle sound wave data.
In some embodiments, detrending the audio data set to obtain the detrended audio data set comprises performing the detrending according to formula (1), where s(n) is the original signal and S(k) is the detrended signal.
In some embodiments, extracting MFCCs audio features of the lightning whistle sound waves from the detrended audio data set comprises: performing pre-emphasis, framing, windowing and fast Fourier transform on the detrended audio data set to obtain the power spectrum of the signal; passing the power spectrum through a set of Mel-scale triangular filter banks to obtain energy coefficients; taking the logarithm of the energy coefficients and substituting the logarithmic energies into the discrete cosine transform to obtain L-order MFCC parameters; and applying dynamic differencing to the L-order MFCC parameters to obtain MFCC energy parameters expressed as a two-dimensional tensor.
In some embodiments, training the LSTM neural network classifier with the MFCCs audio features comprises training the LSTM neural network classifier with the raw data, the detrended raw data, the MFCCs features of the raw data, or the MFCCs features of the detrended raw data.
In addition, the invention further provides a system for automatically identifying lightning whistle sound waves based on an intelligent voice technology, the system comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for automatically identifying lightning whistle sound waves based on an intelligent voice technology described above.
After adopting such a design, the invention has at least the following advantages:
This work is the first to apply Mel-frequency cepstral coefficients (MFCCs) and long short-term memory (LSTM) neural networks to the automatic identification of lightning whistle sound waves. MFCCs take human auditory characteristics into account: the linear spectrum is mapped onto the Mel non-linear spectrum based on auditory perception and then converted to the cepstrum, which is more consistent with the hearing characteristics of the human ear (Davis et al., 1980). Since the lightning whistle sound wave can be heard by the human ear through a player, MFCCs have a distinct advantage in extracting its auditory characteristics. The LSTM neural network incorporates time-dimension information and is suitable for processing and predicting important events in waveform sequences (Hochreiter and Schmidhuber, 1997).
Drawings
The foregoing is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood, the present invention is further described in detail below with reference to the accompanying drawings and the detailed description.
FIG. 1 is a data-processing flow diagram for the automatic identification of lightning whistle sound waves recorded by the ZH-1 satellite;
FIG. 2 is a schematic of the original waveform of the VLF magnetic field and the detrending process;
FIG. 3 is a schematic diagram of an MFCCs feature parameter extraction flow;
FIG. 4 is a graph of Mel frequency versus linear frequency;
FIG. 5 is a schematic diagram of the structure of an LSTM cell;
FIG. 6A is a correctly identified lightning whistle sound wave;
FIG. 6B is an unrecognized lightning whistle sound wave;
FIG. 6C is a lightning whistle sound wave after the detrending process of FIG. 6B;
FIG. 7A is a correctly identified non-lightning whistle sound wave;
FIG. 7B is a misidentified non-lightning whistle sound wave;
FIG. 7C is a non-lightning whistle sound wave after the detrending process of FIG. 7B;
FIG. 8 is a box plot of the results of the LSTM classifiers;
FIG. 9 shows time traces of the waveform data features;
FIG. 10 shows the abstract features of different classifiers;
FIG. 11 shows time series of the hidden features of different LSTM networks.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The method takes the magnetic-field waveform data observed by the induction magnetometer (Search Coil Magnetometer, SCM) on the Zhangheng-1 satellite as the research object and provides an automatic lightning whistle sound wave identification algorithm based on an intelligent voice technology, which mainly involves the following 2 aspects:
breaking the conventional lightning whistle sound wave research mainly based on visual analysis, and starting the precedent of analyzing the lightning whistle sound wave by hearing. The original observation data is analyzed from the intelligent voice perspective for the first time, and a lightning whistle sound wave voice data set is created. The original time sequence waveform data is taken as a research object, the sound wave shape data of the lightning whistle detected by the SCM can be played in a voice mode, and then the sound similar to the whistle can be clearly heard, which means that the frequency of the sound wave is just in the audible range of human ears (Wicks et al,2016), and the fact that the lightning whistle sound wave is reasonably and feasible to be analyzed from the intelligent voice angle is proved.
This work is the first to apply Mel-frequency cepstral coefficients (MFCCs) and long short-term memory (LSTM) neural networks to the automatic identification of lightning whistle sound waves. MFCCs take human auditory characteristics into account: the linear spectrum is mapped onto the Mel non-linear spectrum based on auditory perception and then converted to the cepstrum, which is more consistent with the hearing characteristics of the human ear (Davis et al., 1980). Since the lightning whistle sound wave can be heard by the human ear through a player, MFCCs have a distinct advantage in extracting its auditory characteristics. The LSTM neural network incorporates time-dimension information and is suitable for processing and predicting important events in waveform sequences (Hochreiter and Schmidhuber, 1997).
The ZH-1 satellite is China's first seismo-electromagnetic monitoring test satellite, with a revisit period of 5 days and about 15 orbits of observation per day (Shen et al., 2018a, b). The induction magnetometer (SCM) it carries observes the varying magnetic field (Cao et al., 2018; Wang et al., 2018) and obtains induced magnetic-field data of the ionosphere via Faraday's law of electromagnetic induction. The payload has two working modes, survey mode and detailed-survey mode; the satellite produces about 10 GB of data per day, and in detailed-survey mode each 80 ms of VLF waveform data contains 4096 points (Wang et al., 2018; Fan et al., 2018). Facing this challenge of massive data, automatic identification of lightning whistle sound waves based on the raw waveform needs to be realized.
The lightning whistle sound wave identification scheme based on the raw waveform mainly comprises three parts, as shown in figure 1: data collation, data preprocessing, and the intelligent-voice-based automatic lightning whistle sound wave identification algorithm.
(1) Data collation
The data used here come mainly from the detailed-survey data of the SCM payload VLF band of the ZH-1 satellite in August 2018. First, data are intercepted from the raw waveform data with a 0.16 s sliding time window, each window containing 8192 points, and converted into an audio fragment; a short-time Fourier transform is then applied to the intercepted data to obtain its time-frequency diagram; the segment is then manually labelled according to whether the time-frequency diagram shows the L-shaped dispersion morphology; finally a 10200-segment audio data set is obtained (5100 segments of lightning whistle sound wave data and 5100 segments of non-lightning whistle sound wave data). Note that the time-frequency diagram here is used only to check whether a lightning whistle sound wave is present and does not participate in the calculation of the recognition algorithm. A minimal sketch of the segmentation step is given below.
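The sketch below illustrates the 0.16 s segmentation described above, assuming the half-orbit VLF waveform is already available as a NumPy array sampled at 51.2 kHz. The loader name `load_scm_vlf` and the file name are hypothetical placeholders, not part of the ZH-1 data interface.

```python
import numpy as np

FS = 51200          # SCM VLF sampling rate, Hz
SEG_LEN = 8192      # 0.16 s window -> 8192 points at 51.2 kHz

def slice_waveform(waveform: np.ndarray, seg_len: int = SEG_LEN) -> np.ndarray:
    """Cut a half-orbit VLF waveform into non-overlapping 0.16 s segments."""
    n_seg = len(waveform) // seg_len
    return waveform[: n_seg * seg_len].reshape(n_seg, seg_len)

# Hypothetical usage: `load_scm_vlf` stands in for whatever reader parses the
# ZH-1 waveform product; labels come from manual inspection of the
# corresponding spectrograms, as described above.
# waveform = load_scm_vlf("ZH1_SCM_VLF_halforbit.dat")
# segments = slice_waveform(waveform)          # shape (n_seg, 8192)
```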
(2) Data pre-processing
In order to effectively avoid interference caused by noise and signal instability and to enhance the waveform characteristics of the lightning whistle sound wave, the raw waveform data are first detrended, as shown in formula (1),
where s(n) is the original signal and S(k) is the detrended signal. The results are shown in figure 2: FIG. 2(a) is an original waveform containing a lightning whistle sound wave, and FIG. 2(b) is the result of its detrending; FIG. 2(c) is waveform data without a lightning whistle sound wave, and FIG. 2(d) is the result of its detrending.
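Formula (1) is given only symbolically above. As one plausible reading, the sketch below removes a linear least-squares trend from each segment using SciPy; the exact detrending operator of formula (1) may differ.

```python
import numpy as np
from scipy.signal import detrend

def detrend_segment(s: np.ndarray) -> np.ndarray:
    """Remove the linear trend from one 0.16 s waveform segment.

    Linear least-squares detrending is assumed here as one plausible
    reading of formula (1).
    """
    return detrend(s, type="linear")

# Example on a synthetic drifting sine:
t = np.arange(8192) / 51200.0
x = np.sin(2 * np.pi * 5000 * t) + 3.0 * t        # tone plus slow drift
y = detrend_segment(x)                             # drift removed
```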
(3) MFCCs audio feature extraction for lightning whistle sound waves
Since the hiss of the lightning whistle sound wave can be clearly heard by the human ear, MFCCs, which are designed according to the auditory mechanism of the human ear, can extract the sound characteristics of the lightning whistle sound wave; the extraction process is described in detail in Section 1.
(4) LSTM neural network classifier.
This step mainly comprises two processes, training the neural network and applying it. Training the neural network means extracting MFCCs features on the training sample set and training the LSTM neural network with these features; applying the neural network means extracting MFCCs features on the test set and feeding them into the trained LSTM network to obtain the final recognition result. The implementation is described in detail in Section 2.
1 MFCCs audio feature extraction algorithm for lightning whistle sound waves
The MFCCs feature-extraction process is shown in fig. 3 and mainly includes pre-emphasis, framing and windowing, fast Fourier transform, the Mel filter bank, logarithm operation, the Discrete Cosine Transform (DCT), and dynamic differencing.
1.1 Pre-emphasis, Framing, windowing, and fast Fourier transform
Pre-emphasis processing: the aim is to emphasize the high-frequency part of the signal and increase its resolution.
s(n) = s_n − μ·s_(n−1)    (2)
In the formula, s_n is the original signal, s(n) is the processed signal, and the parameter μ takes a value between 0.9 and 1.0; for the SCM data (sampling rate 51.2 kHz), μ = 0.97 is used.
Framing processing: first, N sampling points are grouped into one observation unit, called a frame; here N = 512. To avoid excessive variation between two adjacent frames, an overlap region of M sampling points is kept between them, M usually being about 1/2 or 1/3 of N. The corresponding frame length is:
512 / 51200 × 1000 = 10 ms    (3)
Windowing processing: the signal is windowed with a 40 ms window shifted by 8 ms, in order to avoid edge effects at the boundaries of short speech segments (the Gibbs effect). The windowing is defined as follows:
s_ω(n) = s(n) × ω(n)    (4)
where ω(n) is the window function and s_ω(n) is the windowed signal. A Hamming window is chosen here, defined as shown in equation (5):
ω(n, a) = (1 − a) − a·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1    (5)
Different values of a yield different Hamming windows; the default value a = 0.46 is chosen here.
Fast Fourier transform: since the characteristics of a signal are usually difficult to see in the time domain, the signal is usually transformed into an energy distribution in the frequency domain, where different energy distributions represent different speech characteristics. A fast Fourier transform is applied to each framed and windowed frame to obtain its spectrum, and taking the modulus squared of the spectrum gives the power spectrum of the signal, as shown in formula (6):
where s_ω(n) is the windowed signal, X(k) is the signal obtained after the fast Fourier transform, and N is the number of Fourier-transform points.
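A minimal NumPy sketch of the pre-emphasis, framing, windowing and power-spectrum steps follows; the 40 ms window and 8 ms shift come from the text, while the exact hop of 409 samples and the 1/N normalisation of the power spectrum are assumptions made here for illustration.

```python
import numpy as np

FS, MU = 51200, 0.97            # SCM sampling rate and pre-emphasis factor
WIN, HOP = 2048, 409            # ~40 ms window, ~8 ms shift (assumed values)

def power_spectrum(segment: np.ndarray) -> np.ndarray:
    """Pre-emphasis, framing, Hamming windowing and FFT power spectrum."""
    # (2) pre-emphasis: s(n) = s_n - mu * s_(n-1)
    emphasized = np.append(segment[0], segment[1:] - MU * segment[:-1])
    # framing with overlap: one row per frame
    n_frames = 1 + (len(emphasized) - WIN) // HOP
    idx = np.arange(WIN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # (4)-(5) Hamming window applied to every frame
    frames = frames * np.hamming(WIN)
    # (6) modulus-squared FFT -> power spectrum
    return np.abs(np.fft.rfft(frames, WIN)) ** 2 / WIN   # (n_frames, WIN//2 + 1)
```

For an 8192-point (0.16 s) segment these assumed parameters give 16 frames.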
1.2 Mel filter bank
The power spectrum is passed through a set of Mel-scale triangular filters, using M triangular filters with centre frequencies f(m). M is the number of filters and the default value M = 26 is chosen; H_m(k) denotes the energy-spectrum weight of the m-th filter, defined by formula (7), where f(m) satisfies:
2·Mel(f(m)) = Mel(f(m−1)) + Mel(f(m+1))    (8)
The cepstral parameters extracted on this Mel-scaled frequency axis have a non-linear correspondence with linear frequency, as shown in fig. 4, which can be approximated by equation (9):
Mel(f) = 2595·log10(1 + f/700)    (9)
where f is the frequency in Hz.
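Assuming the conventional 2595·log10(1 + f/700) form of equation (9), the mapping, its inverse and the placement of the filter centre frequencies f(m) can be sketched as:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Map linear frequency to the Mel scale (equation (9), standard form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def mel_to_hz(mel):
    """Inverse mapping, used to place the triangular filter centres f(m)."""
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

# Centre frequencies of M = 26 filters spanning 0 Hz to half the sampling rate;
# equally spaced points on the Mel axis satisfy
# 2*Mel(f(m)) = Mel(f(m-1)) + Mel(f(m+1)) from equation (8).
M, FS = 26, 51200
mel_points = np.linspace(hz_to_mel(0), hz_to_mel(FS / 2), M + 2)
centres = mel_to_hz(mel_points)
```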
1.3 Logarithm operation and discrete cosine transform (DCT)
Logarithm operation: the spectral coefficients X(k) obtained by the FFT are filtered with the triangular filters in turn to obtain a set of energy coefficients m1, m2, m3, .... The span of each triangular filter in the bank is equal on the Mel scale, and together the filters cover the range from 0 Hz to half the sampling frequency. The energy coefficient s(m) is calculated as:
s(m) = Σ_{k=0}^{N−1} |X(k)|²·H_m(k), 0 ≤ m ≤ M    (10)
where X(k) is the signal obtained after the fast Fourier transform, H_m(k) is the energy-spectrum weight, and M is the number of filters.
The logarithmic energy of the filter-bank output energy coefficients is then calculated as:
s′(m) = ln s(m), 0 ≤ m ≤ M    (11)
where s(m) is the energy coefficient and s′(m) is the logarithmic energy coefficient.
Discrete cosine transform: the goal is to remove the correlation between the dimensions of the signal and map the signal into a low-dimensional space. Substituting the logarithmic energies into the discrete cosine transform yields the Mel-scale cepstral parameters of order L, as shown in formula (12):
where c(n) is a cepstral coefficient and L is the order, generally chosen between 8 and 13 for MFCC; here L = 13.
1.4 Dynamic difference
The standard cepstral parameters MFCC only reflect the static characteristics of the speech parameters; the dynamic characteristics can be described by the differential spectrum of these static features, so combining the dynamic and static features can effectively improve the recognition performance of the system. The difference parameter is calculated as:
d_t = Σ_{k=1}^{K} k·(C_(t+k) − C_(t−k)) / (2·Σ_{k=1}^{K} k²)    (13)
where d_t denotes the t-th first-order difference, C_t denotes the t-th cepstral coefficient, L denotes the order of the cepstral coefficients, and K denotes the time span of the first derivative, which may be taken as 1 or 2.
Finally, c(n), d_t (K = 1) and d_t (K = 2) are concatenated into a 16 × 39 two-dimensional tensor, each row of which represents the energy values of one frame. A frame's energy consists of the 39-dimensional MFCC parameters (13 MFCC cepstral coefficients + 13 first-order difference parameters + 13 second-order difference parameters). The MFCCs features of the sub-plots in FIG. 2 are extracted with the method above and plotted as frame-energy maps, where the abscissa is the MFCC cepstral index and the ordinate is time. Inspection of the frame-energy maps shows that the MFCCs discriminate strongly between lightning whistle sound waves and non-lightning whistle sound waves; for example, the third column of the MFCCs feature map (the dashed rectangle) shows a significant difference.
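For illustration, the sketch below assembles the 13 MFCCs plus first- and second-order differences into a (frames × 39) tensor using librosa as a stand-in for the pipeline of Section 1; the frame and hop sizes are assumptions chosen so that a 0.16 s segment yields 16 frames, and the library's internals differ in detail from equations (2)-(13).

```python
import numpy as np
import librosa

FS = 51200

def mfcc_features(segment: np.ndarray) -> np.ndarray:
    """13 MFCCs + first- and second-order deltas for one 0.16 s segment."""
    mfcc = librosa.feature.mfcc(
        y=segment.astype(np.float32), sr=FS,
        n_mfcc=13, n_mels=26,
        n_fft=2048, hop_length=409, center=False,
    )                                            # (13, n_frames)
    d1 = librosa.feature.delta(mfcc, order=1)    # first-order difference
    d2 = librosa.feature.delta(mfcc, order=2)    # second-order difference
    return np.vstack([mfcc, d1, d2]).T           # (n_frames, 39), here (16, 39)
```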
2 LSTM neural network classifier algorithm
The data used here consist of 51200 sample points per second, and the data points form a time-ordered sequence. When a lightning whistle sound wave occurs, the sequence changes gradually from gentle to violent and finally flattens out again. Given the ability of LSTM networks to model time-series information, an LSTM network is used here to model and classify the MFCCs features of lightning whistle sound waves. The basic structure of the LSTM unit is shown in fig. 5; the LSTM unit comprises a forget gate f, an input gate i and an output gate o (Hochreiter and Schmidhuber, 1997).
Forget gate: it decides which information in the output of the previous LSTM unit should be discarded or kept, and is defined as follows:
f_t = σ(W_f·[h_(t−1), x_t] + b_f)    (14)
where σ denotes the Sigmoid function, W_f is a weight matrix, h_(t−1) is the output of the previous LSTM step, x_t is the current input, and b_f is the bias. The elements of f_t take values between 0 and 1 and indicate the degree of forgetting: 0 means completely forgotten and 1 means completely remembered.
Input gate: it is used to update the cell state and is defined as follows:
i_t = σ(W_i·[h_(t−1), x_t] + b_i)    (15)
C̃_t = tanh(W_c·[h_(t−1), x_t] + b_c)    (16)
where i_t is the input-gate activation and C̃_t is the candidate state obtained by screening the input. The forget gate and the input gate together determine the state information of the current layer, namely:
C_t = f_t × C_(t−1) + i_t × C̃_t    (17)
an output gate: the information to be output to the next LSTM cell, the output gate is defined as follows:
ot=δ(Wo·[ht-1,xt]+bi) (18)
ht=ot×tanh(Ct) (19)
wherein o istIs to hide the state information ht-1And currently entered information xtInputting the result into a Sigmoid function; the unit state C of the current timetInput to the tanh function to obtain tanh (C)t) Then adding tanh (C)t) And otMultiplying to obtain the information h to be carried in the hidden statet。
The forget gate can attenuate some of the noise interference in the lightning whistle sound wave, the input gate can select the parts of the historical signal that play an important role, and the key information affecting classification is finally output by combining the current and historical information. A minimal sketch of such a classifier is given below.
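The following PyTorch sketch shows one way to classify the (frames × 39) MFCC tensor with a single-layer LSTM followed by a fully connected layer; the hidden size and dropout rate are illustrative assumptions rather than the hyper-parameters of Table 1.

```python
import torch
import torch.nn as nn

class WhistlerLSTM(nn.Module):
    """Single-layer LSTM over the (frames, 39) MFCC tensor, binary output."""
    def __init__(self, n_features=39, hidden=64, dropout=0.5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden, 2)       # whistler / non-whistler

    def forward(self, x):                    # x: (batch, frames, 39)
        _, (h_n, _) = self.lstm(x)           # h_n: (1, batch, hidden)
        return self.fc(self.dropout(h_n[-1]))

# model = WhistlerLSTM()
# logits = model(torch.randn(8, 16, 39))     # 8 segments of 16 frames each
```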
3. Experiments and analysis
3.1 Experimental flow and LSTM neural network model parameter setting
The experimental procedure comprises data collation, MFCCs feature extraction, LSTM model training and evaluation of the index values, and the experiment is repeated 1000 times. The detailed steps are as follows:
(1) Data set: a sample set WD containing 5100 lightning whistle sound wave waveform samples and a sample set NWD containing 5100 non-lightning whistle sound wave waveform samples.
(2) Training set: 50% of the samples are randomly selected from each of WD and NWD to construct the training set.
(3) Test set: the remaining samples of WD and NWD form the test set.
(4) Feature extraction: the audio features of the training set are extracted with four different feature-extraction methods: the original waveform data features, denoted Original; the features after detrending the original waveform data, denoted Original_Detrend; the features after MFCCs processing of the original waveform data, denoted Original_MFCC; and the features obtained by first detrending the original waveform data and then applying MFCCs processing, denoted Original_Detrend_MFCC.
(5) Training process: the LSTM classification model is trained with each of the four features mentioned in step (4), yielding four different LSTM classifiers.
(6) Testing process: features are extracted on the test set with the four feature-extraction methods of step (4), each feature is input into the corresponding LSTM classification model for recognition, and the recognition results are output.
(7) Evaluation: four indices are used to evaluate the recognition effect: precision (Precision), recall (Recall), the F1 value and the area under the ROC curve (AUC-ROC) (Yuanjing et al., 2021); a minimal sketch of computing these indices is given after this list.
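A minimal sketch of the four indices using scikit-learn, assumed here purely for illustration:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Compute the four indices used above.

    y_true and y_pred are hard labels (1 = whistler, 0 = non-whistler);
    y_score is the classifier's whistler probability used for AUC-ROC.
    """
    return {
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
        "auc_roc":   roc_auc_score(y_true, y_score),
    }
```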
The hyper-parameters required to train the LSTM neural network differ with the input features; for each feature classifier, the hyper-parameters of the LSTM neural network model were obtained by cross-validation, as shown in Table 1:
TABLE 1 Parameters of the LSTM neural network based on the four different features
TABLE 2 Performance of the LSTM neural network on the training and test sets based on the four different features
On the training and test sets of each experiment, features are extracted with the four different feature-extraction methods, four different classifiers are trained, and the performance of each classifier is evaluated with precision, recall, F1 and AUC-ROC; for the detailed definitions of the four indices see Yuanjing et al. (2021). Because the training and test sets differ in each run, a single set of the four evaluation indices can hardly evaluate the proposed lightning whistle sound wave recognition algorithm fully; the experiment is therefore repeated 1000 times, and the following evaluation strategy is formulated on the basis of the four evaluation indices:
(1) Display of partial recognition results.
(2) Evaluation of overall recognition accuracy: the evaluation indices of the 1000 experiments are averaged.
(3) Stability and difference evaluation: the evaluation indices of the 1000 experiments are plotted as box plots to evaluate the classification stability. To evaluate whether the different feature classifiers differ significantly, a T test is used for the difference evaluation, with a threshold of p = 0.05: p < 0.05 indicates a significant difference, and p > 0.05 indicates no significant difference.
3.2 Display of partial recognition results
The waveform plots of some recognition results and the corresponding time-frequency diagrams are shown in figs. 6 and 7, where the waveform plot is the recognition result and the time-frequency diagram is used only to visualize whether a lightning whistle sound wave is present in the waveform. Fig. 6 shows recognition results for lightning whistle sound waves: fig. 6(a) is a correctly identified lightning whistle sound wave and fig. 6(b) an unrecognized one. The recognition fails because the energy of the lightning whistle sound wave is weak and the background interference strong, so its trend characteristic is not obvious; after detrending, the whistler trend is submerged by the interference, creating the false impression that no lightning whistle sound wave is present, as shown in fig. 6(c). Fig. 7 shows partial recognition results for non-lightning whistle sound waves: fig. 7(a) is a correctly identified non-lightning whistle sound wave and fig. 7(b) a misidentified one. The misidentification occurs because the original waveform data contain strong interference signals with trend characteristics similar to those of a lightning whistle sound wave, for example at the black box in the time-frequency diagram of fig. 7(b); the result of detrending the corresponding waveform is shown in fig. 7(c).
3.3 Evaluation of overall recognition accuracy
After the 1000 experiments, 1000 values each of the precision (Precision), recall (Recall), F1 value (F1 score), AUC-ROC value, time consumption (Cost time) and memory consumption (Cost memory) are obtained; their means are computed to evaluate the whistler recognition effect based on the intelligent voice technology, as shown in Table 3.
TABLE 3 Average effect after 1000 experiments
In Table 3, Original + LSTM indicates that the LSTM classifier is trained directly with the original waveform; Original_Detrend + LSTM indicates that the original waveform is detrended before the LSTM classifier is trained; Original_MFCC + LSTM indicates that MFCCs features are extracted from the original waveform; and Original_Detrend_MFCC + LSTM indicates that the original waveform is detrended, the MFCCs features are then extracted, and the LSTM classifier is finally trained with these features. Table 3 shows that the recognition algorithm that trains the LSTM classifier directly on the raw waveform data (Original + LSTM) has the lowest time and memory consumption, 2.08 s and 82.790 MB respectively, but performs worst on the four criteria of classification precision, recall, F1 value and AUC-ROC. The recognition algorithm proposed herein (Original_Detrend_MFCC + LSTM) performs best on the four metrics, reaching 0.967, 0.842, 0.900 and 0.907 respectively, and because the MFCCs features reduce each 0.16 s audio segment from 8192 values to 624 (16 × 39), its time and memory consumption are close to those of Original + LSTM, at 2.24 s and 83.026 MB. The Original_Detrend + LSTM algorithm uses a two-layer LSTM network to obtain better classification results, which costs it more time and memory. It is worth noting that the currently best time-frequency-diagram-based lightning whistle sound wave recognition algorithm, which uses the YOLOv3 deep convolutional neural network (Yuanjing et al., 2021), consumes 6.71 s of time on the CPU and 233 MB of memory resources.
In conclusion, for lightning whistle sound wave identification based on the raw waveform, the recognition algorithm that combines MFCCs audio feature extraction with the LSTM neural network gives the best classification results, and compared with recognition algorithms based on time-frequency diagrams its time cost and memory consumption are the lowest.
3.4 Evaluation of stability and difference
This subsection evaluates the stability of, and the differences between, the classification effects of the different LSTM classifiers.
(1) Stability evaluation: for the 1000 values of each index, a box plot is drawn, as shown in fig. 8. The distribution of the 1000 values of the lightning whistle sound wave recognition precision (Precision) is shown in the Precision panel of fig. 8, where the horizontal axis gives the different feature classifiers and the vertical axis the precision. The Precision box of the Original_Detrend_MFCC feature classifier is shorter than those of the Original, Original_Detrend and Original_MFCC feature classifiers, indicating that this feature classifier performs more stably on the precision index; its box also sits higher than those of the other three, indicating that it performs better on the precision index. The same conclusions are obtained from the Recall, F1 score and AUC-ROC boxes of fig. 8. In summary, the classifier presented herein is the best and the most stable on all four evaluation indices.
(2) Difference evaluation: to test whether the performance of the different classifiers differs significantly, the significance of the differences is evaluated quantitatively with a two-independent-sample T test. The higher the significance value p, the smaller the difference; the usual threshold is 0.05, meaning that p < 0.05 indicates a significant difference and p > 0.05 indicates no significant difference. The results are shown in Table 4.
TABLE 4 T test results
The value in the first row and second column of the Precision T-test table of Table 4 is 0, which indicates that the Original feature classifier and the Original_Detrend feature classifier differ significantly on the precision index. Further inspection shows that the precision of the Original_Detrend_MFCC feature classifier differs significantly from that of the other feature classifiers, whereas the T-test value of a feature classifier against itself on the precision index is 1, indicating no significant difference; for example, the value in the third row and third column is 1. The same conclusions are obtained from the T-test tables for Recall, F1 score and AUC-ROC in Table 4. In conclusion, the recognition algorithm proposed by the invention differs significantly from the other recognition algorithms on the four evaluation indices of precision, recall, F1 value and AUC-ROC, and the proposed algorithm markedly improves the recognition effect.
4 Discussion
The above experiments show that the proposed automatic lightning whistle sound wave identification algorithm is effective. The raw-waveform feature extraction and the LSTM neural network in the algorithm have a very important influence on the automatic recognition result of the lightning whistle sound wave, and this influence is discussed and analysed in depth in this chapter.
4.1 Waveform feature extraction
To analyse the time traces of the four features, this subsection randomly selects the audio data of 10 lightning whistle sound wave samples and 10 non-lightning whistle sound wave samples and plots the time traces of the four waveform features, as shown in fig. 9.
Fig. 9(a) is the time series of the original waveform data; fig. 9(b) is the time series after detrending the original waveform data; fig. 9(c) is the time series after MFCCs feature extraction from the original waveform data; applying MFCCs feature extraction to fig. 9(b) gives a 16 × 39 feature map, and unfolding the third column of this feature map in chronological order gives the result shown in fig. 9(d). Here W denotes a lightning whistle sound wave waveform sample and NonW a non-lightning whistle sound wave waveform sample. Fig. 9(a) shows that the time traces of original waveforms containing lightning whistle sound waves are mixed with those of non-lightning whistle sound waves, which increases the classification difficulty. After the original waveform data are detrended, the intra-class difference of the sample traces containing lightning whistle sound waves decreases, as shown in fig. 9(b), but classification remains difficult. After MFCCs features are extracted from the original waveform data, the sample traces containing lightning whistle sound waves become separable, but they are still partially mixed and the intra-class differences are large, as shown in fig. 9(c). When the original waveform is detrended and the MFCCs are then extracted, the time traces, drawn in fig. 9(d), show that the traces containing lightning whistle sound waves and those of non-lightning whistle sound waves are clearly separable.
4.2 Abstract mapping of LSTM neural networks
The hidden state output by the LSTM at the final time step contains abstract features of the time series, including its historical and trend information, and has a key influence on the final classification result. This subsection randomly selects 200 lightning whistle sound wave waveform samples (W) and 200 non-lightning whistle sound wave waveform samples (NonW) from the test set and plots, in fig. 10, the abstract features produced by the LSTM classifiers trained on the four different features, where W represents a lightning whistle sound wave waveform sample and NonW represents a non-lightning whistle sound wave waveform sample.
Based on these sample data, the intra-class and inter-class differences of the abstract features of the different LSTM classifiers were calculated; the results are shown in Table 5.
Table 5. Abstract feature differences of the four feature classifiers
Observing fig. 10 and Table 5 shows that, for the LSTM classifier based on the MFCCs features, the intra-class difference of the abstract features is 0.06609 for lightning whistle sound waves and 0.00024 for non-lightning whistle sound waves, indicating that samples of the same class cluster tightly; meanwhile, the inter-class difference between lightning whistle sound waves and non-lightning whistle sound waves is 0.26357, indicating strong separation between the classes. In summary, the abstract features of the proposed algorithm have small intra-class differences and large inter-class differences, which makes accurate classification easier.
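A hedged sketch of how the intra-class and inter-class differences in Table 5 might be computed from the abstract features is given below; the exact distance measure is not stated in this excerpt, so centroid-based Euclidean distances, the Keras framework and the layer name "lstm" are assumptions for illustration.

```python
# Hedged sketch of the intra-/inter-class differences reported in Table 5.
# The exact distance measure is not given in this excerpt: here intra-class
# difference is the mean Euclidean distance of samples to their class centroid,
# and inter-class difference is the distance between the two centroids.
# The Keras framework and the layer name "lstm" are also assumptions.
import numpy as np
import tensorflow as tf

def abstract_features(model: tf.keras.Model, x: np.ndarray) -> np.ndarray:
    """Abstract features: output of the LSTM layer (final time step)."""
    feature_model = tf.keras.Model(model.inputs, model.get_layer("lstm").output)
    return feature_model.predict(x, verbose=0)     # shape: (n_samples, n_units)

def class_differences(feat_w: np.ndarray, feat_nonw: np.ndarray):
    """Intra-class spread of W and NonW features, and the inter-class distance."""
    c_w, c_nonw = feat_w.mean(axis=0), feat_nonw.mean(axis=0)
    intra_w = np.linalg.norm(feat_w - c_w, axis=1).mean()
    intra_nonw = np.linalg.norm(feat_nonw - c_nonw, axis=1).mean()
    inter = np.linalg.norm(c_w - c_nonw)
    return intra_w, intra_nonw, inter
```

Applied to the 200 W and 200 NonW test samples described above, such a computation would yield numbers in the spirit of the intra-class and inter-class values quoted from Table 5.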
4.3 Effect of network architecture on LSTM classification results
The experimental results show that different network structures affect the classification performance of the LSTM neural network differently. The following discussion considers two aspects: the performance of the LSTM network and the separability of its abstract features.
(1) Performance evaluation
This subsection performs ten-fold cross-validation on LSTM networks with different network structures and calculates the cross-validation scores and the time required for training; the results are shown in Table 6.
Table 6. LSTM networks with different hyper-parameters
Table 6 shows that the single-layer LSTM network with Dropout added (LSTM network C) achieves a higher cross-validation score (0.947) than the other two LSTM networks, and its classifier requires the lowest average training time of 43.673 s. When a neural network is prone to overfitting, adding Dropout acts like an ensemble vote, reduces the co-adaptation among neurons, and thereby improves the accuracy and generalization ability of the network.
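As an illustration of the network referred to as LSTM network C, the sketch below builds a single LSTM layer followed by Dropout in Keras; the layer width, dropout rate, input shape and training settings are illustrative assumptions, since the exact hyper-parameters are not given in this excerpt.

```python
# Hedged sketch of "LSTM network C": a single LSTM layer followed by Dropout,
# trained on MFCC feature sequences. The layer width, dropout rate, input shape
# and optimizer are illustrative assumptions, not hyper-parameters from Table 6.
import tensorflow as tf

def build_lstm_network_c(n_frames: int, n_mfcc: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_frames, n_mfcc)),
        tf.keras.layers.LSTM(64, name="lstm"),           # single (shallow) LSTM layer
        tf.keras.layers.Dropout(0.5),                    # vote-like effect, less co-adaptation
        tf.keras.layers.Dense(1, activation="sigmoid"),  # whistler vs. non-whistler
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_lstm_network_c(n_frames=16, n_mfcc=13)
model.summary()
```

Ten-fold cross-validation as in Table 6 could then be approximated by rebuilding this model inside a KFold loop and averaging the fold scores and training times.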
(2) Analysis of feature separability
In order to qualitatively compare whether the abstract features of the LSTM networks under different hyper-parameters are separable, this subsection inputs 60 lightning whistle sound waves and 60 non-lightning whistle sound waves into the LSTM networks with different hyper-parameters to extract their hidden information features, and draws their time-varying traces as shown in fig. 11, where W represents lightning whistle sound wave waveform sample data and NonW represents non-lightning whistle sound wave waveform sample data.
The time traces in fig. 11(a) come from a two-layer LSTM network with a dropout layer; those in fig. 11(b) come from a single-layer LSTM network with a dropout layer; and those in fig. 11(c) come from a single-layer LSTM network without a dropout layer. Observing fig. 11 shows that the W and NonW features of LSTM network A overlap and their overall discrimination is not high; the W and NonW features of LSTM network B are more distinguishable, but features of different classes are still interleaved; in contrast, the W and NonW features of LSTM network C are distributed mainly in two different regions with little interleaving. This indicates that LSTM network C improves the discrimination between lightning whistle sound waves and non-lightning whistle sound waves and has a strong lightning whistle sound wave identification capability.
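A hedged sketch of how the time-varying hidden traces in fig. 11 could be read out of a trained classifier is shown below; it assumes a Keras model whose LSTM layer is named "lstm" and simply copies that layer with return_sequences=True, which is one possible way to obtain per-time-step hidden states, not necessarily the procedure used in this work.

```python
# Hedged sketch of extracting the time-varying hidden traces plotted in Fig. 11:
# copy the trained LSTM layer with return_sequences=True and reuse its weights.
# The layer name "lstm" and this probing strategy are assumptions for illustration.
import numpy as np
import tensorflow as tf

def per_step_hidden_states(model: tf.keras.Model, x: np.ndarray) -> np.ndarray:
    """Hidden state of the LSTM layer at every time step, shape (batch, time, units)."""
    lstm_layer = model.get_layer("lstm")
    cfg = lstm_layer.get_config()
    cfg["return_sequences"] = True                  # expose every time step
    probe = tf.keras.Sequential([
        tf.keras.Input(shape=model.input_shape[1:]),
        tf.keras.layers.LSTM.from_config(cfg),
    ])
    probe.layers[-1].set_weights(lstm_layer.get_weights())  # reuse trained weights
    return probe.predict(x, verbose=0)

# Example: plot per_step_hidden_states(model, x_w)[:, :, 0] for W segments and the
# same hidden unit for NonW segments to obtain traces in the style of Fig. 11.
```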
During the operation of the ZH-1 satellite, its onboard SCM produces about 10 GB of data per day, the vast majority of which is blank data containing no lightning whistle sound waves. Realizing on-board, real-time recognition of lightning whistle sound waves, so that more meaningful data are returned and the storage pressure is reduced, has therefore become important. Because the frequency range of lightning whistle sound waves falls within the hearing range of the human ear, automatic recognition of lightning whistle sound waves based on speech recognition technology becomes possible.
This work explores an automatic identification algorithm for satellite-borne lightning whistle sound waves using SCM data from the ZH-1 satellite. Exploiting the fact that lightning whistle sound waves are audible to the human ear, the MFCCs feature extraction is used to enhance their auditory characteristics, and a shallow long short-term memory (LSTM) recurrent neural network is used to classify the features; the classification results exceed 90% on the precision, F1-score and AUC metrics. A comparison with a lightning whistle sound wave detection algorithm based on the YOLOv3 neural network shows that, for every 0.16 s of raw data, 150.11 MB of storage space and 4430 ms of processing time can be saved, which makes on-board identification of lightning whistle sound waves more feasible, while the accuracy is only 3.24% lower than that of YOLOv3. The MFCCs are designed to mimic the auditory characteristics of the human ear, but because of the nonlinear Hz-Mel frequency mapping, the filters are numerous and densely distributed in the low-frequency region while being fewer and sparsely distributed in the mid- and high-frequency region, so the calculation accuracy of the MFCCs decreases as the frequency increases. Lightning produces strong broadband radio waves, especially in the very low frequency (VLF) band from 300 Hz to 20 kHz; the MFCCs features can therefore be further improved by adding mid- and high-frequency filters, thereby improving the calculation accuracy of the high-frequency part.
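As a sketch of the improvement direction mentioned above, the filterbank below augments a standard Mel filterbank with additional, linearly spaced triangular filters in the mid- and high-frequency band; the filter counts, band edges and use of librosa are illustrative assumptions rather than a design adopted in this work.

```python
# Hedged sketch of the filterbank improvement mentioned above: augment a standard
# Mel filterbank (dense at low frequency) with extra, linearly spaced triangular
# filters in the mid/high-frequency band so the VLF whistler range is sampled more
# evenly. Filter counts, band edges and the use of librosa are illustrative only.
import numpy as np
import librosa

SR, N_FFT = 51200, 2048   # sampling rate implied by 8192 points per 0.16 s

def augmented_filterbank(n_mels=26, n_linear=10, f_lo=5000.0, f_hi=20000.0):
    mel_fb = librosa.filters.mel(sr=SR, n_fft=N_FFT, n_mels=n_mels)
    fft_freqs = librosa.fft_frequencies(sr=SR, n_fft=N_FFT)
    centers = np.linspace(f_lo, f_hi, n_linear + 2)   # linearly spaced center freqs
    linear_fb = np.zeros((n_linear, len(fft_freqs)))
    for i in range(n_linear):
        left, center, right = centers[i], centers[i + 1], centers[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        linear_fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return np.vstack([mel_fb, linear_fb])  # (n_mels + n_linear, 1 + N_FFT // 2)

print(augmented_filterbank().shape)
```

Keeping the standard Mel filters preserves the low-frequency auditory emphasis, while the added linear filters sample the upper part of the whistler band more evenly.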
The above description covers only preferred embodiments of the present invention and is not intended to limit the present invention in any way. It will be apparent to those skilled in the art that various modifications, equivalent variations and improvements can be made to the above embodiments without departing from the spirit and scope of the present invention.
Claims (8)
1. A lightning whistle sound wave automatic identification method based on intelligent voice technology is characterized by comprising the following steps:
intercepting audio data from the original waveform data of the SCM payload VLF band to form an audio data set;
extracting the MFCCs audio frequency characteristics of the lightning whistle sound waves from the audio data set;
training an LSTM neural network classifier by using MFCCs audio features;
identifying the lightning whistle sound wave using the trained LSTM neural network classifier.
2. A lightning whistle sound wave automatic identification method based on intelligent voice technology according to claim 1, characterized by further comprising:
after the audio data are intercepted from the original waveform data of the SCM payload VLF band to form the audio data set and before the MFCCs audio features of the lightning whistle sound waves are extracted from the audio data set, performing detrending processing on the audio data set to obtain the detrended audio data set.
3. The lightning whistle sound wave automatic identification method based on intelligent voice technology according to claim 1 or 2, characterized in that intercepting audio data from the original waveform data of the SCM payload VLF band to form the audio data set comprises:
intercepting data containing 8192 points from the original waveform data with a 0.16 s sliding time window and converting the data into an audio clip;
then performing a short-time Fourier transform on the intercepted data to obtain its time-frequency diagram;
then manually labeling each segment according to whether its time-frequency diagram exhibits the L-shaped dispersion morphological characteristic;
finally obtaining a 10200-segment audio data set.
4. The lightning whistle sound wave automatic identification method based on intelligent voice technology according to claim 3, characterized in that the 10200-segment audio data set comprises 5100 segments of lightning whistle sound wave data and 5100 segments of non-lightning whistle sound wave data.
5. The lightning whistle sound wave automatic identification method based on intelligent voice technology according to claim 1 or 2, characterized in that performing detrending processing on the audio data set to obtain the detrended audio data set comprises:
the detrending process is performed according to the following formula:
wherein s(n) is the original signal and S(k) is the detrended signal.
6. The lightning whistle sound wave automatic identification method based on intelligent voice technology according to claim 1 or 2, characterized in that extracting the MFCCs audio features of the lightning whistle sound waves from the detrended audio data set comprises:
performing pre-emphasis, framing, windowing and fast Fourier transform on the detrended audio data set to obtain the power spectrum of the speech signal;
passing the power spectrum through a set of Mel-scale triangular filters to obtain energy coefficients;
calculating the logarithmic energy of the energy coefficients and applying a discrete cosine transform to obtain L-order MFCC parameters;
and performing dynamic differencing on the L-order MFCC parameters to obtain MFCC energy parameters represented as a two-dimensional tensor.
7. The lightning whistle sound wave automatic identification method based on intelligent voice technology according to claim 2, characterized in that training the LSTM neural network classifier using the MFCCs audio features comprises:
training the LSTM neural network classifier using, as the audio features, one of the following: the original data, the detrended original data, the MFCCs features of the original data, or the MFCCs features of the detrended original data.
8. A lightning whistle sound wave automatic identification system based on intelligent voice technology, characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the lightning whistle sound wave automatic identification method based on intelligent voice technology according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111109574.9A CN113782054B (en) | 2021-09-22 | 2021-09-22 | Lightning whistle sound wave automatic identification method and system based on intelligent voice technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113782054A (en) | 2021-12-10
CN113782054B (en) | 2023-09-15
Family
ID=78852533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111109574.9A Active CN113782054B (en) | 2021-09-22 | 2021-09-22 | Lightning whistle sound wave automatic identification method and system based on intelligent voice technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113782054B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190266998A1 (en) * | 2017-06-12 | 2019-08-29 | Ping An Technology(Shenzhen) Co., Ltd. | Speech recognition method and device, computer device and storage medium |
US10403304B1 (en) * | 2018-03-13 | 2019-09-03 | Qbrio Studio, Inc. | Neural networks for identifying the potential of digitized audio to induce frisson in listeners |
CN109044396A (en) * | 2018-06-25 | 2018-12-21 | 广东工业大学 | A kind of intelligent recognition of heart sound method based on two-way length Memory Neural Networks in short-term |
CN109767785A (en) * | 2019-03-06 | 2019-05-17 | 河北工业大学 | Ambient noise method for identifying and classifying based on convolutional neural networks |
US10930301B1 (en) * | 2019-08-27 | 2021-02-23 | Nec Corporation | Sequence models for audio scene recognition |
CN112633227A (en) * | 2020-12-30 | 2021-04-09 | 应急管理部国家自然灾害防治研究院 | Automatic identification method and system for Zhang Heng I induction magnetometer data lightning whistle sound wave |
Non-Patent Citations (7)
Title |
---|
MUHAMMAD ZAINUDDIN LUBIS et al.: "Echolocation Signal Detection From Captive Dolphin Using Whistles Sound" *
Y. TSUGAWA et al.: "Statistical analysis of monochromatic whistler waves near the Moon detected by Kaguya" *
ZEREN ZHIMA et al.: "Simultaneous Observations of ELF/VLF Rising-Tone Quasiperiodic Waves and Energetic Electron Precipitations in the High-Latitude Upper Ionosphere" *
YU Yi et al.: "Recognition algorithm for the diffusion state of lightning whistler waves from the Zhangheng-1 satellite" *
FENG Xiaokang et al.: "Intelligent detection algorithm for lightning whistler waves of the Zhangheng satellite based on the YOLO neural network", National Security Geophysics Series (XVI): Big Data and Geophysics *
YUAN Jing et al.: "Research progress on intelligent detection algorithms for lightning whistler waves based on electromagnetic satellites", Chinese Journal of Geophysics *
ZHAO Hanqing: "Research on the characteristics of whistler waves in space plasma" *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116232918A (en) * | 2022-12-12 | 2023-06-06 | 中国人民解放军战略支援部队信息工程大学 | VLF/LF signal distributed intelligent detection method and system |
CN117893877A (en) * | 2024-01-16 | 2024-04-16 | 应急管理部国家自然灾害防治研究院 | Method and system for detecting quasi-periodic radiation phenomenon of Zhangheng first satellite based on DETR |
Also Published As
Publication number | Publication date |
---|---|
CN113782054B (en) | 2023-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021027026A1 (en) | Seismic wave vibration nature recognition method based on machine learning | |
CN103345923B (en) | A kind of phrase sound method for distinguishing speek person based on rarefaction representation | |
CN103617684B (en) | Interference-type optical fiber circumference vibrating intruding recognizer | |
CN113782054B (en) | Lightning whistle sound wave automatic identification method and system based on intelligent voice technology | |
CN103474066B (en) | Based on the ecological of multi-band signal reconstruct | |
Cortés et al. | Parallel system architecture (PSA): An efficient approach for automatic recognition of volcano-seismic events | |
CN106874833A (en) | A kind of mode identification method of vibration event | |
CN114863937B (en) | Mixed bird song recognition method based on deep migration learning and XGBoost | |
Yu et al. | Early warning of coalescing neutron-star and neutron-star-black-hole binaries from the nonstationary noise background using neural networks | |
CN111261189B (en) | Vehicle sound signal feature extraction method | |
Zhao et al. | Using supervised machine learning to distinguish microseismic from noise events | |
Riggelsen et al. | A machine learning approach for improving the detection capabilities at 3C seismic stations | |
Zhu et al. | An STFT-LSTM system for P-wave identification | |
CN112329914A (en) | Fault diagnosis method and device for buried transformer substation and electronic equipment | |
CN113205820A (en) | Method for generating voice coder for voice event detection | |
Curilem et al. | Pattern recognition applied to seismic signals of Llaima volcano (Chile): An evaluation of station-dependent classifiers | |
Zhang et al. | Deep convolutional neural network for microseismic signal detection and classification | |
Li et al. | Multisensory speech enhancement in noisy environments using bone-conducted and air-conducted microphones | |
Qiu et al. | Sound Recognition of Harmful Bird Species Related to Power Grid Faults Based on VGGish Transfer Learning | |
CN109584861A (en) | The screening method of Alzheimer's disease voice signal based on deep learning | |
Jadhav et al. | Sound classification using python | |
Liu et al. | A model of binaural auditory nerve oscillator network for bearing fault diagnosis by integrating two-channel vibration signals | |
Malfante et al. | Machine learning for automatic classification of volcano-seismic signatures | |
Jiang et al. | Automatic phase picking from microseismic recordings using feature extraction and neural network | |
CN113571050A (en) | Voice depression state identification method based on Attention and Bi-LSTM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Wang Qiao; Zeren Zhima; Shen Xuhui; Yuan Jing; Yang Dehe; Li Wenjing. Inventor before: Wang Qiao; Yuan Jing.
GR01 | Patent grant | ||