CN114553639B - Morse signal detection and identification method

Info

Publication number: CN114553639B
Application number: CN202210157096.7A
Authority: CN
Inventors: 宿绍莹; 张毅; 鲍庆龙
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2024-02-27
Anticipated expiration: 2042-02-21
Also published as: CN114553639A

Abstract

The invention discloses a Morse signal detection and identification method. A short-time Fourier transform is applied to an audio time domain signal to obtain a first standard time-frequency image; energy sorting and feature extraction are performed on the first standard time-frequency image to determine the frequency point of the Morse signal; the audio time domain signal is digitally filtered around that frequency point to obtain an interference-free signal, which is short-time Fourier transformed to obtain a second standard time-frequency image, from which a second effective time-frequency image is obtained; the second effective time-frequency image is enhanced, a feature sequence arranged in the time domain is extracted from it, and the message is then predicted. Applied to the field of signal detection and recognition, the invention resolves the conflicting resolution requirements of detection and recognition and effectively improves recognition accuracy in low signal-to-noise-ratio environments, and its robustness and practicability have been verified.

Description

Morse signal detection and identification method

Technical Field

The invention relates to the technical field of signal detection and identification, in particular to a Morse signal detection and identification method.

Background

A Morse signal is an ordered combination of dots, dashes and spaces that uniquely represents letters, numbers and punctuation marks. Because of its simple coding, high transmission efficiency, low bandwidth cost and suitability for harsh emergency communication environments, Morse telegraphy is widely used in aviation, navigation, meteorology and similar fields. Morse telegrams have long been copied by ear; accuracy is difficult to guarantee in harsh and complex communication environments, and long working hours harm the physical and mental health of the operators. There is therefore an urgent practical need for automatic reception of Morse telegrams.

Automatic reception of Morse signals is divided into detection and identification. Detection is the precondition for identification: it determines whether a Morse signal is present in the received radio audio and obtains its time-frequency position. Identification restores the detected Morse signal to the corresponding message, which is the ultimate goal of automatic reception.

Morse signal detection methods fall mainly into time-domain, frequency-domain and time-frequency-domain methods. Envelope detection is the earliest method; it works in real time and does not need the signal frequency to be known, but has poor noise resistance. The phase-locked loop method can track a signal provided the frequency point of the target signal is known, but is sensitive to interference. The Kalman filtering and adaptive Kalman filtering methods can effectively remove interference in the time domain, but require the target signal frequency to be known and introduce some delay when there are many sampling points. The frequency-domain analog filtering method and the spectral variance method detect the Morse signal in the frequency domain, but have difficulty distinguishing single-frequency signals from the target signal. Time-frequency analysis takes both the time-domain and frequency-domain characteristics of the signal into account and resembles manual detection. Prior work has used the discrete Gabor transform to capture high-frequency telegraph signals on the time-frequency plane, and has also proposed building a three-dimensional spectrogram from the short-time Fourier transform and detecting telegraph signals with digital image processing, demonstrating feasibility and interference resistance. At present, three time-frequency detection methods perform well. Two of them extract the signal-bearing segments of the time-frequency diagram by energy sorting and then apply a machine learning classifier or a deep neural network classifier to detect Morse signals. The third performs deep-learning-based object detection directly on the time-frequency diagram of the audio data. All of these methods assume a wideband channel environment in which the various radio signals are far apart in frequency.
However, when the target Morse signal is subjected to disruptive interference, a large number of interfering signals are present in a narrow band, the signal-to-noise ratio is extremely low and adjacent-frequency interference occurs, so the time resolution of the time-frequency diagram must be sacrificed to obtain higher frequency resolution. In that case the time-domain features of the target signal are barely visible in the time-frequency diagram, or the target signal is not visible at all, and the deep learning methods essentially fail.

Morse signal recognition methods can be divided into traditional machine learning methods and deep learning methods. Traditional methods first estimate the time-domain durations of dots, dashes and gaps, classify each of them, and then decode by table lookup according to their sequence, adding error correction where appropriate. Computing the durations, whether by tracking the signal waveform directly in the time domain or by using a time-frequency spectrogram, requires preprocessing, and the recognition result is limited by the quality of that preprocessing. The classification stage most commonly uses traditional machine learning models such as support vector machines, k-means clustering and fuzzy C-means clustering. These models use only code length as a feature, have no context information, are not robust to significant code-length deviation, and inevitably require additional post-processing, including table lookup and error correction, which increases recognition time. More recently, deep learning methods have been applied to Morse recognition, such as the classical speech recognition approach that combines a hidden Markov model (HMM) with a deep neural network (DNN), and the classical text recognition network CRNN; they avoid the time cost and error accumulation of staged processing and further improve recognition performance. In particular, a CRNN recognition network can reach 90% recognition accuracy, but it needs a large number of training samples, and its accuracy gradually degrades when the signal-to-noise ratio falls below -10 dB.

In addition, the prior art either assumes at the start of recognition that only the target Morse signal is present in the time-frequency band, or feeds the Morse signal time-frequency diagram found in the detection stage directly into the recognition stage, with no processing between detection and recognition. The recognition stage demands high time resolution while the detection stage demands high frequency resolution; if the frequency band output by the detection stage is recognized directly, recognition accuracy drops severely and automatic reception suffers.

Disclosure of Invention

In the prior art, automatic reception of Morse signals performs poorly, and the accuracy and speed of detection and identification under adjacent-channel interference and low signal-to-noise ratio cannot meet practical requirements. To address these problems, the invention provides a Morse signal detection and identification method that can effectively detect and identify Morse signals in complex channel environments, with identification accuracy superior to current state-of-the-art methods while remaining timely.

In order to achieve the above object, the present invention provides a method for detecting and identifying a Morse signal, comprising the steps of:

Step 1, obtaining an audio time domain signal;

Step 2, performing a short-time Fourier transform on the audio time domain signal to obtain a first standard time-frequency image with high frequency resolution;

Step 3, performing energy sorting and feature extraction on the first standard time-frequency image to obtain a first effective time-frequency image and a feature vector, and determining the frequency point of the Morse signal in the first effective time-frequency image based on the feature vector;

Step 4, digitally filtering the audio time domain signal based on the frequency point of the Morse signal to obtain an interference-free signal;

Step 5, performing a short-time Fourier transform on the interference-free signal to obtain a second standard time-frequency image with high time resolution, and performing energy sorting on the second standard time-frequency image to obtain a second effective time-frequency image;

Step 6, performing image enhancement on the second effective time-frequency image, and then extracting a feature sequence to obtain a feature sequence arranged in the time domain;

Step 7, performing message prediction based on the feature sequence arranged in the time domain, and outputting the result.

In another embodiment, in step 2, performing the short-time Fourier transform on the audio time domain signal specifically includes:

$$X_n(e^{j2\pi f}) = \sum_{m=-\infty}^{\infty} x(m)\,\omega(n-m)\,e^{-j2\pi f m}$$

wherein x(m) is the audio time domain signal and $X_n(e^{j2\pi f})$ is the time-frequency matrix of the audio time domain signal; ω(n−m) is a sliding window function chosen for high frequency resolution, i.e. the short-time Fourier transform parameters are set for high frequency resolution; j is the imaginary unit, f is the frequency, m is the time index of the audio signal sequence, and n is the time index of the time-frequency matrix obtained by the short-time Fourier transform;

taking the squared modulus of the time-frequency matrix of the audio time domain signal gives the short-time Fourier spectrogram, i.e. the first standard time-frequency image:

$$P_n(f) = |X_n(e^{j2\pi f})|^2$$

wherein $P_n(f)$ is the first standard time-frequency image.
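The spectrogram P_n(f) of step 2 can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the Hann window, the 8 kHz sample rate, the 4096-sample window and the 1024-sample hop are all assumed here, since the text does not give the transform parameters, and the helper name `stft_power` is hypothetical.

```python
import numpy as np

def stft_power(x, win_len, hop):
    """Return P[n, k] = |X_n(f_k)|^2 for frame index n and frequency bin k."""
    window = np.hanning(win_len)                 # assumed window shape
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)       # one DFT per windowed frame
    return np.abs(spectrum) ** 2                 # squared modulus

fs = 8000                                        # assumed sample rate in Hz
t = np.arange(fs) / fs                           # one second of audio
x = np.sin(2 * np.pi * 1000 * t)                 # 1 kHz test tone
P = stft_power(x, win_len=4096, hop=1024)        # long window: fine frequency grid
peak_hz = np.argmax(P.sum(axis=0)) * fs / 4096   # strongest bin, converted to Hz
```

A long window gives closely spaced frequency bins at the cost of few frames, which suits the detection stage's need to separate adjacent carriers.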

In another embodiment, in step 3, the energy sorting and feature extraction are performed on the first standard time-frequency image to obtain a first effective time-frequency image and a feature vector, which specifically includes:

summing, along the time axis, the energy at each frequency point of the time-frequency matrix of the audio time domain signal to obtain a first relation curve of frequency versus energy:

$$F(f) = \sum_{n=n_0}^{n_k} P_n(f)$$

wherein F(f) is the first relation curve, $n_0$ is the time start point of the time-frequency matrix, and $n_k$ is the time end point of the time-frequency matrix;

constructing a first self-adaptive positioning threshold based on the mean value and the standard deviation of the first relation curve, wherein the first self-adaptive positioning threshold is as follows:

$$T = \mu + C\sigma$$

wherein T is the first adaptive positioning threshold, μ is the mean of the first relation curve, σ is the standard deviation of the first relation curve, and C is the first positioning coefficient;

taking from the first relation curve the part whose energy stays continuously above the first adaptive positioning threshold as a first effective signal frequency band;

and cropping the first effective signal frequency band out of the first standard time-frequency image to obtain a first effective time-frequency image, and taking the signal bandwidth, the windowing standard deviation, the average rectangular similarity and the central distribution law of the connected regions of the first effective time-frequency image as the feature vector.

In another embodiment, in step 3, the determining, based on the feature vector, a frequency point of a Morse signal in the first effective time-frequency image specifically includes:

and inputting the feature vector into a trained random forest classifier, classifying each modulation signal and obtaining the frequency point of the Morse signal in the first effective time-frequency image.

In another embodiment, step 4 is specifically:

taking the frequency point of the Morse signal as the center frequency, and filtering out of the audio time domain signal all interference and noise lying more than 15 Hz above or below the center frequency, to obtain the interference-free signal.
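The effect of step 4 can be sketched as follows. The text does not say which digital filter design is used, so this sketch swaps in the simplest possible stand-in, a brick-wall mask applied in the FFT domain; a practical receiver would more likely use an FIR or IIR band-pass design. The function name, sample rate and test frequencies are assumptions for illustration.

```python
import numpy as np

def narrowband_filter(x, fs, center_hz, half_width_hz=15.0):
    """Zero every frequency bin farther than half_width_hz from center_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = np.abs(freqs - center_hz) <= half_width_hz   # pass band mask
    return np.fft.irfft(spectrum * keep, n=len(x))

fs = 8000                                        # assumed sample rate in Hz
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 1000 * t)            # stand-in for the Morse carrier
interferer = np.sin(2 * np.pi * 1040 * t)        # adjacent-frequency interference
clean = narrowband_filter(target + interferer, fs, center_hz=1000)
```

In this toy case the interferer sits 40 Hz away from the detected frequency point, so it falls outside the 15 Hz pass band and is removed while the target tone is preserved.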

In another embodiment, in step 5, performing the short-time Fourier transform on the interference-free signal to obtain a second standard time-frequency image with high time resolution specifically includes:

$$X'_n(e^{j2\pi f}) = \sum_{m=-\infty}^{\infty} x'(m)\,\omega'(n-m)\,e^{-j2\pi f m}$$

wherein x′(m) is the interference-free signal and $X'_n(e^{j2\pi f})$ is the time-frequency matrix of the interference-free signal; ω′(n−m) is a sliding window function chosen for high time resolution, i.e. the short-time Fourier transform parameters are set for high time resolution; j is the imaginary unit, f is the frequency, m is the time index of the audio signal sequence, and n is the time index of the time-frequency matrix obtained by the short-time Fourier transform;

taking the squared modulus of the time-frequency matrix of the interference-free signal gives the short-time Fourier spectrogram, i.e. the second standard time-frequency image:

$$P'_n(f) = |X'_n(e^{j2\pi f})|^2$$

wherein $P'_n(f)$ is the second standard time-frequency image.
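The reason two different transforms are needed can be seen from simple arithmetic: for a window of N samples at sample rate fs, frequency bins are spaced fs/N apart, while successive frames are spaced hop/fs apart. The sketch below uses assumed values (8 kHz sample rate, 4096- and 256-sample windows); the text does not state the actual parameters.

```python
def stft_grid(fs, win_len, hop):
    """Return (frequency bin spacing in Hz, frame spacing in seconds)."""
    return fs / win_len, hop / fs

fs = 8000                                             # assumed sample rate in Hz
detect = stft_grid(fs, win_len=4096, hop=1024)        # long window, as in step 2
recog = stft_grid(fs, win_len=256, hop=64)            # short window, as in step 5
```

With these assumed values the long window resolves about 2 Hz but only one frame every 128 ms, while the short window yields one frame every 8 ms at about 31 Hz per bin, which is acceptable once filtering has left a single narrow band.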

In another embodiment, in step 5, the energy sorting is performed on the second standard time-frequency image to obtain a second effective time-frequency image, which specifically includes:

summing, along the time axis, the energy at each frequency point of the time-frequency matrix of the interference-free signal to obtain a second relation curve of frequency versus energy:

$$F'(f) = \sum_{n=n_0}^{n_k} P'_n(f)$$

wherein F′(f) is the second relation curve, $n_0$ is the time start point of the time-frequency matrix, and $n_k$ is the time end point of the time-frequency matrix;

constructing a second adaptive positioning threshold based on the mean and the standard deviation of the second relation curve:

$$T' = \mu' + C'\sigma'$$

wherein T′ is the second adaptive positioning threshold, μ′ is the mean of the second relation curve, σ′ is the standard deviation of the second relation curve, and C′ is the second positioning coefficient;

taking from the second relation curve the part whose energy stays continuously above the second adaptive positioning threshold as a second effective signal frequency band;

and cropping the second effective signal frequency band out of the second standard time-frequency image to obtain a second effective time-frequency image, and saving the second effective time-frequency image at a suitable size.

In another embodiment, in step 6, a convolutional neural network layer augmented with a convolutional attention mechanism is used for feature sequence extraction, and the process is as follows:

the convolutional layer extracts the feature map F of the second effective time-frequency image, and the channel attention feature map $M_C(F)$ produced by the channel attention module is:

$$M_C(F) = \sigma_C\big(W_1(W_0(\mathrm{AvgPool}(F))) + W_1(W_0(\mathrm{MaxPool}(F)))\big)$$

wherein $\sigma_C$ is the channel activation function, $W_0$ and $W_1$ are the weights of the MLP, and MaxPool(F) and AvgPool(F) are the results of global max pooling and average pooling of the feature map F in the channel attention module, each of which yields a channel descriptor of F;

the channel attention feature map $M_C(F)$ is multiplied element by element with the feature map F to obtain F′, and F′ is taken as the input of the spatial attention module to obtain the spatial attention feature map $M_S(F')$:

$$M_S(F') = \sigma_S\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big)$$

wherein $\sigma_S$ is the spatial activation function, $f^{7\times 7}$ is a convolution layer with a 7×7 convolution kernel, and MaxPool(F′) and AvgPool(F′) are the results of max pooling and average pooling of the feature map F′ in the spatial attention module, each of which yields a spatial descriptor of F′;

the spatial attention feature map $M_S(F')$ is multiplied element by element with F′ to generate the final feature map:

$$F'' = M_S(F') \otimes F'$$

wherein F″ is the finally generated feature map, i.e. the feature sequence arranged in the time domain.
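The two attention stages described above follow the general shape of a convolutional block attention module, and can be sketched in NumPy as below. This is an illustrative sketch only: the sigmoid activations, the ReLU inside the MLP, pooling across the channel axis in the spatial stage, the random weights and the tensor sizes are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """M_C(F): one weight per channel from pooled descriptors and a shared MLP."""
    avg = F.mean(axis=(1, 2))                       # AvgPool(F), shape (C,)
    mx = F.max(axis=(1, 2))                         # MaxPool(F), shape (C,)
    def mlp(v):
        return W1 @ np.maximum(W0 @ v, 0.0)         # ReLU between layers is assumed
    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(Fp, kernel):
    """M_S(F'): one weight per position from a 7x7 convolution over pooled maps."""
    pooled = np.stack([Fp.mean(axis=0), Fp.max(axis=0)])   # shape (2, H, W)
    k = kernel.shape[-1]
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2), (k // 2, k // 2)))
    _, h, w = pooled.shape
    out = np.empty((h, w))
    for i in range(h):                              # plain sliding-window convolution
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(F, W0, W1, kernel):
    Fp = channel_attention(F, W0, W1)[:, None, None] * F    # F' = M_C(F) * F
    return spatial_attention(Fp, kernel)[None, :, :] * Fp   # M_S(F') * F'

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 6, 10))                 # 8 channels, 6 x 10 feature map
W0 = 0.1 * rng.standard_normal((2, 8))              # MLP weights, reduction to 2
W1 = 0.1 * rng.standard_normal((8, 2))
kernel = 0.1 * rng.standard_normal((2, 7, 7))       # 7x7 kernel over 2 pooled maps
refined = cbam(F, W0, W1, kernel)
```

Because both attention maps lie between 0 and 1, the module only rescales the feature map; it keeps the input shape, so it can be dropped between existing convolutional layers.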

In another embodiment, in step 7, performing message prediction based on the feature sequence arranged in the time domain and outputting the result specifically includes:

recognizing the feature sequence arranged in the time domain with a two-layer bidirectional gated recurrent unit (BiGRU) to obtain the predicted message.
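Elsewhere in this document the recognition network is named CBAM+CNN+BiGRU+CTC, so the per-frame outputs of the BiGRU are presumably turned into a message by CTC decoding. The text does not say which decoding strategy is used; the sketch below assumes greedy (best-path) decoding, in which repeated frame labels are collapsed and blanks are then removed. The blank symbol `_` and the function name are hypothetical.

```python
from itertools import groupby

BLANK = "_"   # hypothetical CTC blank symbol

def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse runs of identical frame labels, then drop the blank label."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)

message = ctc_greedy_decode("__SS_OOO__SSS_")      # most likely label per frame
doubled = ctc_greedy_decode("EE_E")                # blank separates a repeated letter
```

The blank label is what lets the decoder distinguish one long character from the same character sent twice in a row.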

The Morse signal detection and identification method provided by the invention performs adaptive energy sorting of the audio signal on a time-frequency diagram and detects the frequency point of the target signal from its features. The original audio signal is then narrow-band digitally filtered with that frequency point as the center frequency, the frequency band of the target signal is re-extracted from the noise-reduced time-frequency diagram, and pseudo-color image enhancement is applied. Finally, recognition is performed to achieve end-to-end decoding. The method effectively detects and recognizes Morse signals in complex channel environments, resolves the conflicting resolution requirements of detection and recognition, and effectively improves recognition accuracy in low signal-to-noise-ratio environments, and its robustness and practicability have been verified.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an ED-FE-CCBC structure in accordance with an embodiment of the present invention;

FIG. 2 is a flow chart of a detection and identification method according to an embodiment of the invention;

FIG. 3 is a schematic diagram of spectrum segments in an embodiment of the present invention, where (a) is an example time-frequency segment of other modulated signals, and (b) is an example time-frequency segment of a Morse signal;

FIG. 4 is a schematic diagram of energy-time curves in an embodiment of the present invention, where (a) is an example energy-time curve of other modulated signals, and (b) is an example energy-time curve of a Morse signal;

FIG. 5 is a time-frequency image before filtering and denoising in an embodiment of the present invention;

FIG. 6 is a filtered denoised time-frequency image according to an embodiment of the present invention;

FIG. 7 is a block diagram of an attention mechanism module in an embodiment of the invention;

FIG. 8 is a schematic diagram of a feature sequence arranged by time domain in an embodiment of the present invention;

FIG. 9 is a block diagram of a two-layer BiGRU and CTC splice in an embodiment of the invention;

FIG. 10 is a schematic diagram of Morse signal detection in a complex environment under an example of an embodiment of the present invention;

FIG. 11 is a graph showing the variation of word accuracy of simulation data with SNR for an example in accordance with an embodiment of the present invention;

FIG. 12 is a schematic diagram of the variation of sample accuracy of simulation data with signal-to-noise ratio for an example in accordance with an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that all directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.

Furthermore, descriptions such as those referred to as "first," "second," and the like, are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implying an order of magnitude of the indicated technical features in the present disclosure. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.

In the present invention, unless specifically stated and limited otherwise, the terms "connected," "affixed," and the like are to be construed broadly, and for example, "affixed" may be a fixed connection, a removable connection, or an integral body; the device can be mechanically connected, electrically connected, physically connected or wirelessly connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.

In addition, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present invention.

This embodiment addresses the problems that automatic reception of Morse signals performs poorly and that the accuracy and speed of detection and identification under adjacent-channel interference and low signal-to-noise ratio cannot meet practical requirements. It combines machine learning and deep learning to provide a Morse signal detection and identification method, implemented by the ED-FE-CCBC structure in FIG. 1, which takes a narrowband radio audio signal as input, blindly detects the target Morse signal within it, and outputs the message end to end. In the detection stage, the modulated signals in the time-frequency diagram are separated by adaptive-threshold energy sorting, time-frequency-domain and frequency-domain features are extracted, and Morse signal detection is completed with a random forest classifier. In the identification stage, a filtering and re-extraction preprocessing step is added: a digital filter removes interference, a short-time Fourier transform with high time resolution is applied, the target frequency band is cropped, the grayscale time-frequency diagram is pseudo-colored, and an image of suitable size is generated as input to the recognition network. Building on CRNN, a CCBC network structure incorporating a convolutional attention mechanism is proposed to realize end-to-end decoded output. Real-time performance is ensured, and the accuracy is better than that of current state-of-the-art methods.

Referring to fig. 2, a method for detecting and identifying a Morse signal disclosed in this embodiment specifically includes the following steps:

step 1, obtaining an audio time domain signal;

In this embodiment, referring to FIG. 1, the ED-FE-CCBC structure consists of three parts: data collection (corresponding to step 1 above), signal detection (corresponding to steps 2-3), and signal identification (corresponding to steps 4-7). In the name ED-FE-CCBC, ED stands for the energy detection method of the signal detection stage, FE stands for the filtering and re-extraction method of the signal identification stage, and CCBC is shorthand for the recognition network CBAM+CNN+BiGRU+CTC.

In the data collection of step 1, the audio time domain signal may be real audio data. In this embodiment, two source data sets are prepared, both in the frequency range commonly used by radio stations, 0 Hz to 5000 Hz. To verify the practicability of the detection and identification method of this embodiment and to strengthen the credibility of the signal recognition model, real audio data can be collected with a professional telegraphy training terminal. AM, FM, Morse and voice signals of unknown strength, carrier frequency, number and content appear at random in the terminal's audio, and the transmitting end can control the carrier frequency and number of the transmitted Morse signals. The audio of the training terminal is recorded in real time and cut into segments of equal duration to generate the audio time domain signals used as input to the signal detection part.

To study the performance of the recognition model at different signal-to-noise ratios and to simulate real data, an application program can generate ten thousand Morse signals, to which white Gaussian noise is randomly added at different signal-to-noise ratios. Experiments on the simulated data can omit the digital filter and the processing before it, and proceed directly to the short-time Fourier transform and signal separation.
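A simulated sample of the kind described here might be produced as in the sketch below. The standard Morse timing is used (dash = 3 dot units, 1-unit gap inside a letter, 3-unit gap between letters); the 8 kHz sample rate, 60 ms dot, 1 kHz carrier, the two-letter code table excerpt and the function names are assumptions for illustration, not details from the text.

```python
import numpy as np

MORSE = {"S": "...", "O": "---"}             # tiny excerpt of the code table

def keying_envelope(text, unit):
    """On/off keying samples: dot = 1 unit, dash = 3, gaps of 1 and 3 units."""
    parts = []
    for letter in text:
        for symbol in MORSE[letter]:
            parts.append(np.ones(unit * (1 if symbol == "." else 3)))
            parts.append(np.zeros(unit))     # gap between symbols of a letter
        parts.append(np.zeros(2 * unit))     # extends the last gap to 3 units
    return np.concatenate(parts)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the sample SNR equals snr_db."""
    noise = rng.standard_normal(len(signal))
    target_power = np.mean(signal ** 2) / 10 ** (snr_db / 10)
    noise *= np.sqrt(target_power / np.mean(noise ** 2))
    return signal + noise

fs, unit = 8000, 480                         # assumed: 60 ms dot at 8 kHz
env = keying_envelope("SOS", unit)
tone = env * np.sin(2 * np.pi * 1000 * np.arange(len(env)) / fs)
noisy = add_noise(tone, snr_db=-10, rng=np.random.default_rng(0))
```

Scaling the noise to its measured power makes the signal-to-noise ratio of each sample exact, which is convenient when accuracy is later plotted against signal-to-noise ratio.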

In this embodiment, the Morse signal is detected with a traditional machine learning method: a short-time Fourier transform is applied to the audio time domain signal, adaptive-threshold energy sorting is performed on the time-frequency image, a feature vector is extracted from each sorted signal, and a random forest classifier classifies them, thereby detecting the Morse signal and outputting its carrier frequency (i.e. the frequency point). This is referred to as the energy detection method (ED).

The time-frequency diagram reflects the relationship between energy, frequency and time simultaneously, which helps in extracting effective signal features. The time-frequency spectrum of the signal is obtained with the short-time Fourier transform (STFT) and then mapped uniformly to the pixel value interval [0, 255] suitable for image processing, giving the standard time-frequency image to be examined. Therefore, in the implementation of step 2, a short-time Fourier transform with high frequency resolution is applied to the audio time domain signal, and the resulting spectrogram is:

$$P_n(f) = |X_n(e^{j2\pi f})|^2$$

wherein $P_n(f)$ is the first standard time-frequency image.
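The mapping of the spectrogram to the pixel interval [0, 255] can be sketched as a linear min-max rescaling. Reading "uniformly mapped" as a linear rescaling is an assumption (a logarithmic scale would be another possibility), and the function name is hypothetical.

```python
import numpy as np

def to_pixels(P):
    """Linearly rescale a spectrogram to 8-bit pixel values in [0, 255]."""
    lo, hi = P.min(), P.max()
    if hi == lo:                                   # flat input: avoid divide by zero
        return np.zeros(P.shape, dtype=np.uint8)
    return np.round(255 * (P - lo) / (hi - lo)).astype(np.uint8)

P = np.array([[0.0, 1.0], [3.0, 5.0]])             # toy spectrogram values
img = to_pixels(P)
```

After this step the spectrogram can be handled by ordinary image routines, which is what the later connected-region features rely on.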

In the time-frequency diagram, frequency bands that contain a signal have markedly higher energy than the background noise, so this embodiment separates the signals with a fast energy positioning method: the energy along the time axis at each frequency point of the time-frequency matrix is summed to obtain a frequency-energy relation curve, an adaptive positioning threshold is built from the mean and standard deviation of that curve, the parts whose energy stays continuously above the threshold are taken as effective signal frequency bands, and the corresponding band of each signal is cropped from the original time-frequency diagram. Therefore, in the implementation of step 3, energy sorting and feature extraction are performed on the first standard time-frequency image to obtain a first effective time-frequency image and a feature vector, specifically:

$$F(f) = \sum_{n=n_0}^{n_k} P_n(f)$$

wherein F(f) is the first relation curve, $n_0$ is the time start point of the time-frequency matrix, and $n_k$ is the time end point of the time-frequency matrix;

$$T = \mu + C\sigma$$

wherein T is the first adaptive positioning threshold, μ is the mean of the first relation curve, σ is the standard deviation of the first relation curve, and C is the first positioning coefficient; in this embodiment, C is set to 0.8 on the basis of repeated tests.

From the first relation curve, the frequency band whose energy stays continuously above the first adaptive positioning threshold is taken as the first effective signal frequency band, and the time-frequency image of that band is cropped from the first standard time-frequency image to obtain the first effective time-frequency image. In this embodiment, four features across three dimensions of the first effective time-frequency image are selected to form the feature vector that distinguishes the Morse signal: the signal bandwidth, the windowing standard deviation, the average rectangular similarity, and the central distribution law of the connected regions. These features are relatively stable in a narrowband environment and fast to extract.
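The energy sorting of step 3 can be sketched as follows: sum the spectrogram over time to get F(f), form the threshold T = μ + Cσ, and report each run of consecutive bins above T as an effective band. The array layout (rows are frames, columns are frequency bins), the use of the population standard deviation, the toy data and the function name are assumptions; C = 0.8 is the value reported for this embodiment.

```python
import numpy as np

def effective_bands(P, C=0.8):
    """Return [(start_bin, end_bin_exclusive), ...] where F(f) > mu + C * sigma."""
    F = P.sum(axis=0)                         # energy summed over the time axis
    T = F.mean() + C * F.std()                # adaptive positioning threshold
    above = np.concatenate(([False], F > T, [False]))
    edges = np.flatnonzero(above[1:] != above[:-1])   # run starts and ends
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))

P = np.ones((5, 12))                          # flat noise floor: 5 frames, 12 bins
P[:, 3:5] += 10.0                             # a signal occupying bins 3 and 4
P[:, 8] += 10.0                               # a second signal one bin wide
bands = effective_bands(P)
```

Each returned pair can be used directly to slice the spectrogram, for example `P[:, start:end]`, giving the effective time-frequency image of one signal.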

In this embodiment, the signal bandwidth is the length of the effective signal's frequency band. As shown in FIG. 3, the bandwidth of a Morse signal is generally within a few tens of hertz, and at low signal-to-noise ratio may be only a few hertz after threshold interception, which distinguishes the target signal from wide-bandwidth radio signals such as voice.

For the windowing standard deviation, $F(f)$ is calculated from the time-frequency matrix corresponding to the effective signal frequency band, and the frequency point $f_i$ at which it reaches its maximum is found. The energy-time curve $F(t)$ at that frequency point is then obtained and, to reduce the influence of the signal amplitude, the energy values are all mapped to the interval [0, 255]. As shown in fig. 4, $F(t)$ of the Morse signal fluctuates strongly compared with other signals. The time axis of the curve is divided equally into M windows, the standard deviation of the energy values is calculated in each window, and the results are averaged; this average serves as a measure of the degree of fluctuation of the signal.
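A minimal sketch of the windowing standard deviation is given below for illustration. The function name `windowed_std`, the default of 8 windows, the min-max scaling used to reach [0, 255], and the two toy curves are assumptions; the embodiment does not state the value of M or the exact mapping.

```python
from statistics import mean, pstdev


def windowed_std(curve, m=8):
    """Mean of the per-window standard deviations of an energy-time curve."""
    lo, hi = min(curve), max(curve)
    span = (hi - lo) or 1.0                            # avoid dividing by zero on flat curves
    scaled = [255.0 * (v - lo) / span for v in curve]  # map energies to [0, 255]
    size = len(scaled) // m                            # samples per window (remainder dropped)
    windows = [scaled[i * size:(i + 1) * size] for i in range(m)]
    return mean([pstdev(w) for w in windows])


# Hypothetical curves: an on-off keyed signal versus a constant carrier.
keyed = [0, 0, 10, 10] * 8
steady = [10] * 32
print(windowed_std(keyed), windowed_std(steady))
```

The keyed curve gives a large value and the constant carrier gives zero, which is the contrast this feature is meant to capture.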

For average rectangular similarity, the Morse signal appears as a plurality of intermittent regular rectangular areas on the time-frequency plot (as shown in fig. 3), and the average rectangular similarity is higher than other modulated signals.

For the connected-region center distribution law, the mean value, the maximum value $y_{max}$ and the minimum value $y_{min}$ of the y coordinate values (representing frequency) of the center points of the connected regions are obtained, from which the interval [a, b] is calculated as:

the proportion of the number of center points falling within this interval to the total number of center points is the center distribution law. The centers of the rectangular blocks of the Morse signal are basically distributed on one horizontal line (as shown in figure 2), so its center distribution law is higher than that of other modulated signals.

After the first effective time-frequency image and the feature vector are determined, this embodiment uses a random forest algorithm to determine the frequency point of the Morse signal in the first effective time-frequency image. The random forest algorithm builds a plurality of independent sub-decision trees as the building blocks of the classification task; each sub-decision tree generates its binary questions in a different way and votes on the prediction for the input data, and the final classification result is determined by the vote. Compared with a traditional single decision tree classifier, the random forest classifier reduces variance more effectively by combining small decision trees with random feature subsets, thereby preventing overfitting during training. Therefore, the frequency point of the Morse signal in the first effective time-frequency image is determined based on the feature vector in step 3 as follows:

the feature vectors are input into a trained random forest classifier, each modulated signal is classified, and the feature vectors corresponding to Morse signals are identified, thereby obtaining the frequency points of the Morse signals in the first effective time-frequency image. This yields the signal detection result and the corresponding Morse telegraph frequency points in the audio time domain signal.
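To make the voting idea concrete, the sketch below builds a toy forest of depth-one trees with bootstrap sampling, one randomly chosen feature per tree, and a majority vote. It is not the classifier of the embodiment: the function names, the number of trees, and all feature values are hypothetical, and a practical system would more likely use a library implementation with deeper trees.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature (a depth-one tree)."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # the feature is constant in this bootstrap sample
        majority = Counter(labels).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return feature, best[1], best[2], best[3]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feature = rng.randrange(len(rows[0]))            # random feature subset of size one
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors:
# [bandwidth in Hz, windowing std, average rectangular similarity, center distribution law]
X = [[8, 110, 0.90, 0.95], [12, 100, 0.85, 0.90], [6, 120, 0.92, 0.97],
     [300, 30, 0.40, 0.50], [250, 40, 0.35, 0.45], [400, 25, 0.50, 0.60]]
y = ["morse", "morse", "morse", "other", "other", "other"]
forest = fit_forest(X, y)
print(predict(forest, [5, 125, 0.95, 0.99]), predict(forest, [500, 10, 0.20, 0.30]))
```

The frequency point of each band labelled as Morse would then be passed on to the filtering stage.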

The time characteristics of the target-signal time-frequency diagram obtained in the signal detection stage are obviously distorted and cannot be applied directly to the identification work. If a short-time Fourier transform with high time resolution is performed instead and the signal frequency band is intercepted from it (the frequency resolution then becomes several hundred hertz), interference from Morse signals at adjacent frequencies may cause the aliasing shown in fig. 5, which makes recognition impossible.

The digital filter can filter out all signals outside the specified bandwidth, and has obvious effect on improving the signal-to-noise ratio. Therefore, in the implementation process of step 4, the audio time domain signal is digitally filtered based on the frequency point of the Morse signal to obtain the interference-free signal, which is specifically:

fast narrow-band filtering with the Morse signal frequency point as the center frequency: the frequency point of the Morse signal obtained by signal detection is taken as the center frequency, and all interference signals and noise in the audio time domain signal outside the band extending 15 Hz above and below the center frequency are filtered out to obtain an interference-free signal.
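The embodiment does not name a filter design, so the following sketch simply assumes a brick-wall mask in the DFT domain that keeps the bins within 15 Hz of the center frequency. The function name `narrowband_filter`, the sampling rate, the frame length, and the plain O(n²) DFT are all choices made for this illustration; the two tone frequencies are borrowed from the 2000 Hz and 2100 Hz signals of the example audio.

```python
import cmath
import math


def narrowband_filter(signal, fs, center_hz, half_width_hz=15.0):
    """Keep only DFT bins within +/- half_width_hz of center_hz (brick-wall mask)."""
    n = len(signal)
    # Forward DFT (O(n^2); acceptable for a short illustrative frame).
    spectrum = [sum(signal[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
                for k in range(n)]
    for k in range(n):
        freq = min(k, n - k) * fs / n          # fold negative-frequency bins onto |f|
        if abs(freq - center_hz) > half_width_hz:
            spectrum[k] = 0.0
    # Inverse DFT; the input is real, so the imaginary part is only rounding noise.
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)).real / n
            for m in range(n)]


# Assumed test signal: a 2000 Hz tone plus an adjacent 2100 Hz interferer.
fs, n = 8000, 400
target = [math.sin(2 * math.pi * 2000 * m / fs) for m in range(n)]
mixed = [t + math.sin(2 * math.pi * 2100 * m / fs) for m, t in enumerate(target)]
clean = narrowband_filter(mixed, fs, center_hz=2000)
print(max(abs(a - b) for a, b in zip(clean, target)))
```

With these parameters the bin spacing is 20 Hz, so only the 2000 Hz bin survives and the printed residual is at the level of floating-point rounding.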

Then, a short-time Fourier transform with high time resolution is performed, and the effective signal area is intercepted again using the signal sorting method of the detection stage. As can be observed from FIG. 6, the image pattern is clear after filtering and denoising, which effectively resolves the problem that a high detection rate and a high recognition rate could not be achieved at the same time. Therefore, in the implementation of step 5, the interference-free signal is subjected to a short-time Fourier transform to obtain a second standard time-frequency image with high time resolution, specifically:

$$X'_n(e^{j2\pi f}) = \sum_{m=-\infty}^{\infty} x'(m)\,\omega'(n-m)\,e^{-j2\pi f m}$$

wherein $x'(m)$ is the interference-free signal and $X'_n(e^{j2\pi f})$ is the time-frequency matrix of the interference-free signal; $\omega'(n-m)$ is a gradually sliding window function with high time resolution, namely the short-time Fourier transform parameter with high time resolution; j is the imaginary unit, f is the frequency, m is the time of the audio signal sequence, and n is the time of the time-frequency matrix obtained by the short-time Fourier transform;

$$P'_n(f) = \left|X'_n(e^{j2\pi f})\right|^2$$

wherein $P'_n(f)$ is the second standard time-frequency image.
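The trade-off between time and frequency resolution can be seen in a small sketch of the power spectrogram. The Hann window, the window length of 32 samples, the hop of 16 samples, the 8000 Hz sampling rate, and the function name `stft_power` are assumptions of this illustration and are not parameters disclosed in the embodiment.

```python
import cmath
import math


def stft_power(signal, win_len, hop):
    """Power spectrogram P[n][k] = |X_n(k)|^2 using a Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / win_len) for i in range(win_len)]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(win_len)]
        frames.append([
            abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / win_len)
                    for i in range(win_len))) ** 2
            for k in range(win_len // 2 + 1)       # non-negative frequencies only
        ])
    return frames


# Assumed signal: a 2000 Hz tone keyed on for 128 samples, then off for 128 samples.
fs, win_len, hop = 8000, 32, 16
sig = [math.sin(2 * math.pi * 2000 * m / fs) if m < 128 else 0.0 for m in range(256)]
P = stft_power(sig, win_len, hop)
peak_bin = max(range(len(P[0])), key=P[0].__getitem__)
print("frequency resolution (Hz):", fs / win_len)   # short window: coarse in frequency
print("time step (s):", hop / fs)                   # but fine in time
print("peak bin in first frame:", peak_bin, "-> frequency", peak_bin * fs / win_len)
```

A short window of this kind gives a frequency resolution of several hundred hertz, which is why the narrow-band filtering has to remove adjacent-frequency signals before this transform is applied.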

In step 5, the specific implementation process of energy sorting the second standard time-frequency image to obtain the second effective time-frequency image is as follows:

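$$F'(f) = \sum_{n=n_0}^{n_k} P'_n(f)$$

$$T' = \mu' + C'\sigma'$$

wherein $F'(f)$ is the second relation curve, $T'$ is the second adaptive positioning threshold, $\mu'$ and $\sigma'$ are the mean and standard deviation of the second relation curve, and $C'$ is the second positioning coefficient.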

A frequency band corresponding to a part of the second relation curve whose energy stays continuously above the second adaptive positioning threshold is intercepted as a second effective signal frequency band. The time-frequency image of the corresponding frequency band is intercepted from the second standard time-frequency image based on the second effective signal frequency band and saved at a suitable size to obtain a second effective time-frequency image.

Image enhancement mainly makes the processed image better suited to the human visual system or to a computer recognition system and makes the image easier to interpret. Pseudo-color processing converts the gray levels of a grayscale image into different colors; the more levels are separated, the more information can be extracted, which achieves the purpose of image enhancement. It is an image enhancement technique with an obvious visual effect and low complexity. Therefore, in the implementation of step 5, the image enhancement processing on the second effective time-frequency image is specifically:

the gray map is pseudo-colorized with the viridis mapping, which forms a color gradient by changing the saturation of the color, so that values close to each other have similar colors and values further apart have more different colors. The viridis mapping uses the color space as fully as possible while remaining uniform, and is suitable when the values are linearly distributed. In most cases, this kind of gradual mapping is preferable.
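As a rough illustration of pseudo-coloring, the sketch below maps a gray level to an RGB triple by interpolating linearly between five anchor colors sampled from the viridis colormap. The anchor values are approximate, the function name `pseudo_color` is hypothetical, and a practical implementation would more likely use the full colormap table from a plotting library.

```python
# Five anchor colours sampled from the viridis colormap (approximate RGB values).
VIRIDIS_ANCHORS = [(68, 1, 84), (59, 82, 139), (33, 145, 140), (94, 201, 98), (253, 231, 37)]


def pseudo_color(gray):
    """Map a grey level in [0, 255] to an RGB triple by linear interpolation."""
    pos = gray / 255 * (len(VIRIDIS_ANCHORS) - 1)     # position along the anchor list
    i = min(int(pos), len(VIRIDIS_ANCHORS) - 2)       # index of the left anchor
    t = pos - i                                       # blend factor in [0, 1]
    a, b = VIRIDIS_ANCHORS[i], VIRIDIS_ANCHORS[i + 1]
    return tuple(round(x + t * (y - x)) for x, y in zip(a, b))


print(pseudo_color(0), pseudo_color(128), pseudo_color(255))
```

Applying this function to every pixel turns the single-channel time-frequency image into a three-channel picture in which neighbouring energy levels receive similar colors.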

In this embodiment, the feature sequence extraction of step 6 uses a convolutional neural network layer augmented with a convolutional attention mechanism. The attention mechanism in computer vision derives from research on human vision: by training and adjusting its parameters, a network can attend to useful information as human vision does, which reduces the interference of invalid information and improves efficiency and accuracy. The convolutional block attention module (CBAM) combines a channel attention mechanism and a spatial attention mechanism in sequence, which effectively improves the ability of the convolution layer to extract key features and suppresses unnecessary ones. In addition, the CBAM is a lightweight plug-in module that can be integrated seamlessly into any CNN architecture (attached after each CNN layer) and trained end to end at negligible cost, which meets the fast decoding requirement of Morse telegrams. The CBAM structure in this embodiment is shown in fig. 7, and the process of extracting the feature sequence is as follows:

The feature map $F \in \mathbb{R}^{C \times H \times W}$ extracted from the second effective time-frequency image by the convolution layer first passes through the channel attention module, which determines which content plays an important role. The feature map F undergoes global maximum pooling and global average pooling over the spatial dimensions to obtain two $1 \times 1 \times C$ channel descriptions $F^{C}_{max}$ and $F^{C}_{avg}$. These are fed into a shared network formed by a multi-layer perceptron (Multilayer Perceptron, MLP) with a single hidden layer; the outputs are summed element by element and activated by a Sigmoid function to generate the channel attention feature map $M_C(F)$:

$$M_C(F) = \sigma_C\big(\mathrm{MLP}(\mathrm{MaxPool}(F)) + \mathrm{MLP}(\mathrm{AvgPool}(F))\big) = \sigma_C\big(W_1(W_0(F^{C}_{max})) + W_1(W_0(F^{C}_{avg}))\big)$$

where $\sigma_C$ is the channel activation function, $W_0$ and $W_1$ are the weights of the MLP, MaxPool(F) and AvgPool(F) denote the global maximum pooling and average pooling of the feature map F in the channel attention module, and $F^{C}_{max}$ and $F^{C}_{avg}$ are the channel descriptions of the feature map F after global maximum pooling and average pooling, respectively.

The channel attention feature map $M_C(F)$ is multiplied element by element with the feature map F to obtain F′, which is the input of the spatial attention module. The spatial attention mechanism focuses on which locations of the picture play an important role: F′ passes through a maximum pooling layer and an average pooling layer to obtain $F^{s}_{max}$ and $F^{s}_{avg}$, which are reduced to a single channel by convolution and activated to generate the spatial attention feature map $M_s(F')$:

$$M_s(F') = \sigma_s\big(f^{7\times7}([\mathrm{MaxPool}(F');\,\mathrm{AvgPool}(F')])\big) = \sigma_s\big(f^{7\times7}([F^{s}_{max};\,F^{s}_{avg}])\big)$$

where $\sigma_s$ and $f^{7\times7}$ are the spatial activation function and the convolution layer with a 7×7 convolution kernel, respectively, MaxPool(F′) and AvgPool(F′) denote the global maximum pooling and average pooling of the feature map F′ in the spatial attention module, and $F^{s}_{max}$ and $F^{s}_{avg}$ are the descriptions of the feature map F′ after maximum pooling and average pooling, respectively;

The spatial attention feature map $M_s(F')$ is multiplied element by element with F′ to generate the final feature map F″. The whole CBAM process can be expressed as:

$$F' = M_C(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$

where $\otimes$ denotes element-by-element multiplication, and F″ is the finally generated feature map, i.e. the feature sequence arranged in the time domain.
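For illustration, the channel attention step (global max and average pooling per channel, a shared single-hidden-layer MLP, element-wise sum, Sigmoid, then rescaling of each channel) might be sketched as follows. The ReLU in the hidden layer, the nested-list tensor layout, the function names, and the tiny hand-picked weights are assumptions; in the real network the weights are learned, and the spatial attention step with its 7×7 convolution is omitted here for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w0, w1):
    """Shared MLP: hidden = ReLU(W0 v), output = W1 hidden (ReLU is an assumption)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(fmap, w0, w1):
    """fmap[c][h][w] -> (channel weights M_C, refined map F' = M_C * F)."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    f_max = [max(ch) for ch in flat]                   # global max pooling per channel
    f_avg = [sum(ch) / len(ch) for ch in flat]         # global average pooling per channel
    m_c = [sigmoid(a + b) for a, b in zip(mlp(f_max, w0, w1), mlp(f_avg, w0, w1))]
    refined = [[[m * v for v in row] for row in ch] for m, ch in zip(m_c, fmap)]
    return m_c, refined


# Hypothetical 2-channel, 2x2 feature map and hand-picked MLP weights (hidden size 1).
F = [[[4.0, 0.0], [0.0, 0.0]],      # channel 0: one strong, localised response
     [[1.0, 1.0], [1.0, 1.0]]]      # channel 1: flat response
W0 = [[1.0, -1.0]]
W1 = [[1.0], [-1.0]]
m_c, F_prime = channel_attention(F, W0, W1)
print(m_c)
```

With these weights the channel holding the localised response receives a weight close to 1 and the flat channel a weight close to 0, which is the kind of reweighting the module is meant to learn.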

In this embodiment, the specific settings of the CBAM-CNN module are shown in table 1. Each combination layer consists, in sequence, of a convolution layer, a regularization layer, an activation function, an attention mechanism module, and a maximum pooling layer, and two consecutive combination layers are used. The height of the input picture is fixed at 32; the width is variable, and only a maximum width is set. After passing through the network, the frequency-direction dimension (height) of the Morse signal time-frequency diagram becomes 1, the time-direction dimension (width) becomes one fourth of that of the original picture, and the channel-direction dimension is 64, forming the feature sequence shown in FIG. 8.

TABLE 1 concrete arrangement of CBAM-CNN modules

In the specific implementation of step 7, a two-layer bidirectional gated recurrent unit is used to identify the feature sequence arranged in the time domain and obtain the predicted message information. The gated recurrent unit (GRU) is a variant of the long short-term memory (LSTM) network that merges the input gate and forget gate of the LSTM into an update gate and also merges the cell state and the hidden state, which simplifies the network architecture. Like the LSTM network, the GRU can capture preceding information, process variable-length sequences, and significantly suppress vanishing and exploding gradients; because its structure is simpler, it has fewer model parameters, which reduces the risk of overfitting during training and improves the convergence speed while maintaining prediction accuracy. The code word prediction at any moment of a Morse telegram is closely related to both the preceding and the following codes, but a GRU network only perceives the preceding information passed forward in time and cannot take future code words into account. The bidirectional gated recurrent unit (BiGRU) simultaneously trains two GRU networks with opposite directions and independent parameters, has a stronger ability to memorize and filter contextual relationships, and can predict the current message more accurately. Each sequence output of the BiGRU corresponds to one character element; in practice, the word length of a Morse signal is uncertain and the duration of each character differs, so character alignment is difficult to achieve. Connectionist temporal classification (CTC) provides an alignment-free way of computing the loss function; it is attached to the output of the last BiGRU layer, which relieves the network of having to predict an uncertain alignment and realizes end-to-end, automatically aligned output.
Two BiGRU layers are combined with CTC, with the structure shown in figure 9; given the input feature sequence $x_1, x_2, \ldots, x_n$, the predicted message information is output directly.
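The embodiment does not say how the CTC output is turned into text at inference time. One common choice, assumed here purely for illustration, is greedy decoding: take the best label in each frame, merge consecutive repeats, and drop the blank label. The function name, the blank index 0, the three-symbol alphabet, and the frame scores are hypothetical.

```python
from itertools import groupby

BLANK = 0  # index reserved for the CTC blank label (an assumption of this sketch)


def ctc_greedy_decode(frame_scores, alphabet):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    merged = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in merged if label != BLANK)


def one_hot_like(label, size=3):
    """Hypothetical per-frame scores that favour one label."""
    return [0.8 if i == label else 0.1 for i in range(size)]


alphabet = ["", "S", "O"]                 # index 0 is the blank
path = [1, 1, 0, 2, 2, 0, 1]              # S S _ O O _ S
scores = [one_hot_like(label) for label in path]
print(ctc_greedy_decode(scores, alphabet))
```

Because repeats are merged before blanks are removed, characters of different durations collapse to a single symbol, while a blank between two identical labels keeps a genuinely doubled character.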

The detection and recognition method in this embodiment is further described below with reference to specific examples.

For the detection part, the terminal is connected to a computer, which records the audio data in real time, cuts it into 10 s segments, and sends the segments to the detection system in sequence. Because quantities such as the signal-to-noise ratio cannot be known exactly, 65 audio segments of differing complexity are collected; the signals are separated and the four features are extracted by the energy sorting method, feature vectors and labels are produced, and a random forest classifier is trained. After parameter tuning, the frequency resolution of the time-frequency diagram is set to 6-9 Hz, with which detection of Morse signals in a complex environment can basically be achieved.

In the example audio, four paths of Morse signals and four paths of other modulation signals are shared, the frequency points of the Morse signals are respectively 2000 Hz, 2100 Hz, 2770 Hz and 2800Hz, and the detection flow and the result output are shown in FIG. 10.

From the audio time domain signal diagram and the time-frequency diagram it can be observed that the signal-to-noise ratio of the four Morse signals is low. In particular, the signal at 2800 Hz suffers adjacent-frequency interference and has low energy of its own, so it cannot be observed at all in the audio time-frequency diagram; it therefore cannot be framed, and a target detection method would fail. With only a small number of audio samples, a neural network classifier cannot learn enough features and requires more training time, whereas the random forest classifier needs almost no training time and can evaluate the importance of each feature to the classification problem and obtain classification results from a small amount of training data. Experiments show that the energy detection method of this embodiment can successfully detect Morse signals under various signal-to-noise ratios and adjacent-frequency conditions in a complex narrow-band environment; the output frequency point deviates slightly from the original carrier frequency, but this does not affect the subsequent work.

The input data of the identification network are of two types. The first is 10560 simulated Morse signal time-frequency diagrams, divided proportionally into a training set, a validation set and test set 1, with frequency, code speed and signal-to-noise ratio randomly distributed within preset ranges. The second is 400 time-frequency segments obtained by detecting and filtering real recorded audio, used as test set 2. The specific parameter settings are shown in table 2; the time-frequency diagram widths of the two kinds of data are determined by the message length and by the duration, respectively.

Table 2 parameter settings

Experiment operating environment: CPU Intel Core i9-10900K @ 3.70 GHz, GPU NVIDIA GeForce RTX 3080; the code is written and run under the TensorFlow 2.4 framework on Windows. RMSProp is selected as the optimizer, with the learning rate set to 0.001 and reduced to 0.0001; the batch_size is set to 100, and training runs for 100 iterations.

Three indexes are used to measure the recognition performance: word (symbol) accuracy, sample accuracy, and decoding time. The word accuracy is the percentage of correctly decoded characters (a word space counts as a space character) among all characters, the sample accuracy is the percentage of completely correct samples among all samples, and the decoding time is the time required to identify a single sample.
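The two accuracy indexes can be sketched as below. How insertions and deletions are counted is not stated in the embodiment, so this sketch assumes a simple position-by-position comparison against the reference message; the function names and the sample strings are hypothetical.

```python
def word_accuracy(predicted, reference):
    """Percentage of reference characters reproduced at the same position."""
    total = sum(len(ref) for ref in reference)
    correct = sum(p == r for pred, ref in zip(predicted, reference) for p, r in zip(pred, ref))
    return 100.0 * correct / total


def sample_accuracy(predicted, reference):
    """Percentage of samples decoded without any error."""
    exact = sum(pred == ref for pred, ref in zip(predicted, reference))
    return 100.0 * exact / len(reference)


reference = ["CQ DX", "SOS"]
predicted = ["CQ DX", "SOX"]          # one wrong character in the second sample
print(word_accuracy(predicted, reference), sample_accuracy(predicted, reference))
```

A single wrong character lowers the word accuracy only slightly but makes the whole sample count as incorrect, which is why the sample accuracy is the stricter of the two indexes.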

The method of this embodiment is compared with the CRNN identification network, which performs best among existing algorithms. The networks and their input pictures are: (1) CRNN network, grayscale time-frequency diagram; (2) CCBC network, pseudo-color time-frequency diagram. The curves of word accuracy and sample accuracy against signal-to-noise ratio on the simulated data are shown in fig. 11 and 12. The accuracy of the CCBC network is always better than that of the CRNN network. In particular, when the signal-to-noise ratio is below -16 dB, the signal becomes harder to distinguish from the surrounding noise, the rectangular outlines of the dots and dashes gradually deform, and the accuracy of the CRNN network drops markedly. In the method of this embodiment, pseudo-color processing enhances the layering of the time-frequency diagram and, combined with the CBAM module, allows the important feature regions to be located accurately, so that the character accuracy remains above 80%.

The total accuracy and the decoding time on the two test sets are shown in table 3. On test set 1 the accuracies of the two methods differ by a few percent. On test set 2, the real recorded audio environment is complex, and frequency offset, code length deviation, adjacent-frequency interference, unbalanced signal energy and similar conditions may be present, so the accuracy advantage of the CCBC network is more prominent, about 10% higher than the CRNN network. The CCBC architecture is more complex than CRNN, but the recognition time is essentially the same. Adding the time required for detection and filtering (about 0.8 s), the total delay of the system meets the real-time requirement of engineering applications. The results show that ED-FE-CCBC has stronger applicability at low signal-to-noise ratio and in real environments.

Table 3 total accuracy of decoding for two test sets and decoding time

The size of the input picture of the identification network affects both accuracy and decoding time: the larger the size, the higher the accuracy and the longer the decoding time. The time-frequency diagrams of the simulated data are set to different widths to test the effect on the algorithm results. In the -20 dB environment, even if time is sacrificed to widen the picture, the word accuracy always stays below 90% and cannot meet practical requirements, so signal-to-noise ratios of -18 dB and above are used as the experimental environment for selecting a suitable size. As shown in table 4, the decoding accuracy peaks at a picture width of 200; beyond that it remains substantially unchanged while the time consumed increases. Weighing accuracy against time consumption, a width of 200 is the optimal size, and calculating against the corresponding number of characters leads to the conclusion that each minimum unit (the duration of one dot) of the input picture needs to contain at least two pixels.
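The two-pixels-per-dot conclusion can be turned into a quick width estimate. The sketch below counts the length of a message in dot units under standard Morse timing (dash = 3 units, gap inside a character = 1, gap between characters = 3, gap between words = 7) and multiplies by two pixels per unit. The function names, the four-letter excerpt of the code table, and the sample messages are assumptions for illustration only.

```python
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---"}  # tiny excerpt of the code table


def dot_units(message):
    """Length of a message in dot units under standard Morse timing."""
    units = 0
    words = message.split()
    for w, word in enumerate(words):
        for c, char in enumerate(word):
            code = MORSE[char]
            units += sum(1 if mark == "." else 3 for mark in code)  # dots and dashes
            units += len(code) - 1                                  # gaps inside a character
            if c < len(word) - 1:
                units += 3                                          # gap between characters
        if w < len(words) - 1:
            units += 7                                              # gap between words
    return units


def min_width(message, pixels_per_unit=2):
    """Smallest picture width that gives every dot unit the requested pixel count."""
    return pixels_per_unit * dot_units(message)


print(dot_units("SOS"), min_width("SOS"))
```

For a fixed picture width, the same arithmetic gives an upper bound on how many dot units, and hence how many characters, one input picture can hold.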

Table 4 Accuracy and time consumption of the algorithm

In summary, this embodiment proposes an automatic Morse signal detection and identification method based on the ED-FE-CCBC structure, which is applied to complex narrow-band channels, takes radio audio as input, and outputs Morse signal messages in real time. Experiments with simulated and real data show that the performance of the system is clearly superior to that of existing methods: the conflict between the resolution requirements of detection and of identification is resolved, the identification accuracy at low signal-to-noise ratio is effectively improved, and robustness and practicability are verified. When radio station signals are received automatically without knowledge of where messages start and stop, the audio can only be cut by time, which may break a message at its head or tail and lose message information. Future work will study the splicing and re-segmentation of consecutive same-frequency Morse time-frequency diagrams so that radio station Morse signals can be received in real time without loss.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims

1. The Morse signal detection and identification method is characterized by comprising the following steps:

step 1, obtaining an audio time domain signal;

the energy sorting and feature extraction are carried out on the first standard time-frequency image to obtain a first effective time-frequency image and a feature vector, specifically:

$$F(f) = \sum_{n=n_0}^{n_k} P_n(f)$$

wherein $F(f)$ is the first relation curve, $n_0$ is the time starting point of the time-frequency matrix, $n_k$ is the time end point of the time-frequency matrix, and $P_n(f)$ is the first standard time-frequency image;

T＝μ+Cσ

wherein T is a first self-adaptive positioning threshold, mu is the mean value of a first relation curve, sigma is the standard deviation of the first relation curve, and C is a first positioning coefficient;

intercepting a first effective time-frequency image in a first standard time-frequency image based on a first effective signal frequency band, and taking the signal bandwidth, windowing standard deviation, average rectangular similarity and a connected region center distribution law of the first effective time-frequency image as feature vectors;

step 5, performing short-time Fourier transform on the interference-free signal to obtain a second standard time-frequency image with high time resolution, and performing energy sorting on the second standard time-frequency image to obtain a second effective time-frequency image, wherein the method specifically comprises the following steps:

$$F'(f) = \sum_{n=n_0}^{n_k} P'_n(f)$$

wherein $F'(f)$ is the second relation curve and $P'_n(f)$ is the second standard time-frequency image;

T′＝μ′+C′σ′

intercepting, from the second standard time-frequency image, the time-frequency image of the second effective signal frequency band to obtain a second effective time-frequency image, and storing the second effective time-frequency image at a suitable size;

2. The Morse signal detecting and identifying method of claim 1, wherein in step 2, the performing short-time fourier transform on the audio time domain signal comprises:

$$X_n(e^{j2\pi f}) = \sum_{m=-\infty}^{\infty} x(m)\,\omega(n-m)\,e^{-j2\pi f m}$$

wherein $x(m)$ is the audio time domain signal and $X_n(e^{j2\pi f})$ is the time-frequency matrix of the audio time domain signal; $\omega(n-m)$ is a gradually sliding window function with high frequency resolution, namely the short-time Fourier transform parameter with high frequency resolution; j is the imaginary unit, f is the frequency, m is the time of the audio signal sequence, and n is the time of the time-frequency matrix obtained by the short-time Fourier transform;

$$P_n(f) = \left|X_n(e^{j2\pi f})\right|^2.$$

3. The method for detecting and identifying a Morse signal according to claim 1 or 2, wherein in step 3, the determining, based on the feature vector, a frequency point of the Morse signal in the first effective time-frequency image is specifically:

4. The Morse signal detecting and identifying method according to claim 1 or 2, wherein step 4 specifically comprises:

5. The method for detecting and identifying a Morse signal according to claim 4, wherein in step 5, the performing short-time fourier transform on the non-interference signal obtains a second standard time-frequency image with high time resolution, specifically:

$$X'_n(e^{j2\pi f}) = \sum_{m=-\infty}^{\infty} x'(m)\,\omega'(n-m)\,e^{-j2\pi f m}$$

wherein $x'(m)$ is the interference-free signal and $X'_n(e^{j2\pi f})$ is the time-frequency matrix of the interference-free signal; $\omega'(n-m)$ is a gradually sliding window function with high time resolution, namely the short-time Fourier transform parameter with high time resolution; j is the imaginary unit, f is the frequency, m is the time of the audio signal sequence, and n is the time of the time-frequency matrix obtained by the short-time Fourier transform;

$$P'_n(f) = \left|X'_n(e^{j2\pi f})\right|^2.$$

6. The Morse signal detecting and identifying method according to claim 1 or 2, wherein in step 6, a convolutional neural network layer added with a convolutional attention mechanism is adopted for feature sequence extraction, and the process comprises the following steps:

the convolution layer extracts a feature map F of the second effective time-frequency image, and a channel attention feature map $M_C(F)$ is obtained based on the channel attention module as:

wherein $\sigma_s$ and $f^{7\times7}$ are the spatial activation function and the convolution layer with a 7×7 convolution kernel, respectively; MaxPool(F′) and AvgPool(F′) denote the global maximum pooling and average pooling of the feature map F′ in the spatial attention module, and the outputs of these two pooling operations are the corresponding descriptions of the feature map F′;

7. The method for detecting and identifying a Morse signal according to claim 1 or 2, wherein in step 7, the message prediction is performed based on the feature sequences arranged according to the time domain, and the message prediction is output as follows: