CN117219125A

CN117219125A - Marine mammal sound signal imitation hidden scoring method based on audio fingerprint

Info

Publication number: CN117219125A
Application number: CN202311464609.XA
Authority: CN
Inventors: 姜帅; 李玉芳; 曹润琪; 王猛; 蒋嘉铭; 施威; 牛秋娜; 龙奇
Original assignee: Qingdao University of Science and Technology
Current assignee: Qingdao University of Science and Technology
Priority date: 2023-11-07
Filing date: 2023-11-07
Publication date: 2023-12-12
Anticipated expiration: 2043-11-07
Also published as: CN117219125B

Abstract

The invention relates to the technical field of bionic covert underwater acoustic communication, and in particular to an audio-fingerprint-based method for scoring the concealment of imitated marine mammal sound signals. The method uses audio fingerprinting to compute the similarity between audio fingerprints and takes this similarity as the concealment score of a bionic signal. By applying audio fingerprinting, with its accuracy, reliability and robustness, to concealment evaluation, the method evaluates the concealment of bionic signals more comprehensively and accurately, and provides a new analysis method and evaluation tool for this task.

Description

Marine mammal sound signal imitation hidden scoring method based on audio fingerprint

Technical Field

The invention relates to the technical field of bionic covert underwater acoustic communication, and in particular to an audio-fingerprint-based method for scoring the concealment of imitated marine mammal sound signals.

Background

With the development of underwater acoustic communication technology, security and concealment have become increasingly important alongside reliability, communication rate and networking. Conventional covert underwater acoustic communication mostly relies on low probability of detection (LPD) techniques. Unlike conventional LPD covert communication, bionic covert underwater acoustic communication uses sounds naturally produced by marine organisms, or artificially synthesized imitations of them, as the communication signal.

At present, research on bionic covert underwater acoustic communication, both in China and abroad, is limited to performance metrics such as interference resistance, communication rate and bit error rate; no unified standard has been established for evaluating the bionic effect or the concealment of the signals. Bionic covert communication disguises a secret signal as marine organism sound, so that a non-cooperating receiver mistakes the received signal for marine biological noise and ignores it. Covertness is thus achieved through disguise, and the ability to avoid detection is essential for a secure communication scheme. The concealment and bionic effect of the bionic signal are therefore critical to bionic covert underwater acoustic communication.

Audio fingerprinting extracts, by a specific algorithm, a unique digital feature from a piece of audio in the form of an identifier, analogous to a human fingerprint. It is mainly used to identify samples among massive audio collections or to track and locate samples in a database. As a core algorithm of automatic content recognition, audio fingerprinting has been widely applied to music recognition, speech recognition, voiceprint modelling, security verification, integrity verification, copyright protection and related tasks. However, no report has yet been found of audio fingerprinting being applied to the concealment evaluation of bionic covert underwater acoustic communication.

Disclosure of Invention

The object of the invention is to provide an audio-fingerprint-based method for scoring the concealment of imitated marine mammal sound signals. The method uses audio fingerprinting to compute fingerprint similarity and takes the similarity as the concealment score of the bionic signal. By exploiting the accuracy, reliability and robustness of audio fingerprinting, it evaluates the concealment of bionic signals more comprehensively and accurately, and provides a new analysis method and evaluation tool for this task.

To achieve the above object, the invention provides the following technical solution: an audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals, comprising the following steps:

S1, audio preprocessing: preprocess the original marine mammal sound audio and the imitated marine mammal sound audio using Adobe audio-editing software;

S2, design and implement an algorithm for extracting the power spectrum features of the audio signal, and extract the required power spectrum features;

S3, design and implement an audio fingerprint generation algorithm to generate the required audio fingerprints;

S4, match the bionic-signal audio fingerprint against the original-signal audio fingerprint obtained in step S3, compute the audio similarity, and take the result as the concealment score of the imitated marine mammal sound signal;

S5, design and construct a concealment scoring model for imitated marine mammal sound signals, the CLF model;

S6, train the CLF model with bionic sound signals as input and the concealment scores obtained in step S4 as labels;

S7, use the trained CLF model to score the concealment of imitated marine mammal calls.

Preferably, step S1 specifically includes:

Adobe audio-editing software is used to perform noise reduction, sound enhancement, echo cancellation and click removal on the original and the imitated marine mammal sound audio; the audio signals are then digitized, improving their quality, accuracy and applicability.

Preferably, step S2 specifically includes:

S2.1, design and implement the front-end processing part of the power spectrum feature extraction algorithm; the front-end processing comprises framing, windowing, transformation and short-time Fourier transform (STFT) operations;

where a Hamming window is used for the STFT; the audio signal is divided into frames as the window function slides, adjacent frames overlap by 50% of the frame length, and the frame length is chosen between 5 and 20 ms according to the variation of the human pitch period;

the short-time Fourier transform maps the audio signal onto a power spectrum that contains both the time-domain and the frequency-domain characteristics of the signal; for a given signal x(t), the STFT is computed as:

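$$X(t,f)=\int_{-\infty}^{+\infty} x(\tau)\,w(\tau-t)\,e^{-j2\pi f\tau}\,d\tau,\qquad P(t,f)=\left|X(t,f)\right|^{2};$$

where w(·) is the Hamming window and P(t, f) is the power spectrum;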
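The front-end processing of step S2.1 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the inventors' code: the function name `power_spectrogram`, the default 20 ms frame length, and the use of a one-sided DFT (`numpy.fft.rfft`) on each windowed frame as the discrete counterpart of the STFT integral are choices made here for illustration.

```python
import numpy as np


def power_spectrogram(x, fs, frame_ms=20.0):
    """Framed STFT power spectrum with a Hamming window and 50% frame overlap.

    x: 1-D signal, fs: sampling rate in Hz, frame_ms: frame length in ms
    (step S2.1 suggests 5-20 ms). Returns P of shape (n_frames, n_bins),
    where P[m, k] = |X(m, k)|^2 for frame m and frequency bin k.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(fs * frame_ms / 1000.0))  # frame length in samples
    hop = n // 2                            # 50% overlap between adjacent frames
    if n < 2 or len(x) < n:
        raise ValueError("signal must contain at least one frame of >= 2 samples")
    n_frames = 1 + (len(x) - n) // hop
    window = np.hamming(n)                  # Hamming window (less leakage than rectangular)
    frames = np.stack([x[m * hop : m * hop + n] * window for m in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)  # one-sided DFT of each windowed frame
    return np.abs(spectrum) ** 2            # power = squared magnitude
```

For example, a 1 kHz tone sampled at 8 kHz with 20 ms frames (160 samples, hop of 80) peaks in bin 20, because each bin spans 8000 / 160 = 50 Hz.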

S2.2, design and implement the feature extraction part of the power spectrum feature extraction algorithm:

power peak points in the power spectrum are selected as features; the peaks are captured with a maximum filter that uses a diamond-shaped structuring element, and the threshold is set to 1/3 of the maximum power of the signal;
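The peak picking of step S2.2 can be sketched with NumPy as below. The sketch assumes the power spectrum is a 2-D array indexed by (frame, frequency bin) and implements the diamond-shaped maximum filter directly, by comparing each point with its four cross-shaped neighbours (the 3x3 diamond that `scipy.ndimage.generate_binary_structure(2, 1)` produces). The patent does not state the size of the structuring element, so 3x3 is an assumption, and `diamond_peaks` is a hypothetical name.

```python
import numpy as np


def diamond_peaks(P, threshold_ratio=1.0 / 3.0):
    """Return (frame, bin) indices of local power maxima in P.

    A point is kept when it is >= its four diamond (cross-shaped) neighbours,
    i.e. it survives a 3x3 diamond maximum filter, and its power is at least
    threshold_ratio times the global maximum power.
    """
    P = np.asarray(P, dtype=float)
    padded = np.pad(P, 1, mode="constant", constant_values=-np.inf)
    centre = padded[1:-1, 1:-1]
    neighbourhood_max = np.maximum.reduce([
        centre,
        padded[:-2, 1:-1],  # previous frame
        padded[2:, 1:-1],   # next frame
        padded[1:-1, :-2],  # lower frequency bin
        padded[1:-1, 2:],   # higher frequency bin
    ])
    is_peak = (centre == neighbourhood_max) & (centre >= threshold_ratio * P.max())
    return np.argwhere(is_peak)
```

A larger diamond would suppress more neighbouring maxima and give a sparser set of peaks; the 1/3 threshold then discards weak local maxima regardless of the element size.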

after the power spectrum feature points are extracted, whether two audio fingerprints are equal depends not on whether the power values at particular time-frequency points are equal but on the time-domain and frequency-domain structure of the audio signals; the final output is therefore the (frequency, time) pairs corresponding to the power spectrum feature points.
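To turn peak indices into the (frequency, time) pairs described above, the standard DFT bin-to-frequency and frame-to-time mapping can be used. Assuming a sampling rate f_s, a frame length of N samples and a hop of H = N/2 samples (50% overlap), a peak at frame m and bin k corresponds to:

$$t_m=\frac{mH}{f_s},\qquad f_k=\frac{k\,f_s}{N},\qquad k=0,1,\dots,\left\lfloor N/2\right\rfloor$$

For example, with f_s = 8000 Hz and N = 160 (20 ms, so H = 80), bin k = 20 gives f = 20 × 8000 / 160 = 1000 Hz and frame m = 10 gives t = 10 × 80 / 8000 = 0.1 s. Because fingerprints are compared only with each other, raw indices can be used instead of physical units, provided both signals are processed with the same f_s, N and H.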

Preferably, step S3 specifically includes:

S3.1, sort the peak sequence S in ascending order of time, initialize the association distance L = 16 and the step index i = 0, and create an empty set fingerprint();

S3.2, obtain the frequency F_base and time T_base of the current peak point in sequence S, and the frequency F_step and time T_step of a subsequent peak point in S whose distance from the current point is smaller than the association distance L;

S3.3, compute the time offset T_delta = T_step - T_base between the two peak points;

S3.4, compute a hash value with the MD5 hash function and keep the first 16 characters of the hash, as follows:

$$H=\mathrm{MD5}\left(F_{\mathrm{base}},\,F_{\mathrm{step}},\,T_{\mathrm{delta}}\right)_{1:16};$$

S3.5, add the computed hash value together with the time T_step of the subsequent peak point to the set fingerprint() to generate the final audio fingerprint.
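Steps S3.1 to S3.5 can be sketched in Python with the standard-library `hashlib` module. Assumptions beyond the text: the association distance L is read as a distance in the time-sorted peak index (each peak is paired with the next L - 1 peaks), the MD5 input is the hypothetical string `"F_base|F_step|T_delta"`, and `generate_fingerprint` is an illustrative name.

```python
import hashlib


def generate_fingerprint(peaks, fan_distance=16):
    """Hash pairs of spectral peaks into a set of (hash, T_step) entries.

    peaks: iterable of (frequency, time) pairs from step S2.
    fan_distance: association distance L (assumed to count peaks, not seconds).
    """
    seq = sorted(peaks, key=lambda p: p[1])            # S3.1: sort by time
    fingerprint = set()                                # S3.1: empty fingerprint set
    for i, (f_base, t_base) in enumerate(seq):         # S3.2: current peak
        for f_step, t_step in seq[i + 1 : i + fan_distance]:  # peaks less than L away
            t_delta = t_step - t_base                  # S3.3: time offset
            key = f"{f_base}|{f_step}|{t_delta}".encode("utf-8")  # hypothetical input format
            h = hashlib.md5(key).hexdigest()[:16]      # S3.4: first 16 hex characters
            fingerprint.add((h, t_step))               # S3.5: store (hash, T_step)
    return fingerprint
```

Since each hash depends only on the two frequencies and their time difference, the same pair of peaks yields the same hash wherever it occurs in a recording, which is what makes offset-based matching in step S4 possible.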

Preferably, step S4 specifically includes:

When computing audio similarity, the audio fingerprint algorithm is first used to obtain the audio fingerprints of the original signal and of the bionic signal, each consisting of multiple (hash value, time offset) pairs. The fingerprints are then matched: as each fingerprint of the original audio is compared with the fingerprints of the bionic audio, the number of matching hash values at each time-offset difference is counted. The similarity percentage between the two audio signals is computed from the number of matched hash values and the total number of fingerprints, and, on a percentage scale, this fingerprint similarity is taken as the concealment score of the imitated marine mammal sound signal.
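The matching and scoring of step S4 can be sketched as below. The text does not fix the exact formula, so this sketch assumes a common offset-histogram approach: for every shared hash, the difference between its times in the two signals is voted into a histogram, the count at the most frequent offset is taken as the number of matched hashes, and it is divided by the number of original-signal fingerprints to give a percentage. `fingerprint_similarity` is a hypothetical name.

```python
from collections import Counter, defaultdict


def fingerprint_similarity(original, bionic):
    """Concealment score in [0, 100] from two sets of (hash, time) pairs.

    Counts hash matches per time-offset difference and scores the best
    offset against the total number of original-signal fingerprints.
    """
    original = list(original)
    if not original:
        return 0.0
    bionic_times = defaultdict(list)  # hash -> times at which it occurs in the bionic audio
    for h, t in bionic:
        bionic_times[h].append(t)
    offset_votes = Counter()          # time-offset difference -> number of matched hashes
    for h, t in original:
        for t_b in bionic_times.get(h, ()):
            offset_votes[t_b - t] += 1
    if not offset_votes:
        return 0.0
    matched = offset_votes.most_common(1)[0][1]
    # clamp in case duplicate bionic entries produce more votes than fingerprints
    return 100.0 * min(matched, len(original)) / len(original)
```

Aligning on the most common offset means that a bionic signal reproducing the original's time-frequency structure with a constant delay still scores highly, while hashes that match only by chance at scattered offsets contribute little.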

Preferably, step S5 specifically includes:

the CLF model is a hybrid of a CNN and an LSTM network and comprises an input layer, four CNN layers, two LSTM layers, two fully connected layers and an output layer;

of the four CNN layers, the first two use kernel_size = 3 and stride = 2, and the last two use kernel_size = 2 and stride = 1; the first LSTM layer has hidden_size = 64 and the second has hidden_size = 32;

ReLU() is used as the activation function for all network layers.
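The kernel and stride settings of step S5 determine how much the four CNN layers shorten the feature sequence passed to the LSTM layers. The sketch below applies the standard convolution output-length formula (the one documented for PyTorch's `Conv1d`). The patent does not state the padding, the input length, or whether the layers are 1-D or 2-D, so zero padding and a single sequence axis are assumptions, and the helper names are hypothetical.

```python
def conv_output_length(length, kernel_size, stride, padding=0, dilation=1):
    """Output length along one axis of a convolution layer."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# (kernel_size, stride) of the four CNN layers described in step S5
CLF_CONV_LAYERS = [(3, 2), (3, 2), (2, 1), (2, 1)]


def clf_conv_lengths(input_length, padding=0):
    """Sequence length after each CNN layer (padding value is an assumption)."""
    lengths = []
    for kernel_size, stride in CLF_CONV_LAYERS:
        input_length = conv_output_length(input_length, kernel_size, stride, padding)
        lengths.append(input_length)
    return lengths
```

With a 100-step input and no padding the sequence shrinks to 49, 24, 23 and finally 22 steps: the two stride-2 layers do most of the down-sampling, while each stride-1 layer with kernel size 2 removes one step. The LSTM layers (hidden_size 64, then 32) change the feature width, not the sequence length.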

Preferably, step S6 specifically includes:

during training, the input feature data consist of the entire bionic audio data set and part of the original audio data, with the corresponding audio scores as labels (the prediction targets); training ends when a preset number of iterations has been completed, when the weights fall below a set threshold, or when the prediction error rate falls below a set threshold; training is complete as soon as any one of these conditions is met;

MSELoss() is used as the loss function and Adam() as the optimizer.
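MSELoss() and Adam() are the names PyTorch uses for `torch.nn.MSELoss` and `torch.optim.Adam`, which suggests, though the text does not confirm it, a PyTorch implementation. To show how an MSE loss, the Adam update and the any-of-three stopping rule fit together without depending on a deep-learning framework, the sketch below trains a toy linear model y ≈ w·x + b in NumPy. It is not the CLF model; reading the "weight" condition as the size of the parameter update, the threshold values, and the function names are all assumptions.

```python
import numpy as np


def _mse(theta, x, y):
    """Mean squared error of the linear model theta[0] * x + theta[1]."""
    return float(np.mean((theta[0] * x + theta[1] - y) ** 2))


def train_linear_adam(x, y, lr=0.05, max_epochs=2000, update_tol=1e-9,
                      loss_tol=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y ~ w * x + b with an MSE loss and Adam updates.

    Stops as soon as any condition holds (assumed reading of step S6):
    max_epochs completed, parameter-update norm below update_tol,
    or MSE below loss_tol. Returns (theta, final_loss, reason).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(2)  # parameters [w, b]
    m = np.zeros(2)      # Adam first-moment estimate
    v = np.zeros(2)      # Adam second-moment estimate
    for epoch in range(1, max_epochs + 1):
        loss = _mse(theta, x, y)
        if loss < loss_tol:                      # error below threshold
            return theta, loss, "loss"
        err = theta[0] * x + theta[1] - y
        grad = np.array([2.0 * np.mean(err * x), 2.0 * np.mean(err)])  # dMSE/d[w, b]
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** epoch)       # bias-corrected moments
        v_hat = v / (1.0 - beta2 ** epoch)
        step = lr * m_hat / (np.sqrt(v_hat) + eps)
        theta = theta - step
        if np.linalg.norm(step) < update_tol:    # weight update below threshold
            return theta, _mse(theta, x, y), "update"
    return theta, _mse(theta, x, y), "max_epochs"  # preset number of cycles done
```

In a PyTorch version the loss and update lines would be replaced by `nn.MSELoss()` and `torch.optim.Adam(model.parameters())`, while the three-way stopping check would stay the same.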

The beneficial effects of the invention are as follows:

(1) The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals computes audio fingerprint similarity with audio fingerprinting and uses the similarity as the concealment score of the imitated marine mammal sound signal. Exploiting the accuracy, reliability and robustness of audio fingerprinting, the method evaluates the concealment of bionic signals more comprehensively and accurately, and provides a new analysis method and evaluation tool for this task.

(2) The invention uses a CNN-LSTM hybrid network to train the CLF bionic audio signal scoring model, an end-to-end scoring model. The performance of the proposed model was evaluated on an existing data set, an optimal structure was identified, and the proposed method was verified. The results show that the scores produced by the trained neural network are highly accurate and practical.

Drawings

FIG. 1 is an overall flow chart of the present invention;

FIG. 2 is a flow chart of an audio fingerprint extraction algorithm of the present invention;

FIG. 3 is a flow chart of a hashing algorithm of the present invention;

FIG. 4 is a similarity calculation flow chart of the present invention;

FIG. 5 is a structural diagram of the CLF concealment scoring model of the present invention;

FIG. 6 is a CLF model training flow chart of the present invention;

FIG. 7 is a plot of the loss curve of the CLF model on the test set of the present invention.

Detailed Description

To make the technical means, inventive features and effects of the invention easy to understand, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.

As shown in FIG. 1, the audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals comprises the following steps:

S1, preprocess the original marine mammal sound audio and the imitated marine mammal sound audio with Adobe audio-editing software, applying operations such as noise reduction, sound enhancement, echo cancellation and click removal, and then digitize the audio signals to improve their quality, accuracy and applicability.

S2, design and implement an algorithm for extracting the power spectrum features of the audio signal and extract the required power spectrum features, comprising the following steps:

S2.1, design and implement the front-end processing part of the power spectrum feature extraction algorithm. The front-end processing comprises framing, windowing, transformation, short-time Fourier transform (STFT) and related operations. A Hamming window is used for the STFT because, compared with a rectangular window, it better suppresses spectral leakage. The audio signal is divided into frames as the window function slides; adjacent frames overlap by 50% of the frame length, and the frame length is chosen between 5 and 20 ms according to the variation of the human pitch period. The STFT maps the audio signal onto a power spectrum that contains both the time-domain and the frequency-domain features of the signal; for a given signal x(t), the STFT can be computed as:

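$$X(t,f)=\int_{-\infty}^{+\infty} x(\tau)\,w(\tau-t)\,e^{-j2\pi f\tau}\,d\tau,\qquad P(t,f)=\left|X(t,f)\right|^{2},$$

where w(·) is the Hamming window and P(t, f) is the power spectrum.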

S2.2, design and implement the feature extraction part of the power spectrum feature extraction algorithm. Power peak points in the power spectrum are selected as features and captured with a maximum filter that uses a diamond-shaped structuring element; the threshold is set to 1/3 of the maximum power of the signal. Once the feature points are extracted, their exact power values are unimportant: whether two audio fingerprints are equal depends not on whether the power at particular time-frequency points is equal but on the time-domain and frequency-domain structure of the audio signals. The final output is therefore the (frequency, time) pairs corresponding to the power spectrum feature points, as shown in FIG. 2.

S3, design and implement an audio fingerprint generation algorithm to generate the required audio fingerprint, specifically comprising the following steps:

S3.1, sort the peak sequence S in ascending order of time, initialize the association distance L = 16 and the step index i = 0, and create an empty set fingerprint().

S3.2, obtain the frequency F_base and time T_base of the current peak point in sequence S, and the frequency F_step and time T_step of a subsequent peak point in S whose distance from the current point is smaller than the association distance L.

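S3.3, compute the time offset T_delta = T_step - T_base between the two peak points.

S3.4, compute a hash value with the MD5 hash function and keep the first 16 characters of the hash:

$$H=\mathrm{MD5}\left(F_{\mathrm{base}},\,F_{\mathrm{step}},\,T_{\mathrm{delta}}\right)_{1:16}.$$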

S3.5, add the computed hash value together with the time T_step of the subsequent peak point to the set fingerprint() to generate the final audio fingerprint, as shown in FIG. 3.

S4, match the bionic-signal audio fingerprint against the original-signal audio fingerprint obtained in S3, compute the audio similarity, and take the result as the concealment score of the imitated marine mammal sound signal. The algorithm steps are as follows:

When computing audio similarity, the audio fingerprint algorithm is first used to obtain the audio fingerprints of the original signal and of the bionic signal, each consisting of multiple (hash value, time offset) pairs. The fingerprints are then matched: as each fingerprint of the original audio is compared with the fingerprints of the bionic audio, the number of matching hash values at each time-offset difference is counted. Finally, the similarity percentage between the two audio signals is computed from the number of matched hash values and the total number of fingerprints, and, on a percentage scale, this fingerprint similarity is taken as the concealment score of the imitated marine mammal sound signal. The process is shown in FIG. 4.

S5, design and construct the concealment scoring model for imitated marine mammal sound signals, the CLF model.

The CLF model is a hybrid of a CNN and an LSTM network and comprises an input layer, four CNN layers, two LSTM layers, two fully connected layers and an output layer. Of the four CNN layers, the first two use kernel_size = 3 and stride = 2, and the last two use kernel_size = 2 and stride = 1. The first LSTM layer has hidden_size = 64 and the second has hidden_size = 32. ReLU() is used as the activation function for all network layers. The structure of the CLF model is shown in FIG. 5.

S6, train the CLF model with bionic sound signals as input and the concealment scores obtained in step S4 as labels, as follows:

During training, the input feature data consist of the entire bionic audio data set and part of the original audio data, with the corresponding audio scores as labels (the prediction targets). Training ends when a preset number of iterations has been completed, when the weights fall below a set threshold, or when the prediction error rate falls below a set threshold; training is complete as soon as any one of these conditions is met. MSELoss() is used as the loss function and Adam() as the optimizer. The CLF training procedure is shown in FIG. 6, and the loss curve on the test set during training is shown in FIG. 7.

S7, use the trained CLF model to score the concealment of imitated marine mammal sound signals.

The invention applies audio fingerprinting to the concealment evaluation of bionic signals that imitate marine mammal sounds, and designs and implements an audio fingerprint extraction algorithm suited to marine mammal sound signals. The algorithm uses the STFT to obtain a power spectrum containing both time-domain and frequency-domain features of the signal, and selects power peak points from it as features for generating the audio fingerprint. After the feature points are extracted, their exact power values are unimportant; each feature point is reduced to its frequency and time. The MD5 hash function is then applied to pairs of peak points within the association distance, combining their frequency-domain and time-domain information into a hash value, and the hash values together with the corresponding time offsets form the audio fingerprint.

Based on this audio fingerprint algorithm, the audio fingerprints of the original signal and of the bionic signal are obtained as multiple (hash value, time offset) pairs. The fingerprints are then matched: when each fingerprint of the original audio is compared with the fingerprints of the bionic audio, the number of hash values that match at each time-offset difference is counted. Finally, the similarity percentage between the two audio signals is computed from the number of matched hash values and the total number of fingerprints, and this audio similarity is taken as the concealment score of the bionic audio signal.

On the basis of this scoring method, a concealment scoring model for imitated marine mammal sound signals is implemented as a hybrid of a CNN and an LSTM network, comprising an input layer, four CNN layers, two LSTM layers, two fully connected layers and an output layer. During training, the input feature data consist of the entire bionic audio data set and part of the original audio data, and the audio scores obtained by the scoring method serve as labels (the prediction targets).

The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals computes audio fingerprint similarity with audio fingerprinting and uses the similarity as the concealment score of the imitated marine mammal sound signal. Exploiting the accuracy, reliability and robustness of audio fingerprinting, the method evaluates the concealment of bionic signals more comprehensively and accurately, and provides a new analysis method and evaluation tool for this task.

The invention uses a CNN-LSTM hybrid network to train the CLF bionic audio signal scoring model, an end-to-end scoring model. The performance of the proposed model was evaluated on an existing data set, an optimal structure was identified, and the proposed method was verified. The results show that the scores produced by the trained neural network are highly accurate and practical.

The above embodiments are intended only to illustrate, not to limit, the technical solutions of the invention. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described therein may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. An audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals, characterized by comprising the following steps:

2. The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals according to claim 1, wherein step S1 is specifically:

3. The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals according to claim 1, wherein step S2 is specifically:

S2.1, design and implement the front-end processing part of the power spectrum feature extraction algorithm, the front-end processing comprising framing, windowing, transformation and STFT (short-time Fourier transform) operations;

$$X(t,f)=\int_{-\infty}^{+\infty} x(\tau)\,w(\tau-t)\,e^{-j2\pi f\tau}\,d\tau,\qquad P(t,f)=\left|X(t,f)\right|^{2};$$

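4. The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals according to claim 1, wherein step S3 is specifically:

S3.3, compute the time offset T_delta between the two peak points;

S3.4, compute a hash value with the MD5 hash function and keep the first 16 characters of the hash:

$$H=\mathrm{MD5}\left(F_{\mathrm{base}},\,F_{\mathrm{step}},\,T_{\mathrm{delta}}\right)_{1:16};$$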

5. The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals according to claim 1, wherein step S4 is specifically:

6. The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals according to claim 1, wherein step S5 is specifically:

ReLU() is used as the activation function for all network layers.

7. The audio-fingerprint-based concealment scoring method for imitated marine mammal sound signals according to claim 1, wherein step S6 is specifically:

MSELoss() is used as the loss function and Adam() as the optimizer.