CN108806718B

CN108806718B - Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Info

Publication number: CN108806718B
Application number: CN201810585686.3A
Authority: CN
Inventors: 王志锋; 王静; 左明章; 叶俊民; 闵秋莎; 田元; 夏丹; 陈迪; 罗恒; 宁国勤
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2018-06-06
Filing date: 2018-06-06
Publication date: 2020-07-21
Anticipated expiration: 2038-06-06
Also published as: CN108806718A

Abstract

The invention belongs to the technical field of digital audio signal processing and discloses an audio identification method based on analysis of the ENF phase spectrum and instantaneous frequency spectrum. The method comprises: preprocessing the signal to be detected; extracting the ENF signal and analyzing its phase spectrum and instantaneous frequency spectrum to obtain the phase fluctuation feature and the fitting-parameter features of the phase spectrum and the frequency spectrum; fusing the features by Discriminant Correlation Analysis (DCA) to maximize the correlation between the different feature sets; and finally building a model on the fused features with a deep random forest and applying transfer learning to the trained model. The invention processes the feature data with feature-level fusion, which reduces the feature dimension and improves discrimination, and trains the model with a deep learning method, thereby greatly improving the accuracy of passive tampering detection for digital audio.

Description

Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Technical Field

The invention belongs to the technical field of digital audio signal processing, and particularly relates to an audio identification method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum.

Background

The state of the art commonly used in the industry is as follows:

With the development of computer and Internet technologies, people rely increasingly on digital multimedia data. Such data are easy to store, edit and transmit, which brings convenience and enjoyment to daily life. For example, without any professional knowledge, people can quickly use audio editing software to splice, add noise to, or otherwise alter digital audio files, which has become a popular form of entertainment in the Internet era. However, technological development is a double-edged sword and also gives lawbreakers an opportunity. They can maliciously tamper with digital audio and spread it widely, and the tampering can hardly be perceived by the senses alone. If such files are used as recorded testimony in court, in the dissemination of false news, or on similar occasions, serious consequences may follow, damaging judicial fairness and social trust. It is therefore important to guarantee the authenticity and integrity of digital audio and to perform tamper detection on it. Digital audio tampering detection is an important branch of digital audio forensics and is widely applied in judicial forensics, fair news reporting, scientific discovery and other fields.

Among current digital audio tampering detection methods, the most effective is detection based on the consistency of the electric network frequency (ENF), which has become almost a de facto standard for digital audio authentication over the last decade and has received attention from academic researchers and law enforcement agencies worldwide. The principle is that if a recording device is connected to the power grid while recording, the audio signal necessarily carries ENF information. This makes the ENF a watermark that is naturally embedded in the audio signal and that can also serve as a time stamp. The ENF component (ENFC) embedded in an audio file can be extracted by band-pass filtering. Tamper detection that exploits the stability and uniqueness of the ENFC generally follows one of two research approaches. The first compares the extracted ENFC with the grid frequency database of a power supply authority to determine whether the actual recording time is consistent with the claimed recording time. However, establishing and storing a large-scale ENF signal database is difficult and costly, and no ENF database of high practical value exists at present. Grigoras first established a local ENF reference database in Romania. Liuyuming et al. analyzed the North American power grid monitoring system and proposed a method for establishing a standard grid frequency reference. The second approach extracts certain features from the ENF signal and analyzes their consistency or regularity. Grigoras originally proposed an ENF-based audio tampering detection algorithm that compares the ENF fluctuation in the audio under test with reference data in order to judge whether the audio has been tampered with. Grigoras then analyzed the audio signal with a short time window, allowing a more detailed and accurate comparison with the database.
Building on the work of Grigoras, Rodríguez et al. proposed a method that needs no ENF reference database: it detects audio tampering by using the consistency of ENF phase changes as the feature and selects a threshold to make the classification decision. Hover Jian et al., building on Rodríguez, constructed a new feature quantity that detects discontinuities of the ENF phase by using an ideal sinusoidal signal as the reference signal. Hoyongjia et al. then improved this method: they proposed calculating the maximum ENF offset directly, without an additional reference signal, and used a combination of multiple features to pinpoint the tampered region. Esquef et al. observed that a tampering operation causes an abrupt change in the instantaneous ENF at the tamper point, and proposed the TPSW (Two-Pass Split-Window) method to estimate the background level of ENF variation; a peak at which the actual instantaneous frequency variation exceeds the background level is taken as a tamper point.

In summary, the problems of the prior art are as follows:

the research of digital audio passive tampering detection based on ENF has some problems:

1) there is no authoritative ENF comparison database, so judging whether a speech signal has been tampered with by comparing its ENF component against an ENF database is unreliable;

2) most methods do not extract key feature data from the speech signal that would allow a direct decision on whether the signal has been tampered with;

3) the correlation among feature sets is ignored, and the extracted raw feature data are not processed further;

4) most existing methods have a low degree of automation, perform poorly, and adapt poorly to signals from different databases.

The difficulty and significance for solving the technical problems are as follows:

Establishing an authoritative ENF comparison database is costly and hard to manage, and is of little practical significance; extracting key feature data from a speech signal so that a tampering decision can be made directly is the problem that researchers want to overcome.

The method selects the phase spectrum and the instantaneous frequency spectrum of the ENF component, both of which are sensitive to signal truncation, as the features for tamper detection. The invention carries out experiments on speech signals from three databases and builds the model with a deep random forest, a deep learning method, so that the scheme is adaptive and automated enough to be applied in practice.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an audio identification method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum. The invention extracts the phase and frequency characteristics by extracting the ENFC in the voice signal and analyzing the phase spectrum and the frequency spectrum of the ENFC. And performing feature fusion on the phase spectrum features and the frequency spectrum features by using a DCA method, and performing model construction on the fusion features by using a deep random forest, so that the obtained model can make a decision on whether any signal to be detected is falsified, and automatic detection of voice signal insertion and deletion operations is realized. According to the method, the representative phase and instantaneous frequency characteristics in the ENF component are fused, and the model is trained by using a deep learning method, so that the model capable of being automatically detected is obtained, the detection efficiency is improved, and the automation of digital audio tampering detection is realized.

The invention is realized in such a way that a digital audio true-false identification method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum comprises the following steps: firstly, preprocessing a signal to be detected, including down-sampling and narrow-band filtering, to obtain a narrow-band signal with the standard Frequency of power grid Frequency (ENF) as the center; then, extracting the characteristics of the ENF signal, analyzing the phase spectrum and the instantaneous frequency spectrum of the ENF signal, and extracting the phase spectrum fluctuation characteristics, the phase spectrum and the frequency spectrum fitting parameter characteristics of the ENF signal; feature fusion is carried out through a Discriminant Correlation Analysis (DCA) method, the correlation among different feature sets is maximized, the correlation among classes is eliminated, and the correlation in the classes is limited; and finally, model construction is carried out on the fused features by applying a deep random forest, and the trained model is subjected to transfer learning, namely after the model is stored, a decision can be made whether any signal to be tested is falsified. The method is used for tampering detection based on the ENF marking signal in the signal to be detected, extracting the phase and frequency characteristics of the ENF signal affected by tampering, performing DCA characteristic fusion on the extracted characteristic set, and training and classifying the fused characteristics by applying a deep random forest method to obtain a classification model.

The method specifically comprises the following steps:

step 1: preprocessing a signal to be detected;

step 2: extracting the characteristics of a phase spectrum and a frequency spectrum of an ENF component in the signal;

step 3: performing feature fusion on the extracted feature sets by using the DCA method;

step 4: performing model construction on the fused features by using a deep random forest, and making a decision on the signal to be detected.

Further, step 1 specifically comprises the following steps:

step 1.1: preprocessing the signal to be detected x[n], including down-sampling and removing the DC component, to obtain x_d[n];

step 1.2: passing the down-sampled signal x_d[n] from step 1.1 through a band-pass filter whose center frequency is the ENF standard frequency, to obtain the ENF component x_ENFC[n] of the signal.

Further, step 2 specifically comprises the following steps:

step A1: performing DFT¹-based phase spectrum estimation on x_ENFC[n], and extracting the phase spectrum fluctuation feature F;

step A2: performing Hilbert-based instantaneous frequency spectrum estimation on x_ENFC[n];

step A3: performing curve fitting on the phase spectrum and the instantaneous frequency spectrum respectively, and extracting the phase spectrum fitting features and the instantaneous frequency spectrum fitting features.

Further, in step A1, the DFT¹-based phase spectrum estimation of x_ENFC[n] proceeds as follows. First, a conventional N-point Discrete Fourier Transform (DFT), denoted DFT⁰, is applied to x_ENFC[n] to obtain the DFT⁰ phase estimate.

The DFT¹-based phase estimation builds on the DFT⁰ phase estimation. The approximate first derivative of x_ENFC[n] at point n is calculated:

x′_ENFC[n]＝f_d(x_ENFC[n]-x_ENFC[n-1])

The approximate first derivative is used together with the DFT⁰ result to perform a higher-order phase estimation; linear interpolation is applied to the estimation result to obtain the phase spectrum estimate, from which the phase spectrum fluctuation feature F is extracted;

in step A2, the Hilbert-transform-based instantaneous frequency estimation of x_ENFC[n] proceeds as follows. First, the analytic signal of x_ENFC[n] is obtained:

x^(a)_ENFC[n]＝x_ENFC[n]+i·H{x_ENFC[n]},

where i is the imaginary unit and H denotes the Hilbert transform. The instantaneous frequency is the rate of change of the phase angle of the analytic signal. The instantaneous frequency f[n] of the ENF signal is estimated, oscillation and boundary effects are removed from f[n], and the instantaneous frequency spectrum of x_ENFC[n] is constructed;

in step A3, the phase spectrum and the instantaneous frequency spectrum of x_ENFC[n] are fitted with Sum of Sines and Gaussian curves respectively;

Sum of Sines expression:

y ＝ Σ_{i=1}^{n} a_i·sin(b_i·x + c_i)

Gaussian expression:

y ＝ Σ_{i=1}^{n} a_i·exp(−((x − b_i)/c_i)²)

where the parameters of the expressions are the fitting features.

further, step 3 specifically includes:

The goal of feature fusion is to combine the relevant information in two or more feature vectors into a single vector that is more discriminative than any individual input feature vector or, when there are too many feature dimensions, to reduce the feature dimension while achieving accuracy similar to that of the high-dimensional features. Discriminant Correlation Analysis (DCA) is applied to fuse the phase feature set and the frequency feature set obtained in step 2. DCA fuses features by maximizing the pairwise correlation between the two feature sets while restricting the correlation to within classes. The transformation matrices of the feature sets are computed by maximizing the covariance between the feature sets while ensuring that the scatter matrices of the classes are diagonalized.

Further, step 4 specifically includes:

step 4.1: applying a deep random forest to construct a model of the fused features;

A deep random forest is a deep model built from cascaded layers of random forests, which can be used for classification. The fused features are used to train the deep random forest. Unlike a traditional random forest, model parameters such as the number of layers are determined automatically during training from the change in accuracy and a limit on the number of layers: training stops when the training accuracy no longer improves or the number of layers reaches its maximum, and the classification result at that point is taken as the final classification accuracy.

Step 4.2: and (4) after the model is stored, determining whether any signal to be detected is tampered.

The number of layers and the structural parameters of the deep random forest obtained after the training process of the deep random forest is completed form the fusion feature classification model obtained by the invention, and the classification and decision can be carried out on the fusion features of any signals to be detected.

Another object of the present invention is to provide a computer program for implementing the digital audio authentication method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum.

Another object of the present invention is to provide a digital audio signal processing system for implementing the digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum.

It is another object of the present invention to provide a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the digital audio authentication method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum.

In summary, the advantages and positive effects of the invention are：

The method analyzes a phase spectrum and an instantaneous frequency spectrum which are sensitive to signal truncation in the ENF signal, respectively extracts an effective characteristic set, and processes the extracted characteristic set;

the feature level fusion technology is used for processing feature data, so that the feature dimension is reduced, the identification gap is increased, a deep learning method is applied for model training, and the accuracy of passive tampering detection of digital audio is greatly increased;

the invention has high stability and strong robustness for complex environment recording and noisy speech.

The invention provides a general algorithm for accurate and automated passive tampering detection of digital audio.

The experimental data used by the method are 500 speech recordings (including original and tampered speech) from three different databases. The speech signals are imported with MATLAB, and the ENF component consistency fluctuation features are extracted through step 1 of the method. According to step 2, the phase fluctuation and the instantaneous frequency fluctuation are fitted with five sine terms and five Gaussian terms. According to step 3, the phase fluctuation features and the frequency fluctuation features are each taken as a feature set for DCA feature fusion, yielding two-dimensional fused features. Labels are added to the features, the fused features are cross-validated with a deep random forest, and the final classification accuracy reaches 99.8%.

Drawings

Fig. 1 is a flowchart of a digital audio authenticity identification method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum according to an embodiment of the present invention.

Fig. 2 is a flowchart of DFT¹-based phase spectrum feature extraction according to an embodiment of the present invention;

fig. 3 is a flow chart of the extraction of instantaneous frequency spectrum features based on Hilbert transform according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, the method for identifying digital audio authenticity based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum provided by the present invention includes the following steps:

step 1: preprocessing a signal to be detected;

the specific implementation comprises the following substeps:

In the embodiment, to balance frequency aliasing, loss of signal information and the signal-to-noise ratio (oversampling can improve the signal-to-noise ratio), the resampling frequency f_d of the signal is set to 1000 Hz or 1200 Hz, which places the standard ENF frequency at ω₀＝π/10 rad/sample for 50 Hz and 60 Hz grids respectively.

The present embodiment uses a linear zero-phase FIR filter of order 10000 for narrow-band filtering to prevent phase delay. The center frequency is the ENF standard frequency, the bandwidth is 0.6 Hz, the passband ripple is 0.5 dB, and the stopband attenuation is 100 dB. The high filter order is used to obtain a near-ideal narrow-band signal. Zero padding means appending zeros to the end of the time-domain signal to increase its length; applying zero padding before the DFT samples the spectrum more densely and helps locate the peak of the spectrum more accurately.
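The preprocessing described above can be sketched as follows. This is a minimal NumPy illustration, not the embodiment's implementation: the function names, the Hamming-windowed sinc design, the 255-tap anti-aliasing filter and the integer decimation ratio are all assumptions, and the windowed-sinc design does not reproduce the stated 0.5 dB ripple and 100 dB attenuation figures. Only the order (10000), bandwidth (0.6 Hz) and resampling rate come from the text.

```python
import numpy as np

def lowpass_kernel(cutoff_hz, fs, num_taps):
    """Hamming-windowed sinc low-pass kernel; symmetric, hence linear phase."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs                       # cutoff in cycles per sample
    return 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)

def zero_phase_fir(x, h):
    """Filter x with an odd-length symmetric kernel h and undo its delay."""
    n_full = len(x) + len(h) - 1
    n_fft = 1 << (n_full - 1).bit_length()    # next power of two, no wrap-around
    y = np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft), n_fft)
    delay = (len(h) - 1) // 2
    return y[delay:delay + len(x)]

def extract_enfc(x, fs, f_d=1000, f_enf=50.0, bandwidth=0.6, order=10000):
    """Down-sample to f_d, remove DC, and band-pass around the nominal ENF."""
    q = int(round(fs / f_d))                  # assumes fs is a multiple of f_d
    x = zero_phase_fir(x, lowpass_kernel(0.45 * f_d, fs, 255))  # anti-aliasing
    x_d = x[::q]
    x_d = x_d - x_d.mean()                    # remove the DC component
    taps = order + 1
    band = (lowpass_kernel(f_enf + bandwidth / 2, f_d, taps)
            - lowpass_kernel(f_enf - bandwidth / 2, f_d, taps))
    return zero_phase_fir(x_d, band)
```

Because the kernel is symmetric and its delay is removed, the output stays time-aligned with the input, which matters for the later phase analysis. The first and last `order / 2` output samples are affected by the filter edges.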

the specific implementation comprises the following substeps:

As shown in fig. 2, a conventional N-point Discrete Fourier Transform (DFT) is first applied to each frame of x_ENFC[n] to obtain X(k). Let k_peak be the integer index at which |X(k)| of the frame is maximal. The DFT⁰-based phase estimate is then:

φ_DFT⁰＝∠X(k_peak) (1)

The approximate first derivative of the ENF signal x_ENFC[n] at point n is calculated:

x′_ENFC[n]＝f_d(x_ENFC[n]-x_ENFC[n-1]) (2)

The DFT⁰ procedure is applied to x′_ENFC[n] to obtain |X′(k)|, and |X′(k)| is multiplied by a scale coefficient F(k).

Thus DFT⁰[k]＝|X(k)| and DFT¹[k]＝F(k)|X′(k)| are obtained, from which the estimated frequency of x_ENFC is computed.

The ENFC is a narrow-band signal that can be written as x_ENFC[n]＝a·cos(ω₀n+φ₀), where ω₀＝2πf_ENFC/f_d, φ₀ is the initial phase of x_ENFC, and f_ENFC is the actual ENF frequency. The phase of x_ENFC is derived from this expression; in the derivation, θ denotes the phase of x′_ENFC, and the estimated phase of X′(k) is linearly interpolated to obtain a more accurate value, which gives the phase spectrum estimated by the DFT¹-based method.

The feature quantity F, which describes the phase fluctuation of the ENFC, is then calculated from the frame-wise phase estimates. The estimated phase corresponding to the n_b-th frame is used for 2 ≤ n_b ≤ N_Block, together with its average value over n_b＝2 to N_Block.
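The frame-wise estimation can be illustrated with the sketch below. Several pieces are assumptions rather than the embodiment's formulas: the Hann window, the frame length of ten nominal cycles, the one-cycle hop, and the scale factor `scale`, which is chosen here so that it cancels the gain 2·f_d·sin(ω/2) of the backward difference. The fluctuation measure (variance of frame-to-frame phase differences) is only a stand-in for F, whose exact expression is not reproduced in this text. The higher-order DFT¹ phase with interpolation is not implemented.

```python
import numpy as np

def frame_estimates(frame, f_d, n_fft=8192):
    """DFT0 phase (rad) and DFT1-style frequency (Hz) of one ENFC frame."""
    x = frame[1:]
    dx = f_d * np.diff(frame)                 # x'[n] = f_d (x[n] - x[n-1])
    w = np.hanning(len(x))                    # window choice is an assumption
    X = np.fft.rfft(x * w, n_fft)             # zero-padded DFT of the frame
    dX = np.fft.rfft(dx * w, n_fft)
    k = 1 + int(np.argmax(np.abs(X[1:])))     # k_peak, skipping the DC bin
    phase0 = np.angle(X[k])
    # Assumed scale factor: (w/2)/sin(w/2) with w = 2*pi*k/n_fft, so that
    # scale * |X'(k)| / |X(k)| approximates 2*pi*f.
    scale = (np.pi * k / n_fft) / np.sin(np.pi * k / n_fft)
    freq1 = scale * np.abs(dX[k]) / np.abs(X[k]) / (2 * np.pi)
    return phase0, freq1

def phase_track(x_enfc, f_d, f_enf=50.0, cycles=10):
    """Unwrapped DFT0 phase and DFT1 frequency for overlapping frames."""
    n_frame = int(round(cycles * f_d / f_enf))
    hop = int(round(f_d / f_enf))             # advance one nominal cycle
    est = [frame_estimates(x_enfc[s:s + n_frame + 1], f_d)
           for s in range(0, len(x_enfc) - n_frame, hop)]
    phases, freqs = map(np.array, zip(*est))
    return np.unwrap(phases), freqs

def phase_fluctuation(phases):
    """Stand-in fluctuation measure: variance of frame-to-frame phase steps."""
    return float(np.var(np.diff(phases)))
```

On an untouched tone the phase track is nearly flat, whereas deleting a few samples introduces a phase step that raises the fluctuation measure by many orders of magnitude.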

As shown in fig. 3, a discrete Hilbert transform is applied to the signal x_ENFC[n]. First, the analytic signal of x_ENFC[n] is obtained: x^(a)_ENFC[n]＝x_ENFC[n]+i·H{x_ENFC[n]}, where i is the imaginary unit and H denotes the Hilbert transform. The instantaneous amplitude is the magnitude of the analytic signal, and the instantaneous frequency is the rate of change of its phase angle. From these, the instantaneous frequency f[n] of the ENF signal is estimated. Because of numerical approximation in the Hilbert transform, the resulting f[n] contains some parasitic oscillation, so f[n] is low-pass filtered to remove it. Because of the boundary effect of frequency estimation, 2000 sampling points at the head and tail of f[n] are removed; the resulting f[n] is the instantaneous frequency spectrum estimate of the ENFC.
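A minimal NumPy sketch of this estimate is given below. The FFT-based construction of the analytic signal is the standard one; the moving-average low-pass filter, its length of 201 samples, and the interpretation that 2000 samples are trimmed from each end are assumptions made for illustration.

```python
import numpy as np

def analytic_signal(x):
    """x + i*H{x}, built by zeroing the negative-frequency half of the DFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def instantaneous_frequency(x_enfc, f_d, smooth=201, trim=2000):
    """Instantaneous frequency in Hz, smoothed and with both ends trimmed."""
    phase = np.unwrap(np.angle(analytic_signal(x_enfc)))
    f = np.diff(phase) * f_d / (2 * np.pi)    # rate of change of phase angle
    f = np.convolve(f, np.ones(smooth) / smooth, mode="same")  # remove ripple
    return f[trim:len(f) - trim]              # discard boundary effects
```

The smoothing is applied before trimming so that the edge distortion of the moving average falls inside the discarded samples.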

Step A3: curve fitting is performed on the phase spectrum and the instantaneous frequency spectrum respectively, and the phase spectrum fitting features and the instantaneous frequency spectrum fitting features are extracted.

In the embodiment, different analytical expressions are used to fit the discrete data points according to the characteristics of the ENF phase distribution and the instantaneous frequency distribution. The criterion for selecting the analytical expression of a phase or frequency curve is that the expression can fit both the original signal curve and the edited signal curve and reflect the difference between them in its parameters. On this basis, the embodiment selects the Sum of Sines and Gaussian expressions to fit the phase curve and the frequency curve respectively; the parameters of the expressions are the fitting-parameter features.

The analytical expression Sum of Sines is suitable for fitting the phase spectrum and has the form:

y ＝ Σ_{i=1}^{n} a_i·sin(b_i·x + c_i)

where a is the amplitude, b is the frequency, c is the phase constant of each sine term, and n is the number of terms in the series, with 1 ≤ n ≤ 9. The fitted parameters a_i, b_i and c_i form the phase spectrum fitting features.

The analytical expression Gaussian is suitable for fitting peaks and has the form:

y ＝ Σ_{i=1}^{n} a_i·exp(−((x − b_i)/c_i)²)

where a is the amplitude of the peak, b is the position of the peak, c is related to the width of the peak, and n is the number of peaks to fit, with 1 ≤ n ≤ 8. The fitted parameters a_i, b_i and c_i form the frequency spectrum fitting features.
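The two model families can be written down directly, as in the sketch below. The multi-term fits used in the embodiment require a nonlinear least-squares routine, which is not shown; as a hedged illustration of how fitted parameters become features, the sketch includes only a closed-form fit of a single Gaussian term through a parabola in log(y). All function names are hypothetical.

```python
import numpy as np

def sum_of_sines(x, params):
    """params: iterable of (a, b, c) triples, one per sine term."""
    return sum(a * np.sin(b * x + c) for a, b, c in params)

def gaussian_sum(x, params):
    """params: iterable of (a, b, c) triples, one per Gaussian peak."""
    return sum(a * np.exp(-((x - b) / c) ** 2) for a, b, c in params)

def fit_single_gaussian(x, y):
    """Fit y = a*exp(-((x-b)/c)^2) via a quadratic in log(y); needs y > 0."""
    p2, p1, p0 = np.polyfit(x, np.log(y), 2)  # log y = p2*x^2 + p1*x + p0
    c = np.sqrt(-1.0 / p2)
    b = -p1 / (2.0 * p2)
    a = np.exp(p0 - p2 * b ** 2)
    return a, b, c

def fitting_features(params):
    """Flatten the (a, b, c) triples into one feature vector."""
    return np.ravel(np.asarray(params, dtype=float))
```

With five terms per model, as in the reported experiment, `fitting_features` would return a 15-element vector for each curve.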

Discriminant Correlation Analysis (DCA) is applied to fuse the phase feature set and the frequency feature set obtained in step 2. DCA performs effective feature fusion by maximizing the pairwise correlation between the two feature sets while eliminating inter-class correlation and restricting the correlation to within classes. DCA is a feature-level fusion that applies a summation method; its advantage is that it reduces the feature dimension while also reducing the gap in recognition results.

Suppose X ∈ R^(p×n) and Y ∈ R^(q×n) denote two matrices, each containing n training feature vectors from a different modality. Suppose the samples in the data matrices are collected from c separate classes, so that the n columns of each data matrix are divided into c separate groups, where n_i columns belong to the i-th class (n＝Σ_{i=1}^{c} n_i).

Let x_ij ∈ X denote the feature vector corresponding to the j-th sample in the i-th class. Let x̄_i and x̄ denote the mean of the x_ij vectors in the i-th class and over the whole feature set, respectively, i.e.

x̄_i＝(1/n_i) Σ_{j=1}^{n_i} x_ij,  x̄＝(1/n) Σ_{i=1}^{c} n_i x̄_i

The inter-class scatter matrix is defined as

S_bx(p×p)＝Σ_{i=1}^{c} n_i (x̄_i − x̄)(x̄_i − x̄)^T＝Φ_bx Φ_bx^T

where

Φ_bx(p×c)＝[√n_1 (x̄_1 − x̄), √n_2 (x̄_2 − x̄), …, √n_c (x̄_c − x̄)]

If the number of features is much greater than the number of classes (p ≫ c), computing the covariance matrix (Φ_bx^T Φ_bx)(c×c) is easier than computing (Φ_bx Φ_bx^T)(p×p). The most significant eigenvectors of Φ_bx Φ_bx^T can be obtained efficiently by mapping the eigenvectors of Φ_bx^T Φ_bx. Therefore only the eigenvectors of the c×c covariance matrix Φ_bx^T Φ_bx need to be found. If the classes are well separated, Φ_bx^T Φ_bx will be a diagonal matrix. Because Φ_bx^T Φ_bx is a symmetric positive semi-definite matrix, it can be diagonalized by the transformation:

P^T (Φ_bx^T Φ_bx) P＝Λ̂

where P is the matrix of orthogonal eigenvectors and Λ̂ is the diagonal matrix of real, non-negative eigenvalues sorted in descending order. Let Q(c×r) consist of the r eigenvectors of P that correspond to the r largest non-zero eigenvalues. Then:

Q^T (Φ_bx^T Φ_bx) Q＝Λ(r×r)

The r most significant eigenvectors of S_bx are obtained through the mapping Q → Φ_bx Q:

(Φ_bx Q)^T S_bx (Φ_bx Q)＝Λ(r×r) (13)

W_bx＝Φ_bx Q Λ^(−1/2) is the transformation that unitizes S_bx and reduces the dimension of the data matrix X from p to r, namely:

W_bx^T S_bx W_bx＝I (14)

X′(r×n)＝W_bx^T X(p×n)

X′ is the projection of X into the space in which the inter-class scatter matrix is I and the classes are separated. Note that there are at most c−1 non-zero generalized eigenvalues, so one upper bound on r is c−1; another upper bound on r is the rank of the data matrices, i.e., r ≤ min(c−1, rank(X), rank(Y)).

The second feature set Y is processed in the same way to find the transformation matrix W_by, which unitizes the inter-class scatter matrix S_by of the second modality and reduces the dimension of the data matrix Y from q to r.

The updated Φ′_bx and Φ′_by are non-square r×c orthonormal matrices. Although S′_bx＝S′_by＝I, the matrices Φ′_bx^T Φ′_bx and Φ′_by^T Φ′_by are strictly diagonally dominant, with diagonal elements close to 1 and off-diagonal elements close to 0. The class centers therefore have minimal correlation with one another, so the classes are well separated. Next, each feature in one feature set should have non-zero correlation only with its corresponding feature in the other feature set. To achieve this, the between-set covariance matrix of the transformed feature sets, S′_xy＝X′Y′^T, is diagonalized by singular value decomposition (SVD):

S′_xy(r×r)＝UΣV^T ⇒ U^T S′_xy V＝Σ

Because X′ and Y′ both have rank r, S′_xy(r×r) is non-degenerate, so Σ is a diagonal matrix whose main-diagonal elements are all non-zero. Let W_cx＝UΣ^(−1/2) and W_cy＝VΣ^(−1/2); then:

(UΣ^(−1/2))^T S′_xy (VΣ^(−1/2))＝I (19)

which unitizes the between-set covariance matrix S′_xy. The feature sets are then transformed as:

X*＝W_cx^T X′＝W_cx^T W_bx^T X＝W_x X
Y*＝W_cy^T Y′＝W_cy^T W_by^T Y＝W_y Y

where W_x＝W_cx^T W_bx^T and W_y＝W_cy^T W_by^T are the final transformation matrices for X and Y, respectively. It is readily shown that the inter-class scatter matrices of the transformed feature sets are still diagonal, so the classes remain separated.

The inter-class scatter matrix of X* is:

S*_bx＝W_cx^T W_bx^T S_bx W_bx W_cx

From formula (14), W_bx^T S_bx W_bx＝I, and U is an orthogonal matrix, so:

S*_bx＝(UΣ^(−1/2))^T (UΣ^(−1/2))＝Σ^(−1)

Similarly, it can be shown that S*_by is a diagonal matrix. For the transformed feature set X*, the matrix X* X*^T, which represents the covariance between features, is strictly diagonally dominant, indicating that the correlation between different features within a single feature set is minimal. The matrix X*^T X*, which represents the covariance between samples, is block diagonal, indicating that samples are more highly correlated with samples of the same class.
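The DCA steps above can be sketched in NumPy as follows. Two points are assumptions of this sketch rather than statements from the text. First, the mapped eigenvectors Φ_bx Q are normalized to unit length before the Λ^(−1/2) scaling (equivalent to dividing Φ_bx Q by the eigenvalues), which is what makes W_bx^T S_bx W_bx equal to I numerically. Second, the fused feature is formed by concatenating X* and Y*; the summation X* + Y* is the other common choice. Function names are hypothetical.

```python
import numpy as np

def between_class_transform(X, labels, r):
    """Return (W, S_b) with W.T @ S_b @ W = I, using the small c x c problem."""
    grand_mean = X.mean(axis=1)
    cols = []
    for k in np.unique(labels):
        Xk = X[:, labels == k]
        cols.append(np.sqrt(Xk.shape[1]) * (Xk.mean(axis=1) - grand_mean))
    Phi = np.column_stack(cols)                   # p x c
    evals, evecs = np.linalg.eigh(Phi.T @ Phi)    # c x c eigenproblem
    top = np.argsort(evals)[::-1][:r]             # r largest eigenvalues
    Q, lam = evecs[:, top], evals[top]
    # Assumption: unit-normalize Phi @ Q (norm sqrt(lam)), then scale by
    # lam**-0.5, i.e. divide by lam overall.
    W = (Phi @ Q) / lam
    return W, Phi @ Phi.T

def dca_fuse(X, Y, labels):
    """Fuse feature sets X (p x n) and Y (q x n); returns (Z, X*, Y*)."""
    c = len(np.unique(labels))
    r = min(c - 1, np.linalg.matrix_rank(X), np.linalg.matrix_rank(Y))
    W_bx, _ = between_class_transform(X, labels, r)
    W_by, _ = between_class_transform(Y, labels, r)
    Xp, Yp = W_bx.T @ X, W_by.T @ Y               # r x n projections
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)           # S'_xy = U diag(s) V^T
    W_cx, W_cy = U / np.sqrt(s), Vt.T / np.sqrt(s)
    Xs, Ys = W_cx.T @ Xp, W_cy.T @ Yp
    return np.vstack([Xs, Ys]), Xs, Ys            # concatenation (assumed)
```

For a two-class problem (original versus tampered), r is 1 for each set, so concatenation yields a two-dimensional fused feature per recording.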

First, multi-grained scanning is applied to the data to enlarge the amount of sample data: the data are sampled with a sliding window. With a window size of 100 and a step size of 1, 301 samples with 100 features each are obtained after sampling; because they all originate from a single sample, the number of samples is expanded. Training is then performed with one random forest and one completely random forest. The decision trees in the completely random forest are grown by randomly selecting an attribute as the split attribute at each step, without computing the Gini index or entropy gain. Assuming three classes, one random forest and one completely random forest each produce 301 three-dimensional class vectors, which combine into 1806-dimensional data. During generation and testing of the random forest and the completely random forest, prediction uses k-fold cross-validation: the data are divided into k groups, k−1 groups are used to train the forest and the remaining group is used for testing, and the test outputs are averaged to give the output of the forest; each group is tested once, so k groups of outputs are obtained after k rounds. Different window sizes and step sizes can also be set for the sliding-window feature extraction, and the results are concatenated after passing through the random forest and the completely random forest.
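The window arithmetic can be checked with a short sketch. The raw vector length of 400 and the additional window sizes of 200 and 300 are assumptions: they are not stated here, but they are the values that reproduce the 301 windows, the 1806-dimensional output, and the 3618-dimensional total used for the cascade input.

```python
def sliding_windows(features, window, step=1):
    """All length-`window` slices of a 1-D feature sequence."""
    return [features[i:i + window]
            for i in range(0, len(features) - window + 1, step)]

def scan_dims(n_features, window, step=1, n_classes=3, n_forests=2):
    """(number of windows, output dimension after the scanning forests)."""
    n_windows = (n_features - window) // step + 1
    return n_windows, n_windows * n_classes * n_forests

raw = list(range(400))                  # assumed 400-dimensional raw vector
windows = sliding_windows(raw, 100)     # one sample becomes 301 sub-samples
per_window = {w: scan_dims(400, w) for w in (100, 200, 300)}  # assumed sizes
total_dim = sum(dim for _, dim in per_window.values())
```

Each sub-sample yields one class-probability vector per forest, so the scanning output grows with the number of windows, classes and forests.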

In the cascade forest, the outputs of two completely random forests and two ordinary random forests (3 × 4 = 12-dimensional data) are concatenated with the original data (the 3618-dimensional output of multi-grained scanning) as the input of the next layer (12 + 3618 = 3630-dimensional data). Because the output of the previous layer is concatenated in this way each time, the input of every layer is 3630-dimensional, so the parameters of the random forests are fixed. The number of layers of the deep random forest is therefore not set in advance; it is determined from the change in accuracy and a limit on the number of layers: training stops when the training accuracy no longer improves or the number of layers reaches its maximum, and the classification result at that point is taken as the final classification accuracy.
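The layer-growing rule can be sketched independently of any forest implementation. In the sketch below, `evaluate_layer` is a hypothetical callable that trains one cascade layer and returns its accuracy; the default of ten layers is likewise an assumption.

```python
def next_layer_dim(n_forests=4, n_classes=3, scanned_dim=3618):
    """Input width of a cascade layer: forest outputs plus scanned features."""
    return n_forests * n_classes + scanned_dim

def grow_cascade(evaluate_layer, max_layers=10):
    """Add layers until accuracy stops improving or max_layers is reached.

    evaluate_layer(i) is a hypothetical hook that trains layer i (1-based)
    and returns its accuracy. Returns (layers kept, best accuracy).
    """
    best_acc, kept = float("-inf"), 0
    for layer in range(1, max_layers + 1):
        acc = evaluate_layer(layer)
        if acc <= best_acc:        # no improvement: stop and keep the best
            break
        best_acc, kept = acc, layer
    return kept, best_acc
```

Because the stopping test runs after each new layer is evaluated, the depth adapts to the data rather than being a fixed hyperparameter.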

Step 4.2: after the model is stored, a decision can be made whether any signal to be detected is tampered.

The invention is further described below in conjunction with specific examples/experiments/simulation analyses.

The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The storage medium may be, for example, a magnetic medium, an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or any combination thereof.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A digital audio authenticity identification method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum, characterized in that the method comprises the following steps:

first, preprocessing a signal to be detected, including down-sampling and narrow-band filtering, to obtain a narrow-band signal centered on the standard electric network frequency (ENF); then, extracting the characteristics of the ENF signal by analyzing the phase spectrum and the instantaneous frequency spectrum of the ENF signal, and extracting the phase spectrum fluctuation characteristics and the fitting parameter characteristics of the phase spectrum and the instantaneous frequency spectrum of the ENF signal;

performing feature fusion by the discriminant correlation analysis (DCA) method, which maximizes the correlation between the different feature sets, eliminates the correlation between classes, and restricts the correlation to within classes;

finally, performing model construction on the fused features by applying a deep random forest, and subjecting the trained model to transfer learning; after the model is stored, making a decision on whether any signal to be detected has been tampered with;

the digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum specifically comprises the following steps:

step 1: preprocessing a signal to be detected;

step 4: performing model construction on the fused features by using a deep random forest, and making a decision on the signal to be detected.

2. The method of claim 1, wherein the digital audio authentication method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum,

the method comprises the following steps:

3. The method for authenticating digital audio according to claim 1, wherein the method for authenticating digital audio based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum comprises the following steps:

And instantaneous frequency spectrum fitting characteristics

4. The method for authenticating digital audio based on analysis of ENF phase spectrum and instantaneous frequency spectrum as claimed in claim 2, wherein in step a1, DFT¹-based phase spectrum estimation is performed on x_ENFC[n]; first, a conventional N-point discrete Fourier transform (DFT), i.e. DFT⁰, is applied to the signal x_ENFC[n] to obtain an estimated phase

x′_ENFC[n]＝f_d(x_ENFC[n]-x_ENFC[n-1])

combining approximate first derivative sums

x^(a)_ENFC[n]＝x_ENFC[n]+i*H{x_ENFC[n]},

wherein

Sum of Sines expression form:

Gaussian expression form:

wherein the expression parameters are the fitting characteristics,

5. The method for authenticating digital audio according to claim 1, wherein the method for authenticating digital audio based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum comprises the following steps:

applying discriminant correlation analysis (DCA) to fuse the phase feature set and the frequency feature set obtained in step 2, wherein the DCA performs the feature fusion by maximizing the pairwise correlation between the two feature sets while restricting the correlation to within classes; the transformation matrices of the feature sets are calculated by maximizing the covariance matrix between the feature sets while simultaneously diagonalizing the intra-class scatter matrix.

6. The method for authenticating digital audio based on analysis of ENF phase spectrum and instantaneous frequency spectrum as claimed in claim 1, wherein the step 4 comprises:

step 4.1: applying a deep random forest to construct a model of the fused features: the fused features are used to train a deep random forest; during training, the number-of-layers model parameter is determined automatically according to the change in accuracy and the limit on the number of layers, training stops when the training accuracy no longer improves or the number of layers reaches the maximum value, and the classification result at that point is used as the final classification accuracy;

step 4.2: after the model is stored, deciding whether any signal to be detected has been tampered with: a fused-feature classification model is constructed using the number of layers and the structural parameters of the deep random forest obtained once its training is complete, and the fused features of any signal to be detected are classified and decided upon.

7. A computer program for implementing the digital audio authenticity identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum according to any one of claims 1 to 6.

8. A digital audio signal processing system for implementing the digital audio authenticity identification method based on the analysis of the ENF phase spectrum and the instantaneous frequency spectrum according to any one of claims 1 to 6.

9. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the digital audio authenticity identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum according to any of claims 1 to 6.