CN108806718B - Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum - Google Patents


Info

Publication number
CN108806718B
Authority
CN
China
Prior art keywords
enf
signal
spectrum
phase
enfc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810585686.3A
Other languages
Chinese (zh)
Other versions
CN108806718A (en)
Inventor
王志锋
王静
左明章
叶俊民
闵秋莎
田元
夏丹
陈迪
罗恒
宁国勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN201810585686.3A
Publication of CN108806718A
Application granted
Publication of CN108806718B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Abstract

The invention belongs to the technical field of digital audio signal processing and discloses an audio identification method based on analysis of the ENF phase spectrum and instantaneous frequency spectrum. The method preprocesses the signal under test, extracts the ENF signal, analyzes its phase spectrum and instantaneous frequency spectrum, and extracts the phase spectrum fluctuation feature together with the fitting-parameter features of the phase spectrum and the frequency spectrum. Feature fusion is performed by Discriminant Correlation Analysis (DCA) to maximize the correlation between the feature sets. Finally, a deep random forest is used to build a model on the fused features, and transfer learning is performed with the trained model. The invention processes the feature data with feature-level fusion, which reduces the feature dimension and improves discrimination, and trains the model with a deep learning method, thereby greatly improving the accuracy of passive tampering detection for digital audio.

Description

Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum
Technical Field
The invention belongs to the technical field of digital audio signal processing, and particularly relates to an audio identification method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum.
Background
The state of the art commonly used in the industry is as follows:
With the development of computer and internet technologies, people rely increasingly on digital multimedia data. Digital multimedia data is easy to store, edit and transmit, which brings convenience and enjoyment to daily life. For example, without any professional knowledge, people can quickly use audio editing software to splice, add noise to, or otherwise modify digital audio files, which is a popular form of entertainment in the internet era. However, technology is a double-edged sword, and it also gives lawbreakers an opening. They can maliciously tamper with digital audio and spread it widely, and the tampering is hard to perceive by ear alone. If such audio files are used as recorded testimony in court, in the dissemination of false news, or in similar settings, the consequences may be serious, damaging judicial fairness and social trust. It is therefore important to guarantee the authenticity and integrity of digital audio and to detect tampering. Digital audio tampering detection is an important branch of digital audio forensics and is widely applied in judicial forensics, news integrity, scientific discovery and other fields.
Among current digital audio tampering detection methods, the most effective is detection based on the consistency of the electric network frequency (ENF), which over the last decade has become almost a common standard for digital audio authentication and has attracted attention from academic researchers and law enforcement agencies worldwide. The principle is that if a recording device is connected to the power grid while recording, the audio signal necessarily carries information about the electric network frequency. The ENF therefore acts as a watermark naturally embedded in the audio signal and can also serve as a time stamp. The ENF component (ENFC) embedded in an audio file can be extracted by band-pass filtering. There are generally two research approaches to tampering detection that exploit the stability and uniqueness of the ENFC. The first compares the extracted ENFC with data in a power grid frequency database of the power supply authority to determine whether the actual recording time is consistent with the claimed recording time. However, establishing and storing a wide-area ENF signal database is difficult and costly, and no ENF database of high practical value exists at present. Grigoras first established a local ENF reference database in Romania. Liu Yuming et al. analyzed the North American power grid monitoring system and proposed a method for establishing a standard grid frequency. The second approach extracts certain features from the ENF signal and analyzes their consistency or regularity. Grigoras first proposed an ENF-based audio tampering detection algorithm, which mainly compares the ENF fluctuation in the audio under test with reference data to judge whether the audio has been tampered with. Grigoras then verified that analyzing the audio signal with a short time window allows a more detailed and accurate comparison with a database.
Building on Grigoras's research, Rodríguez et al. proposed a method that does not require an ENF reference database: it detects audio tampering using the consistency of ENF phase changes as the feature and selects a threshold to make the classification decision. Building on the work of Rodríguez et al., Hu Yongjian et al. constructed a new feature to detect ENF phase discontinuity, using an ideal sinusoidal signal as the reference signal. Hu Yongjian et al. later improved this method, proposing to compute the maximum offset of the ENF directly without an additional reference signal and, in addition, using a combination of features to locate the tampered region precisely. Esquef et al. observed that a tampering operation causes an abrupt change in the instantaneous ENF at the tampering point, and proposed the TPSW (Two-Pass Split-Window) method to estimate the background variation level of the ENF; a peak at which the actual instantaneous frequency variation exceeds the background level is taken as a tampering point.
In summary, research on ENF-based passive tampering detection for digital audio has the following problems:
1) there is no authoritative ENF reference database, so judging whether a speech signal has been tampered with by comparing its ENF component against an ENF database is unreliable;
2) most methods do not extract key feature data from the speech signal with which a decision on tampering can be made directly;
3) the correlation among feature sets is ignored, and the extracted raw feature data is not processed further;
4) most existing methods have a low degree of automation, perform poorly, and adapt poorly to signals from different databases.
The difficulty and significance of solving these technical problems are as follows:
Establishing an authoritative ENF reference database is costly and hard to manage, and is of little practical value; extracting key feature data from a speech signal so that a decision on tampering can be made directly is the problem researchers want to solve.
The method selects the phase spectrum and the instantaneous frequency spectrum of the ENF component, which are sensitive to signal truncation, as the features for tamper detection. Experiments use speech signals from three databases, and the model is built with a deep random forest, a deep learning method, so that the adaptability and degree of automation of the scheme are sufficient for practical use.
Disclosure of Invention
In view of the problems in the prior art, the invention provides an audio identification method based on analysis of the ENF phase spectrum and instantaneous frequency spectrum. The invention extracts the ENFC from the speech signal and analyzes its phase spectrum and frequency spectrum to obtain phase and frequency features. The phase spectrum features and frequency spectrum features are fused with the DCA method, and a deep random forest is used to build a model on the fused features, so that the resulting model can decide whether any signal under test has been tampered with, enabling automatic detection of insertion and deletion operations on speech signals. By fusing the representative phase and instantaneous frequency features of the ENF component and training the model with a deep learning method, the invention obtains a model that detects tampering automatically, which improves detection efficiency and automates digital audio tampering detection.
The invention is realized as a digital audio authenticity identification method based on analysis of the ENF phase spectrum and instantaneous frequency spectrum, comprising the following steps. First, the signal under test is preprocessed, including down-sampling and narrow-band filtering, to obtain a narrow-band signal centered on the nominal electric network frequency (ENF). Then the ENF signal is extracted, its phase spectrum and instantaneous frequency spectrum are analyzed, and the phase spectrum fluctuation feature and the fitting-parameter features of the phase spectrum and frequency spectrum are extracted. Feature fusion is performed by Discriminant Correlation Analysis (DCA), which maximizes the correlation between the feature sets while eliminating inter-class correlation and restricting the correlation to within classes. Finally, a deep random forest is used to build a model on the fused features, and transfer learning is performed with the trained model; that is, once the model is saved, it can decide whether any signal under test has been tampered with. The method performs tamper detection based on the ENF signal embedded in the signal under test: it extracts the phase and frequency features of the ENF signal that are affected by tampering, fuses the extracted feature sets with DCA, and trains a deep random forest on the fused features to obtain a classification model.
The method specifically comprises the following steps:
step 1: preprocessing the signal under test;
step 2: extracting phase spectrum and frequency spectrum features of the ENF component in the signal;
step 3: performing feature fusion on the extracted feature sets using the DCA method;
step 4: building a model on the fused features using a deep random forest, and making a decision on the signal under test.
Further, step 1 specifically comprises the following steps:
step 1.1: preprocess the signal under test $x[n]$, including down-sampling and DC removal, to obtain $x_d[n]$;
step 1.2: pass the down-sampled signal $x_d[n]$ from step 1.1 through a band-pass filter whose center frequency is the nominal ENF frequency to obtain the ENF component $x_{ENFC}[n]$.
Further, step 2 specifically comprises the following steps:
step A1: perform $DFT^1$-based phase spectrum estimation on $x_{ENFC}[n]$, and extract the phase spectrum fluctuation feature F;
step A2: perform Hilbert-based instantaneous frequency spectrum estimation on $x_{ENFC}[n]$;
step A3: perform curve fitting on the phase spectrum and the frequency spectrum respectively, and extract the phase spectrum fitting features
Figure BDA0001686169760000042
and the instantaneous frequency spectrum fitting features
Figure BDA0001686169760000041
Further, in step A1, to perform $DFT^1$-based phase spectrum estimation on $x_{ENFC}[n]$, a conventional N-point Discrete Fourier Transform (DFT) is first applied to $x_{ENFC}[n]$, giving the $DFT^0$-based estimated phase
Figure BDA0001686169760000057
The $DFT^1$-based phase estimation builds on the $DFT^0$ phase estimate. The approximate first derivative of $x_{ENFC}[n]$ at point n is computed:
$$x'_{ENFC}[n] = f_d\,(x_{ENFC}[n] - x_{ENFC}[n-1])$$
The approximate first derivative is combined with
Figure BDA0001686169760000051
to perform higher-order phase estimation; the estimate is linearly interpolated to obtain the phase spectrum estimate, from which the phase spectrum fluctuation feature F is extracted;
in step A2, to perform instantaneous frequency estimation based on the Hilbert transform on $x_{ENFC}[n]$, the analytic signal of $x_{ENFC}[n]$ is first obtained:
$$x^{(a)}_{ENFC}[n] = x_{ENFC}[n] + i\,H\{x_{ENFC}[n]\},$$
where
Figure BDA0001686169760000052
and H denotes the Hilbert transform; the instantaneous frequency is the rate of change of the phase angle of $x^{(a)}_{ENFC}[n]$. The instantaneous frequency $f[n]$ of the ENF signal is estimated, oscillation and boundary effects are removed from $f[n]$, and the instantaneous frequency spectrum of $x_{ENFC}[n]$ is constructed;
in step A3, according to the shapes of the phase spectrum and frequency spectrum of $x_{ENFC}[n]$, the two curves are fitted with Sum of Sines and Gaussian expressions respectively;
Sum of Sines expression:
Figure BDA0001686169760000053
Gaussian expression:
Figure BDA0001686169760000054
where the expression parameters are the fitting features,
Figure BDA0001686169760000055
Figure BDA0001686169760000056
Further, step 3 specifically includes:
The goal of feature fusion is to combine the relevant information in two or more feature vectors into a single vector that is more discriminative than any individual input feature vector or, when there are too many feature dimensions, to reduce the dimension while achieving accuracy similar to that of the high-dimensional features. Discriminant Correlation Analysis (DCA) is applied to fuse the phase feature set and the frequency feature set obtained in step 2. DCA performs feature fusion by maximizing the pairwise correlation between the two feature sets while restricting the correlation to within classes. The transformation matrices of the feature sets are computed by maximizing the covariance between the feature sets while ensuring that the inter-class scatter matrices are diagonalized.
Further, step 4 specifically includes:
step 4.1: building a model on the fused features with a deep random forest;
A deep random forest is a deep model built from cascaded layers of random forests and can be used for classification. Part of the fused features is used to train the deep random forest. Unlike the training of a conventional random forest, model parameters such as the number of layers are determined automatically from the change in accuracy and the limit on the number of layers: training stops when the training accuracy no longer improves or the number of layers reaches its maximum, and the classification result at that point is taken as the final classification accuracy.
step 4.2: after the model is saved, deciding whether any signal under test has been tampered with.
The number of layers and the structural parameters obtained when training of the deep random forest completes constitute the fused-feature classification model of the invention, which can classify and make decisions on the fused features of any signal under test.
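The stopping rule described in step 4.1 can be sketched as a small loop. This is only an illustration of the layer-growing logic, not the patented implementation: `train_layer` and `evaluate` are hypothetical callables supplied by the user (for example, a wrapper that trains one layer of random forests and a function that scores the cascade), and `tol` is an assumed tolerance that the description does not specify.

```python
def grow_cascade(train_layer, evaluate, max_layers=10, tol=0.0):
    """Add layers until accuracy stops improving or max_layers is reached.

    train_layer(depth, layers) -> new layer object   (hypothetical callable)
    evaluate(layers) -> accuracy of the cascade      (hypothetical callable)
    """
    layers, best_acc = [], float("-inf")
    for depth in range(1, max_layers + 1):
        candidate = train_layer(depth, layers)
        acc = evaluate(layers + [candidate])
        if acc <= best_acc + tol:
            # No improvement: discard the candidate layer and stop growing.
            break
        layers.append(candidate)
        best_acc = acc
    return layers, best_acc
```

The returned list plays the role of the saved model: its length is the automatically determined number of layers.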
Another object of the present invention is to provide a computer program for implementing the digital audio authentication method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum.
Another object of the present invention is to provide a digital audio signal processing system for implementing the digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum.
It is another object of the present invention to provide a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the digital audio authentication method based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum.
In summary, the advantages and positive effects of the invention are:
The method analyzes the phase spectrum and instantaneous frequency spectrum of the ENF signal, which are sensitive to signal truncation, extracts an effective feature set from each, and processes the extracted feature sets;
feature-level fusion is used to process the feature data, which reduces the feature dimension and improves discrimination, and a deep learning method is applied for model training, greatly increasing the accuracy of passive tampering detection for digital audio;
the invention is stable and robust for recordings made in complex environments and for noisy speech.
The invention provides a widely applicable algorithm for accurate and automated passive tampering detection of digital audio.
The experimental data come from 500 speech recordings (including original and tampered speech) in three different databases. The speech signals are imported with MATLAB. Following step 1 of the method, the ENF component is extracted; following step 2, the phase fluctuation and the instantaneous frequency fluctuation are fitted with 5 sine kernels and 5 Gaussian kernels; following step 3, the phase fluctuation features and the frequency fluctuation features are each used as a feature set for DCA feature fusion, giving two-dimensional fused features. Labels are added to the features, a deep random forest is evaluated on the fused features with ten-fold cross-validation, and the final classification accuracy reaches 99.8%.
Drawings
Fig. 1 is a flowchart of the digital audio authenticity identification method based on analysis of the ENF phase spectrum and instantaneous frequency spectrum according to an embodiment of the present invention.
Fig. 2 is a flowchart of the $DFT^1$-based phase spectrum feature extraction according to an embodiment of the present invention.
Fig. 3 is a flowchart of the Hilbert-transform-based instantaneous frequency spectrum feature extraction according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the method for identifying digital audio authenticity based on analysis of an ENF phase spectrum and an instantaneous frequency spectrum provided by the present invention includes the following steps:
step 1: preprocessing the signal under test;
The specific implementation comprises the following substeps:
step 1.1: preprocess the signal under test $x[n]$, including down-sampling and DC removal, to obtain $x_d[n]$;
In this embodiment, the resampling frequency $f_d$ is set to 1000 Hz or 1200 Hz, balancing frequency aliasing, loss of signal information and the signal-to-noise ratio (oversampling can improve the signal-to-noise ratio); this places the nominal ENF frequency at $\omega_0 = \pi/10$ rad/sample.
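As a quick check of the normalized frequency quoted above, the relation $\omega_0 = 2\pi f_{ENF}/f_d$ can be evaluated for both resampling rates. The nominal mains frequencies of 50 Hz and 60 Hz are assumed here; the text itself states only the two resampling rates.

```latex
\omega_0 = \frac{2\pi f_{ENF}}{f_d}:\qquad
\frac{2\pi \cdot 50}{1000} = \frac{\pi}{10},\qquad
\frac{2\pi \cdot 60}{1200} = \frac{\pi}{10}\ \text{rad/sample}
```

Under that assumption, both choices give exactly 20 samples per ENF cycle.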
step 1.2: pass the down-sampled signal $x_d[n]$ from step 1.1 through a band-pass filter whose center frequency is the nominal ENF frequency to obtain the ENF component $x_{ENFC}[n]$.
This embodiment uses a linear zero-phase FIR filter of order 10000 for narrow-band filtering, which avoids phase delay. The center frequency is the nominal ENF frequency, the bandwidth is 0.6 Hz, the passband ripple is 0.5 dB, and the stopband attenuation is 100 dB. The high filter order is used to obtain a close-to-ideal narrow-band signal. Zero padding means appending zeros to the end of the time-domain signal to increase its length; zero padding before the DFT gives a finer frequency grid, which helps locate the peak of the spectrum more accurately.
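Step 1 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the filter of the embodiment: the band-pass is a Hamming-windowed sinc (which does not reach the 100 dB stopband attenuation quoted above), the decimation factor is assumed to be an integer, and the function names, the 50 Hz default and the anti-aliasing filter are all illustrative choices.

```python
import numpy as np

def windowed_sinc_bandpass(order, f_lo, f_hi, fs):
    """Symmetric (linear-phase) FIR band-pass of length order+1; order must be even."""
    n = np.arange(order + 1) - order / 2.0
    # Difference of two ideal low-pass responses gives an ideal band-pass.
    h = (2 * f_hi / fs) * np.sinc(2 * f_hi / fs * n) \
        - (2 * f_lo / fs) * np.sinc(2 * f_lo / fs * n)
    return h * np.hamming(order + 1)

def extract_enfc(x, fs, f_enf=50.0, fd=1000, order=10000, bandwidth=0.6):
    """Down-sample to fd, remove DC, and band-pass around the nominal ENF."""
    x = np.asarray(x, dtype=float)
    m = int(round(fs / fd))  # assumed integer decimation factor
    if m > 1:
        # Anti-aliasing low-pass (assumed design), then keep every m-th sample.
        lp = windowed_sinc_bandpass(200, 0.0, 0.4 * fd, fs)
        x = np.convolve(x, lp, mode="same")[::m]
    xd = x - x.mean()  # DC removal
    bp = windowed_sinc_bandpass(order, f_enf - bandwidth / 2,
                                f_enf + bandwidth / 2, fd)
    # Centered convolution with a symmetric kernel has zero phase delay.
    # The signal must be longer than the filter for mode="same" to keep its length.
    return np.convolve(xd, bp, mode="same")
```

With a filter of order 10000 at 1000 Hz, roughly the first and last 5 seconds of the output are affected by edge transients, so only recordings noticeably longer than 10 seconds yield a usable ENFC.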
step 2: extracting phase spectrum and frequency spectrum features of the ENF component in the signal;
The specific implementation comprises the following substeps:
step A1: perform $DFT^1$-based phase spectrum estimation on $x_{ENFC}[n]$, and extract the phase spectrum fluctuation feature F;
As shown in Fig. 2, a conventional N-point Discrete Fourier Transform (DFT) is first applied to each frame of $x_{ENFC}[n]$ to obtain $X(k)$. Let $k_{peak}$ be the integer index at which $|X(k)|$ is maximal in the frame. The $DFT^0$-based phase estimate is:
Figure BDA0001686169760000081
The approximate first derivative of the ENF signal $x_{ENFC}[n]$ at point n is computed:
$$x'_{ENFC}[n] = f_d\,(x_{ENFC}[n] - x_{ENFC}[n-1]) \tag{2}$$
The DFT is applied in the same way to $x'_{ENFC}[n]$ to obtain $|X'(k)|$, and $|X'(k)|$ is multiplied by a scale coefficient $F(k)$:
Figure BDA0001686169760000082
This gives $DFT^0[k] = |X(k)|$ and $DFT^1[k] = F(k)\,|X'(k)|$. The estimated frequency of $x_{ENFC}[n]$ is
Figure BDA0001686169760000083
The ENFC is a narrow-band signal that can be written as $x_{ENFC}[n] = a\cos(\omega_0 n + \phi_0)$, where $\omega_0 = 2\pi f_{ENFC}/f_d$, $\phi_0$ is the initial phase of $x_{ENFC}$, and $f_{ENFC}$ is the actual ENF frequency. A short calculation gives:
Figure BDA0001686169760000091
where
Figure BDA0001686169760000092
$\theta$ denotes the estimated phase of $x'_{ENFC}$; the phase of $X'(k)$ is linearly interpolated to obtain a more accurate value. The phase spectrum estimated by the $DFT^1$ method is:
Figure BDA0001686169760000093
The feature F is computed to describe the phase fluctuation of the ENFC. Let
Figure BDA0001686169760000094
be the estimated phase of frame $n_b$,
Figure BDA0001686169760000095
where $2 \le n_b \le N_{Block}$, and let
Figure BDA0001686169760000096
denote the average of
Figure BDA0001686169760000097
from $n_b = 2$ to $N_{Block}$.
Figure BDA0001686169760000098
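The first part of step A1 (the $DFT^0$ phase at the spectral peak and the frequency estimate from the ratio $DFT^1/DFT^0$) can be sketched as below. Because the equations are available only as figures, two things are assumptions: the scale coefficient is taken as $F(k) = \pi k / (N \sin(\pi k/N))$, which is the factor that compensates the gain $2\sin(\pi k/N)$ of the first difference in equation (2), and a Hann window is applied to each frame. The interpolated $DFT^1$ phase and the feature F are not reproduced here.

```python
import numpy as np

def dft1_frequency(frame, fd):
    """Estimate the tone frequency (Hz) and the DFT^0 peak phase of one frame."""
    frame = np.asarray(frame, dtype=float)
    # Approximate first derivative x'[n] = fd * (x[n] - x[n-1]); n = 0 is dropped.
    dx = fd * np.diff(frame)
    x = frame[1:]
    n = len(x)
    w = np.hanning(n)                      # assumed window
    X = np.fft.rfft(w * x)
    dX = np.fft.rfft(w * dx)
    k = int(np.argmax(np.abs(X)))          # k_peak
    if k == 0:
        raise ValueError("spectral peak at DC; frame contains no ENF tone")
    phase0 = np.angle(X[k])                # DFT^0-based phase estimate
    # Assumed scale coefficient compensating the first-difference gain.
    F = np.pi * k / (n * np.sin(np.pi * k / n))
    f_est = F * np.abs(dX[k]) / (2 * np.pi * np.abs(X[k]))
    return f_est, phase0
```

For a 3-second frame at $f_d$ = 1000 Hz the DFT bins are 1/3 Hz apart, yet the ratio-based estimate resolves the tone frequency far more finely than the bin spacing, which is the motivation for using the derivative spectrum.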
step A2: perform Hilbert-based instantaneous frequency spectrum estimation on $x_{ENFC}[n]$;
As shown in Fig. 3, a discrete Hilbert transform is applied to $x_{ENFC}[n]$. The analytic signal of $x_{ENFC}[n]$ is first obtained: $x^{(a)}_{ENFC}[n] = x_{ENFC}[n] + i\,H\{x_{ENFC}[n]\}$, where
Figure BDA0001686169760000099
and H denotes the Hilbert transform. The instantaneous amplitude is the magnitude of $x^{(a)}_{ENFC}[n]$, and the instantaneous frequency is the rate of change of its phase angle, from which the instantaneous frequency $f[n]$ of the ENF signal is estimated. Because of the numerical approximation in the Hilbert transform, $f[n]$ contains some spurious oscillation, so $f[n]$ is low-pass filtered to remove it. Because of boundary effects in the frequency estimate, 2000 samples at the head and tail of $f[n]$ are removed; the resulting $f[n]$ is the instantaneous frequency spectrum estimate of the ENFC.
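Step A2 can be sketched with an FFT-based analytic signal. Several details are assumptions, since the description does not fix them: the low-pass filter is replaced by a simple moving average of length `smooth`, and `trim=2000` samples are removed from each end.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + i*H{x}, computed by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_frequency(x_enfc, fd, trim=2000, smooth=101):
    """Instantaneous frequency f[n] in Hz, smoothed and with the edges removed."""
    z = analytic_signal(x_enfc)
    phase = np.unwrap(np.angle(z))
    f = fd * np.diff(phase) / (2 * np.pi)   # rate of change of the phase angle
    # Moving average as a stand-in for the unspecified low-pass filter.
    f = np.convolve(f, np.ones(smooth) / smooth, mode="same")
    return f[trim:len(f) - trim]
```

For an untampered ENFC the returned curve stays close to the mains frequency; an insertion or deletion would be expected to show up as a local excursion in it.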
step A3: perform curve fitting on the phase spectrum and the frequency spectrum respectively, and extract the phase spectrum fitting features
Figure BDA00016861697600000910
and the instantaneous frequency spectrum fitting features
Figure BDA00016861697600000911
In this embodiment, the discrete data points are fitted with different analytical expressions according to the characteristics of the ENF phase distribution and the instantaneous frequency distribution. The criterion for choosing an analytical expression for the phase or frequency curve is that it can fit both the original-signal curve and the edited-signal curve and reflect the difference between them in its parameters. On this basis, this embodiment uses the Sum of Sines and Gaussian expressions to fit the phase curve and the frequency curve, respectively; the parameters of the expressions are the fitting-parameter features.
The Sum of Sines expression is suitable for fitting the phase spectrum and has the form:
Figure BDA0001686169760000101
where a is the amplitude, b is the frequency, c is the phase constant of each sine term, and n is the number of terms in the series, with $1 \le n \le 9$. Let
Figure BDA0001686169760000102
be the phase spectrum fitting features, namely:
Figure BDA0001686169760000103
The Gaussian expression is suitable for fitting peaks and has the form:
Figure BDA0001686169760000104
where a is the peak amplitude, b is the peak position, c is related to the peak width, and n is the number of peaks fitted, with $1 \le n \le 8$. Let
Figure BDA0001686169760000105
be the frequency spectrum fitting features, namely:
Figure BDA0001686169760000106
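The two fitting expressions appear only as figures. Assuming they are the standard Sum of Sines and Gaussian models that match the parameter descriptions in this step (amplitude, frequency and phase constant for each sine term; amplitude, position and width for each peak), they would read:

```latex
y_{\mathrm{sin}}(x) = \sum_{i=1}^{n} a_i \sin(b_i x + c_i),
\qquad
y_{\mathrm{gauss}}(x) = \sum_{i=1}^{n} a_i \exp\!\left[-\left(\frac{x - b_i}{c_i}\right)^{2}\right]
```

Under this assumption each term contributes three parameters $(a_i, b_i, c_i)$, so a fit with n terms yields a fitting feature vector of length 3n for the corresponding curve.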
step 3: performing feature fusion on the extracted feature sets using the DCA method;
Discriminant Correlation Analysis (DCA) is applied to fuse the phase feature set and the frequency feature set obtained in step 2. DCA performs effective feature fusion by maximizing the pairwise correlation between the two feature sets while eliminating inter-class correlation and restricting the correlation to within classes. DCA is a feature-level fusion in which the transformed feature sets are combined by summation; its advantage is that it reduces the feature dimension while reducing the gap in recognition results.
Let $X \in \mathbb{R}^{p\times n}$ and $Y \in \mathbb{R}^{q\times n}$ be two matrices, each containing n training feature vectors from a different modality. Suppose the samples in the data matrices are collected from c separate classes, so that the n columns of each data matrix can be divided into c separate groups, where $n_i$ columns belong to the $i$-th class
Figure BDA0001686169760000107
Let $x_{ij} \in X$ denote the feature vector of the $j$-th sample in the $i$-th class.
Figure BDA0001686169760000108
and
Figure BDA0001686169760000109
denote the mean of the $x_{ij}$ in the $i$-th class and over the whole feature set, respectively, i.e.
Figure BDA00016861697600001010
The inter-class scatter matrix is defined as
Figure BDA00016861697600001011
where
Figure BDA0001686169760000111
If the number of features is much greater than the number of classes ($p \gg c$), computing the covariance matrix
Figure BDA0001686169760000112
is easier than computing
Figure BDA0001686169760000113
By mapping
Figure BDA0001686169760000114
one can efficiently obtain
Figure BDA0001686169760000115
Therefore, only the eigenvectors of the $c \times c$ covariance matrix
Figure BDA0001686169760000116
need to be found. If the classes are well separated,
Figure BDA0001686169760000117
will be a diagonal matrix. Because
Figure BDA0001686169760000118
is a symmetric positive semi-definite matrix, it can be diagonalized by the transformation:
Figure BDA0001686169760000119
where P is the matrix of orthogonal eigenvectors and
Figure BDA00016861697600001110
is the diagonal matrix of non-negative real eigenvalues sorted in descending order. Let $Q_{(c\times r)}$ consist of the r eigenvectors in P that correspond to the r largest non-zero eigenvalues. Then:
Figure BDA00016861697600001111
The r most significant eigenvectors of $S_{bx}$ are obtained by the mapping $Q \to \Phi_{bx} Q$:
$$(\Phi_{bx}Q)^T S_{bx} (\Phi_{bx}Q) = \Lambda_{(r\times r)} \tag{13}$$
$W_{bx} = \Phi_{bx} Q \Lambda^{-1/2}$ is a transformation that unitizes $S_{bx}$ while reducing the dimension of the data matrix X from p to r. That is:
Figure BDA00016861697600001112
Figure BDA00016861697600001113
$X'$ is the projection of X into a space in which the inter-class scatter matrix is I and the classes are separated. Note that there are at most c-1 non-zero generalized eigenvalues, so r is bounded above by c-1; a further upper bound on r is given by the ranks of the data matrices, i.e., $r \le \min(c-1, \mathrm{rank}(X), \mathrm{rank}(Y))$.
The second feature set Y is processed in the same way to find the transformation matrix $W_{by}$, which unitizes the inter-class scatter matrix $S_{by}$ of the second modality while reducing the dimension of the data matrix Y from q to r.
Figure BDA00016861697600001114
Figure BDA0001686169760000121
The updated $\Phi'_{bx}$ and $\Phi'_{by}$ are non-square orthogonal matrices of size $r \times c$. Although the between-class scatter matrices have been unitized ($S_{bx} = S_{by} = I$), the matrices $\Phi'^T_{bx}\Phi'_{bx}$ and $\Phi'^T_{by}\Phi'_{by}$ are strictly diagonally dominant,
$$|a_{ii}| > \sum_{j \ne i} |a_{ij}|,$$
with diagonal elements close to 1 and off-diagonal elements close to 0. The class centroids therefore have minimal correlation with one another, so the classes are well separated. Next, the features in one feature set must have non-zero correlation only with their corresponding features in the other feature set. To achieve this, the between-set covariance matrix of the transformed feature sets, $S'_{xy} = X'Y'^T$, is diagonalized using singular value decomposition (SVD):
$$S'_{xy(r\times r)} = U\Sigma V^T \;\Rightarrow\; U^T S'_{xy} V = \Sigma. \quad (18)$$
Here $X'$ and $Y'$ both have rank $r$ and $S'_{xy(r\times r)}$ is non-degenerate, so $\Sigma$ is a diagonal matrix whose main-diagonal elements are all non-zero. Let $W_{cx} = U\Sigma^{-1/2}$ and $W_{cy} = V\Sigma^{-1/2}$; then:
$$(U\Sigma^{-1/2})^T S'_{xy} (V\Sigma^{-1/2}) = I, \quad (19)$$
which unitizes the covariance matrix $S'_{xy}$ between the feature sets. The feature sets are then transformed:
$$X^* = W_{cx}^T X' = W_{cx}^T W_{bx}^T X = W_x X, \quad (20)$$
$$Y^* = W_{cy}^T Y' = W_{cy}^T W_{by}^T Y = W_y Y, \quad (21)$$
where $W_x = W_{cx}^T W_{bx}^T$ and $W_y = W_{cy}^T W_{by}^T$ are the final transformation matrices for $X$ and $Y$, respectively. It is easily shown that the between-class scatter matrices of the transformed feature sets are still diagonal, so the classes remain separated. The between-class scatter matrix of $X^*$ is:
$$S^*_{bx} = W_{cx}^T W_{bx}^T \Phi_{bx}\Phi_{bx}^T W_{bx} W_{cx}. \quad (22)$$
From formula (14), $W_{bx}^T(\Phi_{bx}\Phi_{bx}^T)W_{bx} = W_{bx}^T S_{bx} W_{bx} = I$, and $U$ is an orthogonal matrix, so:
$$S^*_{bx} = (U\Sigma^{-1/2})^T (U\Sigma^{-1/2}) = \Sigma^{-1}. \quad (23)$$
In the same way, $S^*_{by}$ can be shown to be a diagonal matrix. For the transformed feature set, $X^* X^{*T}$, which represents the covariance between features, is a strictly diagonally dominant matrix, indicating that the correlation between different features within a single feature set is minimal. $X^{*T} X^*$, which represents the covariance between samples, is a block-diagonal matrix, indicating that samples are more highly correlated with samples of the same class.
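The DCA transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `between_class_whitener` and `dca_fit` are hypothetical, samples are stored one per column, and the eigenvectors $\Phi_{b}Q$ are normalized to unit length before scaling by $\Lambda^{-1/2}$ so that the unitization $W_b^T S_b W_b = I$ holds numerically.

```python
import numpy as np


def between_class_whitener(Z, labels, r):
    """Return W_b (dim x r) such that W_b.T @ S_b @ W_b is the r x r identity.

    Z is a (dim, n) data matrix with one sample per column.
    """
    overall_mean = Z.mean(axis=1, keepdims=True)
    columns = []
    for k in np.unique(labels):
        Zk = Z[:, labels == k]
        class_mean = Zk.mean(axis=1, keepdims=True)
        columns.append(np.sqrt(Zk.shape[1]) * (class_mean - overall_mean))
    Phi = np.hstack(columns)                     # (dim, c); S_b = Phi @ Phi.T
    eigvals, P = np.linalg.eigh(Phi.T @ Phi)     # small c x c eigenproblem
    top = np.argsort(eigvals)[::-1][:r]          # indices of the r largest eigenvalues
    lam, Q = eigvals[top], P[:, top]
    vecs = Phi @ Q                               # eigenvectors of S_b, not yet unit length
    vecs = vecs / np.linalg.norm(vecs, axis=0)   # assumption: normalize each eigenvector
    return vecs / np.sqrt(lam)                   # scale column j by lambda_j^(-1/2)


def dca_fit(X, Y, labels):
    """Return the final transformation matrices W_x (r x p) and W_y (r x q)."""
    labels = np.asarray(labels)
    c = len(np.unique(labels))
    r = min(c - 1, np.linalg.matrix_rank(X), np.linalg.matrix_rank(Y))
    W_bx = between_class_whitener(X, labels, r)
    W_by = between_class_whitener(Y, labels, r)
    X_prime, Y_prime = W_bx.T @ X, W_by.T @ Y    # both (r, n)
    U, s, Vt = np.linalg.svd(X_prime @ Y_prime.T)
    W_cx = U / np.sqrt(s)                        # U @ Sigma^(-1/2)
    W_cy = Vt.T / np.sqrt(s)                     # V @ Sigma^(-1/2)
    return W_cx.T @ W_bx.T, W_cy.T @ W_by.T
```

Applying `W_x @ X` and `W_y @ Y` gives the transformed feature sets $X^*$ and $Y^*$; for data in general position their between-set covariance $X^* Y^{*T}$ is the identity up to rounding error.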
Step 4: Construct a model of the fused features using a deep random forest, and make a decision on the signal under test.
Step 4.1: Apply a deep random forest to construct a model of the fused features.
First, the data are scanned at multiple granularities to enlarge the number of samples: the data are sampled with a sliding window. With a window size of 100 and a step size of 1, sampling yields a set of 301 sub-samples with 100 features each; because they all come from a single original sample, the number of samples is expanded. Training is then performed with one random forest and one completely random forest. Each decision tree in the completely random forest is grown step by step by randomly selecting an attribute as the split attribute, without computing the Gini index or entropy gain. Assuming a three-class problem, the random forest and the completely random forest each generate 301 three-dimensional class vectors, giving 1806-dimensional data after concatenation (301 × 3 × 2 = 1806). During the generation and testing of both the random forest and the completely random forest, predictions are made with k-fold cross validation: k-1 groups are used first, that is, 300 groups of data train the random forest, and the test is carried out on the remaining group; the test outputs are then averaged to obtain the output of the random forest. Each group of data is tested once, so k groups of outputs are still obtained after k iterations. Different window sizes and step sizes can also be set when the sliding window is used for feature extraction; the results are then concatenated after passing through the random forest and the completely random forest.
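The sample-expansion arithmetic of the sliding-window scan can be checked with a short sketch. The 400-dimensional input length is an assumption inferred from the stated figures (a window of 100 with step 1 producing 301 sub-samples); the helper names are hypothetical.

```python
def sliding_windows(features, window=100, step=1):
    """Split one feature vector into overlapping sub-samples of length `window`."""
    last_start = len(features) - window
    return [features[i:i + window] for i in range(0, last_start + 1, step)]


def scanned_dimension(n_windows, n_classes=3, n_forests=2):
    """Length of the concatenated class vectors produced by the scanning forests."""
    return n_windows * n_classes * n_forests


raw = [0.0] * 400                    # hypothetical 400-dimensional feature vector
windows = sliding_windows(raw)
print(len(windows), len(windows[0]), scanned_dimension(len(windows)))
# prints: 301 100 1806
```

Each of the 301 windows is passed to both forests; with three classes, each forest returns a three-dimensional class vector per window, which accounts for the 1806 values.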
In the cascade forest, the outputs of two completely random forests and two ordinary random forests (3 × 4 = 12-dimensional data) are concatenated with the original data (the 3618-dimensional output of multi-granularity scanning) to form the input of the next layer (12 + 3618 = 3630-dimensional data). Because the output of the previous layer is concatenated in this way every time, the input of each layer is 3630-dimensional, and the parameters of the random forests are thereby corrected. The number of layers of the deep random forest is therefore not set in advance; it is determined by the change in accuracy and by the limit on the number of layers. Training stops when the training accuracy no longer improves or the number of layers reaches the maximum, and the classification result at that point is taken as the final classification accuracy.
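The layer-growth rule of the cascade can be sketched as follows. This is only an outline of the stopping logic under stated assumptions: `layer_accuracy` is a hypothetical callback that trains the cascade up to layer `k` and returns its accuracy, and the toy accuracy list is illustrative, not experimental data.

```python
def layer_input_dim(scanned_dim=3618, n_forests=4, n_classes=3):
    """Input size of each cascade layer: forest outputs plus scanned features."""
    return n_forests * n_classes + scanned_dim


def grow_cascade(layer_accuracy, max_layers):
    """Add layers until accuracy stops improving or max_layers is reached.

    layer_accuracy(k) is a hypothetical callback returning the accuracy
    obtained with k layers. Returns (number_of_layers, best_accuracy).
    """
    best_accuracy = float("-inf")
    n_layers = 0
    for k in range(1, max_layers + 1):
        accuracy = layer_accuracy(k)
        if accuracy <= best_accuracy:   # no improvement: stop growing
            break
        best_accuracy, n_layers = accuracy, k
    return n_layers, best_accuracy


toy = [0.90, 0.95, 0.97, 0.97, 0.99]    # illustrative accuracies only
print(layer_input_dim())                                    # prints: 3630
print(grow_cascade(lambda k: toy[k - 1], max_layers=10))    # prints: (3, 0.97)
```

In the toy run the fourth layer does not improve on the third, so growth stops at three layers even though a later layer would have scored higher; this is the greedy behaviour implied by the stopping rule as stated.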
Step 4.2: After the model is saved, a decision can be made on whether any signal under test has been tampered with.
The number of layers and the structural parameters of the deep random forest obtained once training is complete constitute the fused-feature classification model of the invention, which can classify and make decisions on the fused features of any signal under test.
The invention is further described below in conjunction with specific examples/experiments/simulation analyses.
The experimental data comprise 500 voice recordings (both original and tampered) from three different databases. MATLAB is used to import the voice signals, and the consistency fluctuation features of the ENF component are extracted through step 1 of the method. According to step 2, 5 sine kernels and 5 Gaussian kernels are used to fit the phase fluctuation and the instantaneous frequency fluctuation. According to step 3, the phase fluctuation features and the frequency fluctuation features are each taken as a feature set for DCA feature fusion, yielding two-dimensional fused features. Labels are added to the features, a deep random forest is applied to the fused features with k-fold cross validation, and the final classification accuracy reaches 99.8%.
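To make the fitting step concrete, the sketch below evaluates a five-term sum-of-sines model and a five-term Gaussian model and flattens their parameters into a feature vector. The model forms are an assumption: they follow the common library forms $\sum_i a_i \sin(b_i x + c_i)$ and $\sum_i a_i e^{-((x-b_i)/c_i)^2}$ (as in MATLAB's Curve Fitting Toolbox), since the patent's own expressions survive only as figure placeholders. The parameter values and helper names are hypothetical, and no fitting is performed here.

```python
import math


def sum_of_sines(x, terms):
    """Evaluate sum_i a_i * sin(b_i * x + c_i) for terms given as (a, b, c)."""
    return sum(a * math.sin(b * x + c) for a, b, c in terms)


def sum_of_gaussians(x, terms):
    """Evaluate sum_i a_i * exp(-((x - b_i) / c_i) ** 2) for terms (a, b, c)."""
    return sum(a * math.exp(-((x - b) / c) ** 2) for a, b, c in terms)


def fit_features(terms):
    """Flatten fitted (a, b, c) triples into one feature vector (assumed layout)."""
    return [value for term in terms for value in term]


# Hypothetical parameters standing in for the result of a 5-term fit.
sine_terms = [(1.0, 0.5 * k, 0.0) for k in range(1, 6)]
gauss_terms = [(1.0, float(k), 2.0) for k in range(1, 6)]
features = fit_features(sine_terms) + fit_features(gauss_terms)
print(len(features))   # prints: 30
```

With five terms of three parameters each, every fitted curve contributes 15 values under this assumed layout.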
The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, e.g., from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The storage medium may be a magnetic storage medium, an optical medium (e.g., a DVD), a solid-state storage medium (e.g., a solid state disk (SSD)), or any combination thereof.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum, characterized in that the method comprises:
first, preprocessing the signal under test, including down-sampling and narrow-band filtering, to obtain a narrow-band signal centered on the nominal electric network frequency (ENF); then extracting features of the ENF signal by analyzing its phase spectrum and instantaneous frequency spectrum, and extracting the phase spectrum fluctuation feature and the fitting-parameter features of the phase spectrum and the frequency spectrum of the ENF signal;
performing feature fusion by the discriminant correlation analysis (DCA) method, which maximizes the correlation between different feature sets, eliminates the correlation between classes, and restricts the correlation to within classes;
finally, constructing a model of the fused features by applying a deep random forest, and performing transfer learning on the trained model; after the model is saved, making a decision on whether any signal under test has been tampered with;
the method specifically comprises the following steps:
step 1: preprocessing the signal under test;
step 2: extracting features of the phase spectrum and the frequency spectrum of the ENF component in the signal;
step 3: performing feature fusion on the extracted feature sets by using the DCA method;
step 4: constructing a model of the fused features by using a deep random forest, and making a decision on the signal under test.
2. The digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum according to claim 1, wherein step 1 comprises the following steps:
step 1.1: preprocessing the signal under test $x[n]$, including down-sampling and removal of the DC component, to obtain $x_d[n]$;
step 1.2: passing the down-sampled signal $x_d[n]$ from step 1.1 through a band-pass filter whose center frequency is the nominal ENF frequency to obtain the ENF component $x_{ENFC}[n]$ of the signal.
3. The digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum according to claim 1, wherein the method comprises the following steps:
step A1: performing $\mathrm{DFT}^1$-based phase spectrum estimation on $x_{ENFC}[n]$, and extracting the phase spectrum fluctuation feature F;
step A2: performing Hilbert-based instantaneous frequency spectrum estimation on $x_{ENFC}[n]$;
step A3: performing curve fitting on the phase spectrum and the frequency spectrum respectively, and extracting the phase spectrum fitting feature
Figure FDA0002490111650000021
and the instantaneous frequency spectrum fitting feature
Figure FDA0002490111650000022
4. The digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum as claimed in claim 2, wherein in step A1, $\mathrm{DFT}^1$-based phase spectrum estimation is performed on $x_{ENFC}[n]$: first, a conventional N-point discrete Fourier transform (DFT) is applied to the signal $x_{ENFC}[n]$ to obtain the $\mathrm{DFT}^0$-based estimated phase
Figure FDA0002490111650000023
the $\mathrm{DFT}^1$-based phase estimation builds on the $\mathrm{DFT}^0$ phase estimation by calculating the approximate first derivative of $x_{ENFC}[n]$ at point $n$:
$$x'_{ENFC}[n] = f_d\,(x_{ENFC}[n] - x_{ENFC}[n-1])$$
the approximate first derivative and
Figure FDA0002490111650000024
are combined to perform higher-order phase estimation; linear interpolation is applied to the estimate to obtain the phase spectrum estimation result, and the phase spectrum fluctuation feature F is extracted;
in step A2, Hilbert-transform-based instantaneous frequency estimation is performed on $x_{ENFC}[n]$; first, the analytic function of $x_{ENFC}[n]$ is obtained:
$$x^{(a)}_{ENFC}[n] = x_{ENFC}[n] + i\,H\{x_{ENFC}[n]\},$$
wherein
Figure FDA0002490111650000025
and H denotes the Hilbert transform; the instantaneous frequency is the rate of change of the phase angle of $H\{x_{ENFC}[n]\}$; the instantaneous frequency $f[n]$ of the ENF signal is estimated, oscillation and boundary effects are removed from $f[n]$, and the instantaneous frequency spectrum of $x_{ENFC}[n]$ is constructed;
in step A3, the curves of the phase spectrum and the frequency spectrum of $x_{ENFC}[n]$ are fitted with Sum of Sines and Gaussian models, respectively;
the Sum of Sines expression is:
Figure FDA0002490111650000026
the Gaussian expression is:
Figure FDA0002490111650000027
wherein the parameters of the expressions are the fitting features,
Figure FDA0002490111650000028
Figure FDA0002490111650000031
5. The digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum according to claim 1, wherein the method comprises the following steps:
performing feature fusion on the phase feature set and the frequency feature set obtained in step 2 by applying discriminant correlation analysis (DCA), wherein DCA performs feature fusion by maximizing the pairwise correlation between the two feature sets while restricting the correlation to within classes; the transformation matrices of the feature sets are calculated by maximizing the covariance matrix between the feature sets while diagonalizing the intra-class scatter matrix.
6. The digital audio authenticity identification method based on analysis of the ENF phase spectrum and the instantaneous frequency spectrum as claimed in claim 1, wherein step 4 comprises:
step 4.1: applying a deep random forest to construct a model of the fused features: part of the fused features is used to train the deep random forest; during training, the number of layers, a model parameter, is determined automatically according to the change in accuracy and the limit on the number of layers; training stops when the training accuracy no longer improves or the number of layers reaches the maximum, and the classification result is taken as the final classification accuracy;
step 4.2: after the model is saved, deciding whether any signal under test has been tampered with: the number of layers and the structural parameters of the deep random forest obtained once training is complete form the fused-feature classification model, which classifies and makes decisions on the fused features of any signal under test.
7. A computer program for implementing the digital audio authenticity identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum according to any one of claims 1 to 6.
8. A digital audio signal processing system for implementing the digital audio authenticity identification method based on the analysis of the ENF phase spectrum and the instantaneous frequency spectrum according to any one of claims 1 to 6.
9. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the digital audio authenticity identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum according to any of claims 1 to 6.
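For illustration only, the preprocessing recited in claim 2 and the Hilbert-based instantaneous frequency estimation recited in claim 4 can be sketched with NumPy. Several choices here are assumptions rather than the claimed implementation: a nominal ENF of 50 Hz, plain decimation without an anti-aliasing filter, an FFT brick-wall mask standing in for the band-pass filter, and the phase angle taken from the analytic signal $x + iH\{x\}$. All function names are hypothetical.

```python
import numpy as np


def preprocess(x, factor):
    """Down-sample by plain decimation (assumes little energy above the new
    Nyquist frequency) and remove the DC component."""
    x_d = np.asarray(x, dtype=float)[::factor]
    return x_d - x_d.mean()


def fft_bandpass(x, fs, f_center, half_width):
    """Keep only FFT bins within half_width of f_center (brick-wall stand-in)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[np.abs(freqs - f_center) > half_width] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def analytic_signal(x):
    """Return x + i*H{x} by zeroing the negative-frequency half of the spectrum."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def instantaneous_frequency(x, fs):
    """Rate of change of the unwrapped analytic-signal phase, in Hz."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


fs, factor = 4000, 4                        # hypothetical sampling setup
t = np.arange(2 * fs) / fs
x = 0.3 + np.sin(2 * np.pi * 50.0 * t)      # synthetic 50 Hz tone with a DC offset
x_d = preprocess(x, factor)
x_enfc = fft_bandpass(x_d, fs / factor, f_center=50.0, half_width=1.0)
f = instantaneous_frequency(x_enfc, fs / factor)
```

On this synthetic tone the estimated instantaneous frequency stays at the 50 Hz assumed for the example; on real recordings the ends of `f` would show the boundary effects that the claim says are removed.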
CN201810585686.3A 2018-06-06 2018-06-06 Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum Active CN108806718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810585686.3A CN108806718B (en) 2018-06-06 2018-06-06 Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810585686.3A CN108806718B (en) 2018-06-06 2018-06-06 Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Publications (2)

Publication Number Publication Date
CN108806718A CN108806718A (en) 2018-11-13
CN108806718B true CN108806718B (en) 2020-07-21

Family

ID=64087865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810585686.3A Active CN108806718B (en) 2018-06-06 2018-06-06 Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Country Status (1)

Country Link
CN (1) CN108806718B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11039205B2 (en) 2019-10-09 2021-06-15 Sony Interactive Entertainment Inc. Fake video detection using block chain
CN110808070B (en) * 2019-11-14 2022-05-06 福州大学 Sound event classification method based on deep random forest in audio monitoring
CN111998936B (en) * 2020-08-25 2022-04-15 四川长虹电器股份有限公司 Equipment abnormal sound detection method and system based on transfer learning
CN112151067B (en) * 2020-09-27 2023-05-02 湖北工业大学 Digital audio tampering passive detection method based on convolutional neural network
CN112365901A (en) * 2020-11-03 2021-02-12 武汉工程大学 Mechanical audio fault detection method and device
CN113453225B (en) * 2021-06-23 2022-05-20 华中科技大学 Physical layer watermark authentication method and system for LTE system
CN113704409B (en) * 2021-08-31 2023-08-04 上海师范大学 False recruitment information detection method based on cascading forests

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105223475A (en) * 2015-08-25 2016-01-06 国家电网公司 Based on the shelf depreciation chromatogram characteristic algorithm for pattern recognition of Gaussian parameter matching

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11069370B2 (en) * 2016-01-11 2021-07-20 University Of Tennessee Research Foundation Tampering detection and location identification of digital audio recordings
CN107274915B (en) * 2017-07-31 2020-08-07 华中师范大学 Digital audio tampering automatic detection method based on feature fusion

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105223475A (en) * 2015-08-25 2016-01-06 国家电网公司 Based on the shelf depreciation chromatogram characteristic algorithm for pattern recognition of Gaussian parameter matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
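Face recognition based on local Gabor phase feature fusion (基于局部Gabor相位特征融合的人脸识别); Jiang Yanxia (江艳霞); Opto-Electronic Engineering (《光电工程》); 20100703; full text *
Research progress on several key technologies of audio forensics (音频取证若干关键技术研究进展); Bao Yongqiang (包永强); Journal of Data Acquisition and Processing (《数据采集与处理》); 20160315; full text *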

Also Published As

Publication number Publication date
CN108806718A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108806718B (en) Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum
CN108766464B (en) Digital audio tampering automatic detection method based on power grid frequency fluctuation super vector
CN112367273B (en) Flow classification method and device of deep neural network model based on knowledge distillation
CN107274915A (en) A kind of DAB of feature based fusion distorts automatic testing method
Wang et al. Digital audio tampering detection based on ENF consistency
US11533373B2 (en) Global iterative clustering algorithm to model entities' behaviors and detect anomalies
CN103761965B (en) A kind of sorting technique of instrument signal
CN109086830B (en) Typical correlation analysis near-duplicate video detection method based on sample punishment
Wang et al. Multi-task Joint Sparse Representation Classification Based on Fisher Discrimination Dictionary Learning.
CN110929525A (en) Network loan risk behavior analysis and detection method, device, equipment and storage medium
Khalaf et al. Robust partitioning and indexing for iris biometric database based on local features
CN112509601B (en) Note starting point detection method and system
Yao et al. An efficient cascaded filtering retrieval method for big audio data
CN113886821A (en) Malicious process identification method and device based on twin network, electronic equipment and storage medium
CN108766465B (en) Digital audio tampering blind detection method based on ENF general background model
CN114168788A (en) Audio audit processing method, device, equipment and storage medium
CN116599743A (en) 4A abnormal detour detection method and device, electronic equipment and storage medium
CN109598216B (en) Convolution-based radio frequency fingerprint feature extraction method
CN114710344B (en) Intrusion detection method based on traceability graph
CN115472179A (en) Automatic detection method and system for digital audio deletion and insertion tampering operation
CN100363943C (en) Color image matching analytical method based on color content and distribution
CN113722607B (en) Improved clustering-based bracket attack detection method
CN114968351B (en) Hierarchical multi-feature code homologous analysis method and system
CN112529035B (en) Intelligent identification method for identifying individual types of different radio stations
CN116738259B (en) Multi-harmonic-based electromagnetic leakage radiation source fingerprint extraction and identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181113

Assignee: Hubei ZHENGBO Xusheng Technology Co.,Ltd.

Assignor: CENTRAL CHINA NORMAL University

Contract record no.: X2024980001275

Denomination of invention: Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Granted publication date: 20200721

License type: Common License

Record date: 20240124

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181113

Assignee: Hubei Rongzhi Youan Technology Co.,Ltd.

Assignor: CENTRAL CHINA NORMAL University

Contract record no.: X2024980001548

Denomination of invention: Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum

Granted publication date: 20200721

License type: Common License

Record date: 20240126