CN112394224B - Audio file generation time tracing dynamic matching method and system - Google Patents

Info

Publication number
CN112394224B
Authority
CN
China
Prior art keywords
time
signal
power grid
frequency
grid frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011217274.8A
Other languages
Chinese (zh)
Other versions
CN112394224A (en
Inventor
华光
王清懿
张海剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN DASHENGJI TECHNOLOGY Co.,Ltd.
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202011217274.8A priority Critical patent/CN112394224B/en
Publication of CN112394224A publication Critical patent/CN112394224A/en
Application granted granted Critical
Publication of CN112394224B publication Critical patent/CN112394224B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01RMEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R23/00Arrangements for measuring frequencies; Arrangements for analysing frequency spectra
    • G01R23/16Spectrum analysis; Fourier analysis
    • G01R23/165Spectrum analysis; Fourier analysis using filters
    • G01R23/167Spectrum analysis; Fourier analysis using filters with digital filters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses a dynamic matching method and system for tracing the generation time of an audio file. The method comprises the following steps: extracting the power grid frequency time-domain signal from the audio to be detected by using a narrow band-pass filter; converting the time-domain power grid frequency signal into a time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method; simultaneously, identifying regions with a low signal-to-noise ratio in the audio to be detected, and performing noise reduction on them by using a short-time Fourier transform with a locally increased window length; acquiring a power grid frequency reference signal of the time-frequency domain; and matching the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with the compressed Fourier transform frequency resolution as a threshold value, wherein the optimal matching position corresponds to the timestamp information of the audio to be detected. The method and system can obtain the timestamp information of the audio file to be detected, help judge the authenticity of digital multimedia files, and safeguard information security.

Description

Audio file generation time tracing dynamic matching method and system
Technical Field
The invention belongs to the technical field of multimedia digital forensics, and particularly relates to a dynamic matching method and system for tracing the generation time of an audio file.
Background
The rapid development of information technology means that people no longer record and transmit information only as text; multimedia files stored in digital form have become an important part of information exchange thanks to their fast transmission and convenient storage. At the same time, digital multimedia content is easily tampered with and attacked as it spreads online, so that false and erroneous information is circulated, affecting people's daily lives and even threatening social stability and unity. Judging the authenticity of digital multimedia files is therefore essential to information security. Multimedia digital forensics aims to analyze, by technical means, questions such as the generation time (timestamp), the place of generation, and tampering of a digital multimedia file in order to determine its authenticity and reliability, thereby exposing rumors and criminal acts involving multimedia files. It plays a vital role in judicial, criminal investigation, and security work involving multimedia content.
Multimedia forensics techniques fall into two categories: active and passive forensics. Active forensics embeds specific information, such as a digital watermark or digital signature, into a file in advance, and later verifies copyright or detects tampering by extracting and analyzing that information. Passive forensics identifies whether a multimedia file has been artificially modified from the characteristics of the related software, hardware, and file content, without any information having been embedded in advance. An unprocessed original multimedia file follows certain inherent regularities, and these are disrupted when the file is tampered with. Information that exists naturally, without prior embedding, can therefore be regarded as a natural watermark in the multimedia file, and the authenticity and integrity of the file can be verified by extracting it.
The power grid frequency (electric network frequency, ENF) is the transmission frequency of the power grid signal, with a nominal value of 50 or 60 Hz. In recent years the ENF has gradually become an important passive criterion for audio and video forensics. Existing research shows that, owing to the alternating-current power supply and the activity of electrical appliances, the power grid frequency and its harmonics can be picked up by audio and video recording equipment through the vibration hum of appliances and the flicker of electric lamps, so that recordings contain a captured power grid frequency signal. The power grid frequency signal can serve as a 'natural' forensic criterion because, besides being easy to capture, it has two important properties: its instantaneous value fluctuates randomly within a small range around the nominal value, and the fluctuation pattern is consistent throughout the same power grid. Because of this random fluctuation and its consistency within a grid, the instantaneous frequency of a recorded reference grid signal forms a mapping to time, from which the timestamp information of the audio file to be detected can be obtained.
Disclosure of Invention
The invention aims to provide a method and a system for tracing the source of the audio file generation time, which solve the problem of obtaining the timestamp information of the audio file to be detected.
The invention provides a dynamic source tracing matching method for audio file generation time, which comprises the following steps:
extracting a power grid frequency time domain signal in the audio to be detected by using a narrow-band-pass filter;
converting the time-domain power grid frequency signal into a time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method; simultaneously, identifying regions with a low signal-to-noise ratio in the audio to be detected, and performing noise reduction on them by using a short-time Fourier transform with a locally increased window length;
acquiring a power grid frequency reference signal, and converting the power grid frequency reference signal in a time domain into a power grid frequency reference signal in a time-frequency domain by using a short-time Fourier transform and extremum adding method;
and matching the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with the compressed Fourier transform frequency resolution as a threshold value, wherein the optimal matching position corresponds to the timestamp information of the audio to be detected.
Further, when a power grid frequency time domain signal in the audio to be detected is extracted, the band-pass filter is used for filtering noise and recording content in the audio to be detected.
Further, converting the time-domain power grid frequency signal into the time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method specifically includes:
dividing a long-time power grid frequency time domain signal into a plurality of equal-length short-time signals;
respectively calculating Fourier transform of each short-time signal, and drawing a transformed frequency spectrum into a function related to time;
and the maximum value of the frequency spectrum energy in each short-time signal is used as the instantaneous value of the power grid frequency at the moment, so that the power grid frequency signal of the time-frequency domain is obtained.
Further, regions with a low signal-to-noise ratio in the audio to be detected are identified by setting thresholds on the signal variance and on the difference.
Further, the corresponding grid frequency reference signal is determined by searching the time range.
Further, the threshold of the dynamic matching algorithm is $0.5\,\Delta f$, where $\Delta f$ is the frequency resolution of the short-time Fourier transform.
Further, a dynamic matching algorithm with the resolution of the compressed Fourier transform frequency as a threshold is adopted to match the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain, and the optimal matching position corresponds to the timestamp information of the audio frequency to be detected, and the method specifically comprises the following steps:
let $x(n)$ be the power grid frequency signal of the time-frequency domain, with signal length $N$; let $r_k(n)$ be the power grid frequency reference signal of the time-frequency domain, $r_k(n) = r(n+k)$, with signal length $L$, $L > N$, where $k$ is the optimal matching coefficient, i.e. the serial number of the matching region of the power grid frequency reference signal; $n = 0, 1, 2, \ldots, N-1$;
S11, initializing $n = 0$, $k = 0$, $P(k) = 0$;
s12, setting a frequency resolution threshold, and correcting the power grid frequency signal by using a correction signal c (n, k):
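$$c(n,k) = \begin{cases} r_k(n) - x(n), & |x(n) - r_k(n)| < \eta \\ 0, & \text{otherwise} \end{cases}$$
where $\eta$ denotes the frequency resolution threshold;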
the corrected grid frequency signal is: $x(n) + c(n,k)$;
S13, judging whether $n$ is equal to $N-1$;
if not, setting $n = n + 1$ and returning to step S12 to continue correcting the grid frequency signal of the matching region;
if yes, calculating a penalty coefficient P (k) of the power grid frequency signal in the matching area and a corrected mean square error
Figure BDA0002760812130000035
Figure BDA0002760812130000032
Figure BDA0002760812130000033
and proceeding to step S14;
S14, judging whether $k$ is equal to $L-N$;
if not, setting $k = k + 1$ and $n = 0$, and returning to step S12 to process the next matching region;
if yes, proceeding to step S15;
S15, judging whether the minimum of the corrected mean square error is unique;
if yes, outputting a k value which enables the mean square error to be minimum;
if not, outputting the k value which enables the penalty coefficient to be minimum.
The invention also provides a source tracing dynamic matching system for audio file generation time, which comprises:
the signal extraction module is used for extracting a power grid frequency time domain signal in the audio to be detected by using the narrow-band-pass filter;
the power grid signal module is used for converting the time-domain power grid frequency signal into a time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method, and, simultaneously, identifying regions with a low signal-to-noise ratio in the audio to be detected and performing noise reduction on them by using a short-time Fourier transform with a locally increased window length;
the reference signal module is used for acquiring a power grid frequency reference signal, and converting the power grid frequency reference signal in a time domain into a power grid frequency reference signal in a time-frequency domain by using a short-time Fourier transform and extremum adding method;
and the dynamic matching module is used for matching the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with the compressed Fourier transform frequency resolution as a threshold value, and the optimal matching position corresponds to the timestamp information of the audio frequency to be detected.
The invention has the following beneficial effects: in the audio file generation time tracing dynamic matching method and system, when the time-domain power grid frequency signal is converted into the time-frequency domain power grid frequency signal, regions with a low signal-to-noise ratio in the audio to be detected can be identified and denoised by using a short-time Fourier transform with a locally increased window length, which improves the accuracy of timestamp acquisition; the dynamic matching algorithm makes audio timestamp identification more accurate and remains robust under stronger noise and shorter recording durations, so that the timestamp information of the audio file to be detected can be obtained, the authenticity of digital multimedia files can be judged, and information security can be safeguarded.
Further, when the power grid frequency time-domain signal is extracted from the audio to be detected, the band-pass filter removes noise and recorded content, so that the power grid frequency signal waveform is effectively extracted and the accuracy of timestamp acquisition is improved. During matching, because of the limited frequency resolution, a noise component is likely to cause a deviation smaller than $0.5\,\Delta f$; a peak within this deviation is corrected to the nominal main lobe center to avoid noise interference. After this automatic correction, the influence of the frequency resolution is reduced and audio timestamp identification becomes more accurate.
Drawings
FIG. 1 is a flowchart of an audio file generation time tracing dynamic matching method of the present invention;
FIG. 2 is a schematic diagram of a narrow band-pass filter extracting the power grid frequency time-domain signal from the audio to be detected according to the present invention;
FIG. 3 is a schematic diagram of the short-time Fourier transform plus extremum method of the present invention;
FIG. 4 is a time-frequency diagram of a main lobe of a grid frequency signal according to the present invention;
FIG. 5 is a flow chart of the ECM matching algorithm of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings in which:
The invention aims to trace the generation time of audio files by dynamic matching: based on the power grid frequency signal, the timestamp information of the audio file to be detected is obtained through a locally increased window length and dynamic threshold matching. Its advantage is that the weak variations of the instantaneous power grid frequency in the audio recording are accurately compared with a reference signal database, so that the creation time of the recording is identified and it can even be detected whether existing audio has been artificially tampered with, edited, or damaged.
The invention provides a dynamic source tracing matching method for audio file generation time, which comprises the following steps as shown in figure 1:
S1, extracting the power grid frequency time-domain signal from the audio to be detected by using a narrow band-pass filter;
S2, converting the time-domain power grid frequency signal into a time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method; simultaneously, identifying regions with a low signal-to-noise ratio in the audio to be detected, and performing noise reduction on them by using a short-time Fourier transform with a locally increased window length;
S3, acquiring a power grid frequency reference signal, and converting the time-domain power grid frequency reference signal into a time-frequency domain power grid frequency reference signal by using the short-time Fourier transform plus extremum method;
and S4, matching the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with the compressed Fourier transform frequency resolution as a threshold value, wherein the optimal matching position corresponds to the timestamp information of the audio to be detected.
Further, step S1 extracts the grid frequency signal from the audio to be detected, mainly by using a narrow band-pass filter. In addition, a band-pass filter with higher precision and lower computational cost can be designed; as shown in FIG. 2, besides extracting the power grid frequency signal, it effectively removes the noise and recorded content in the audio file to be detected, so that the subsequent timestamp identification can proceed smoothly.
Band-pass filters fall into two categories, finite impulse response (FIR) filters and infinite impulse response (IIR) filters. For an IIR filter, the distortion caused by its nonlinear phase-frequency characteristic can have some influence in application, although a speech signal is less sensitive to phase than an image signal. An FIR filter has a linear phase characteristic and can be designed with different window functions or by frequency sampling. However, FIR filters require more parameters and a higher order than IIR filters, and too high an order reduces filtering efficiency. The respective performance indexes, such as the dynamic attenuation at the upper and lower cut-off frequencies of the pass band, the stop-band attenuation, and the delay, are analyzed and compared. Combining these factors, the optimal band-pass filter for each scenario is determined, so that power grid frequency forensics can be better carried out.
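The window-function FIR design mentioned above can be sketched as follows. This is a minimal illustration rather than the filter used in the invention: the Hamming window, the tap count, the 1 Hz half-width around the nominal frequency, and the function names are all assumptions made for the example.

```python
import numpy as np

def fir_bandpass(f_low, f_high, fs, num_taps=2001):
    """Windowed-sinc FIR band-pass taps (Hamming window, an assumed choice)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, linear-phase filter")
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # Ideal low-pass responses with cut-offs f_high and f_low;
    # np.sinc(x) is sin(pi*x)/(pi*x), so the factor 2*fc/fs sets the cut-off.
    lp_high = 2 * f_high / fs * np.sinc(2 * f_high / fs * n)
    lp_low = 2 * f_low / fs * np.sinc(2 * f_low / fs * n)
    # Band-pass = difference of the two low-pass responses, tapered by the window.
    return (lp_high - lp_low) * np.hamming(num_taps)

def extract_enf_band(audio, fs, nominal=50.0, half_width=1.0, num_taps=2001):
    """Keep only a narrow band around the nominal grid frequency."""
    taps = fir_bandpass(nominal - half_width, nominal + half_width, fs, num_taps)
    # mode="same" centres the output, which cancels the linear-phase delay.
    return np.convolve(audio, taps, mode="same")
```

The symmetric taps give the linear phase noted above for FIR filters; the price is the high order needed for such a narrow pass band, which is the efficiency trade-off the text describes.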
Further, the key point of step S2 is to convert the time-domain grid frequency signal into a time-frequency domain grid frequency signal by the short-time Fourier transform (STFT) plus extremum method. As shown in FIG. 3, the short-time Fourier transform can be viewed as dividing a long signal into shorter segments of equal length, calculating the Fourier transform of each segment, and plotting the resulting spectra as a function of time. The frequency at which the spectral energy of each segment is maximal is taken as the instantaneous value of the power grid frequency at that moment, which yields the power grid frequency signal of the time-frequency domain.
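The segment, transform, and peak-picking steps described above can be sketched as below. The Hann window, the hop parameter, the search band, and the function name are assumptions for illustration; the text itself only specifies equal-length segments and taking the spectral maximum of each.

```python
import numpy as np

def stft_peak_enf(signal, fs, win_len, hop, band=(49.0, 51.0)):
    """Estimate one instantaneous grid frequency per segment by spectral peak picking."""
    window = np.hanning(win_len)  # assumed window function
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_freqs = freqs[in_band]
    enf = []
    # hop == win_len gives the non-overlapping equal-length segments of the text.
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # The frequency with maximal spectral energy is the segment's ENF value.
        enf.append(band_freqs[np.argmax(power[in_band])])
    return np.array(enf)
```

Because the estimate is read off the FFT grid, its granularity equals the bin spacing `fs / win_len`, which is the frequency resolution that the later matching step has to tolerate.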
In addition to converting the time-domain power grid frequency signal into a time-frequency domain signal, when the signal-to-noise ratio of the recording to be detected is low, it is also important to denoise the signal by using a short-time Fourier transform with a locally increased window length.
The signal-to-noise ratio in the audio file to be detected is expressed as:
Figure BDA0002760812130000051
where $s(n)$ represents the corresponding power grid frequency reference signal in the file to be detected, and $v(n)$ represents the noise in the file to be detected. At recording time, the $\mathrm{SNR}_{\mathrm{ENF}}$ of the recording file to be detected is already determined. One criterion for describing whether the signal is noisy is the signal variance: it is about $4.56 \times 10^{-4}\ \mathrm{Hz}^2$ for the grid frequency reference signal and may increase by one to two orders of magnitude when the noise is strong, so the noise level can be effectively controlled by reducing the signal variance.
The Cramér-Rao lower bound of the grid frequency signal variance is:
Figure BDA0002760812130000061
where $f_S$ represents the sampling frequency and $N$ the window length of the short-time Fourier transform. With the signal-to-noise ratio and the sampling frequency unchanged, the Cramér-Rao lower bound is negatively correlated with the window length $N$: when the window length increases, the bound decreases, the signal variance decreases, and the noise is thereby controlled.
When the signal-to-noise ratio is low, the signal variance and the difference both become large, so whether the audio file to be detected needs noise control can be judged by setting thresholds on the signal variance and on the difference. The difference-based detection has no analytical derivation; it is an empirical criterion based on experimental results. After the regions with stronger noise are identified, noise control is carried out with a short-time Fourier transform whose window length is locally increased, so that the waveform of the power grid frequency signal is effectively extracted.
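One possible reading of the variance and difference thresholds is sketched below. It is an interpretation, not the patented detector: the block length, both threshold values, the use of the first difference of consecutive frequency estimates, and the function name are all assumptions. Blocks that are flagged would then be re-estimated with a longer STFT window.

```python
import numpy as np

def flag_noisy_blocks(enf, block_len, var_thresh, diff_thresh):
    """Flag blocks of an estimated ENF sequence that look noise-dominated."""
    enf = np.asarray(enf, dtype=float)
    flags = []
    for start in range(0, len(enf) - block_len + 1, block_len):
        block = enf[start:start + block_len]
        # Criterion 1: the variance of the estimates in the block is too large.
        too_spread = np.var(block) > var_thresh
        # Criterion 2 (assumed form): some jump between neighbours is too large.
        too_jumpy = np.max(np.abs(np.diff(block))) > diff_thresh
        flags.append(bool(too_spread or too_jumpy))
    return flags
```

As a hypothetical starting point, `var_thresh` could be set about one order of magnitude above the reference variance of roughly 4.56e-4 Hz^2 quoted above, since the text says the variance of noisy segments rises by one to two orders of magnitude.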
Further, step S3 mainly obtains the power grid frequency reference signal of the time-frequency domain so that it can be matched with the audio to be detected. The corresponding power grid frequency reference signal is determined by searching the time range, and the time-domain reference signal is converted into a time-frequency domain signal by the STFT plus extremum method, in the same way as the power grid frequency signal, which makes matching convenient.
Further, in step S4, a dynamic threshold-based matching algorithm (ECM) is used, in which the threshold is selected according to the frequency resolution determined by the STFT window size. A penalty coefficient is introduced to monitor the automatic correction process and ultimately determine the estimated timestamp; the specific flow is shown in FIG. 5.
The uncertainty principle states that the time resolution and the frequency resolution of the signal cannot be arbitrarily high at the same time. This results in a trade-off in selecting the STFT window size. Therefore, noise and interference components located near the ENF frequency may cause a bias in the frequency estimation, compared to a noise-free estimation of the reference data, which is inevitable even if the noise and interference are weaker than the ENF signal.
The frequency resolution problem, or offset, is considered an important factor affecting the matching result. When the window size of the STFT is $N_W$ samples and the sampling frequency is $f_S$ Hz, the duration of the window segment is $T_W = N_W / f_S$ seconds, and the corresponding frequency resolution of the segment is:
$$\Delta f = \frac{1}{T_W} = \frac{f_S}{N_W}\ \text{Hz}$$
This means that any two frequency components separated by less than $\Delta f$ cannot be resolved in this signal segment: their peaks merge into a single peak. Without loss of generality, one of these components may be regarded as the extracted ENF signal and the other as a noise component, which shifts the resulting peak within the frequency interval.
FIG. 4 shows the main lobe of the spectrum, whose width is approximately $\Delta f$. In this case, a noise component is likely to cause a deviation of less than $0.5\,\Delta f$, and a peak within this deviation should be corrected to the nominal main lobe center. If the noise component is so strong that the corresponding peak lies outside the main lobe region, the true peak is not recoverable. Therefore, the threshold $\eta$ may be chosen from the frequency resolution as $\eta = 0.5\,\Delta f$.
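As a worked example of the threshold choice, the snippet below computes the resolution of a window of $N_W$ samples as the reciprocal of its duration $T_W = N_W / f_S$, and then halves it. The numbers used afterwards (a 400 Hz sampling rate and an 8 s window) are hypothetical values chosen for illustration, not parameters taken from the invention.

```python
def frequency_resolution(window_samples, fs):
    """Resolution in Hz of a window of `window_samples` samples: 1 / T_W = fs / N_W."""
    return fs / window_samples

def ecm_threshold(window_samples, fs):
    """Matching threshold eta = 0.5 * delta_f."""
    return 0.5 * frequency_resolution(window_samples, fs)
```

With these assumed values the window holds 3200 samples, so `frequency_resolution(3200, 400.0)` is 0.125 Hz and `ecm_threshold(3200, 400.0)` is 0.0625 Hz. Doubling the window length halves both, at the cost of coarser time resolution.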
The ECM algorithm builds on the conventional minimum mean square error (MMSE) algorithm, in which the mean square error between the reference signal and the power grid frequency signal to be detected is expressed as follows:
$$e(k) = \frac{1}{N} \sum_{n=0}^{N-1} \left[ x(n) - r_k(n) \right]^2$$
the optimal matching index is as follows:
$$k_{\mathrm{opt}} = \arg\min_{k} e(k), \quad 0 \le k \le L - N$$
where $x(n) = s(n) + v(n)$ is the power grid frequency signal extracted from the audio to be detected, with length $N$; $r_k(n) = r(n+k)$ is the reference signal, with length $L$ ($L > N$); and $e(k)$ represents the mean square error.
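The conventional MMSE search that ECM starts from can be sketched as a sliding comparison of the extracted sequence against every window of the reference. The sketch assumes the usual definition of the mean square error (the average of the squared differences over the $N$ samples); the function name and return values are illustrative.

```python
import numpy as np

def mmse_match(x, r):
    """Return the offset k minimising the mean square error between x and r[k:k+N]."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    n_x, n_r = len(x), len(r)
    if n_r < n_x:
        raise ValueError("the reference must be at least as long as the test signal")
    # One error value per candidate matching region k = 0 .. L - N.
    errors = np.array([np.mean((x - r[k:k + n_x]) ** 2) for k in range(n_r - n_x + 1)])
    return int(np.argmin(errors)), errors
```

The returned offset indexes the reference sequence, so converting it to a timestamp only requires the start time of the reference and the spacing between consecutive frequency estimates.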
Because the uncertainty principle limits the frequency resolution of the ENF estimate, the minimum mean square error criterion is improved as follows: during matching, a threshold of $\eta$ Hz is defined to tolerate the difference between the ENF signal in the audio to be detected and the reference signal with which it is aligned, and the ENF signal is automatically corrected. The automatic correction rule is: for any given $k$, i.e. any matching region of the power grid frequency reference signal, if $|x(n) - r_k(n)| < \eta$ within the matching region, then this $x(n)$ is considered correctable and is corrected to $r_k(n)$; otherwise $x(n)$ is considered uncorrectable and the MMSE computation is still required. In this way, all $x(n)$ within the matching region are corrected where the rule allows.
Accordingly, a correction signal $c(n,k)$ and a penalty coefficient $P(k)$ are introduced. The penalty coefficient can be accumulated at each automatic correction, as shown in FIG. 5, or computed after the automatic correction of a matching region is completed. The expressions for $c(n,k)$ and $P(k)$ are as follows:
$$c(n,k) = \begin{cases} r_k(n) - x(n), & |x(n) - r_k(n)| < \eta \\ 0, & \text{otherwise} \end{cases}$$
Figure BDA0002760812130000074
The corrected ENF signal of the audio to be detected is $x(n) + c(n,k)$, and the corrected mean square error and the optimal matching coefficient are as follows:
Figure BDA0002760812130000075
Figure BDA0002760812130000076
When the optimal matching coefficient is unique, it identifies the best matching region. When it is not unique, i.e. several coefficients $k_1, k_2, k_3, \ldots$ give the same minimum, the matching region is determined by the penalty coefficient, and the optimal matching coefficient is:
Figure BDA0002760812130000077
This means that if multiple positions have the same MMSE, the one with the minimum penalty coefficient is selected as the estimated matching position.
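The correction, corrected-error, and tie-breaking steps can be put together in a short sketch. Two parts are assumptions rather than the patented formulas: the penalty is taken here to be the accumulated absolute correction, because the expression for $P(k)$ is not reproduced in this text, and ties are detected with a floating-point tolerance instead of exact equality. The function name and return values are also illustrative.

```python
import numpy as np

def ecm_match(x, r, eta):
    """Threshold-corrected matching: minimise the corrected MSE, break ties by penalty."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    n_x, n_r = len(x), len(r)
    errors, penalties = [], []
    for k in range(n_r - n_x + 1):
        rk = r[k:k + n_x]
        diff = rk - x
        # Samples closer to the reference than eta are pulled onto it.
        c = np.where(np.abs(diff) < eta, diff, 0.0)
        corrected = x + c
        errors.append(np.mean((corrected - rk) ** 2))
        # ASSUMPTION: penalty = total absolute correction applied in this region.
        penalties.append(np.sum(np.abs(c)))
    errors = np.array(errors)
    penalties = np.array(penalties)
    # Regions whose corrected error equals the minimum (up to rounding) are tied.
    tied = np.flatnonzero(np.isclose(errors, errors.min()))
    if len(tied) == 1:
        return int(tied[0]), errors, penalties
    return int(tied[np.argmin(penalties[tied])]), errors, penalties
```

With a penalty of this form, the tied region that needed the least total correction wins, which matches the rule above that the position with the minimum penalty coefficient is selected.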
Comprehensive performance analysis is carried out on the ECM algorithm in an experiment, and the method is found to have good anti-noise capability and good self-correcting capability. The use of the ECM algorithm allows audio time stamp identification to be more accurate, and the ECM algorithm also exhibits good robustness in the face of greater noise and shorter duration.
The invention also provides a source tracing dynamic matching system for audio file generation time, which comprises:
the signal extraction module is used for extracting a power grid frequency time domain signal in the audio to be detected by using the narrow-band-pass filter;
the power grid signal module is used for converting the time-domain power grid frequency signal into a time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method, and, simultaneously, identifying regions with a low signal-to-noise ratio in the audio to be detected and performing noise reduction on them by using a short-time Fourier transform with a locally increased window length;
the reference signal module is used for acquiring a power grid frequency reference signal, and converting the power grid frequency reference signal in a time domain into a power grid frequency reference signal in a time-frequency domain by using a short-time Fourier transform and extremum adding method;
and the dynamic matching module is used for matching the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with the compressed Fourier transform frequency resolution as a threshold value, and the optimal matching position corresponds to the timestamp information of the audio frequency to be detected.
It will be understood by those skilled in the art that the foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included within the scope of the present invention.

Claims (7)

1. A dynamic source tracing matching method for audio file generation time is characterized by comprising the following steps:
extracting a power grid frequency time domain signal in the audio to be detected by using a narrow-band-pass filter;
converting the time-domain power grid frequency signal into a time-frequency domain power grid frequency signal by using the short-time Fourier transform plus extremum method; simultaneously, identifying regions with a low signal-to-noise ratio in the audio to be detected, and performing noise reduction on them by using a short-time Fourier transform with a locally increased window length;
acquiring a power grid frequency reference signal, and converting the power grid frequency reference signal in a time domain into a power grid frequency reference signal in a time-frequency domain by using a short-time Fourier transform and extremum adding method;
matching the power grid frequency signal of the time-frequency domain with the power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with the compressed Fourier transform frequency resolution as a threshold value, wherein the optimal matching position corresponds to the timestamp information of the audio to be detected;
the method comprises the following steps of matching a power grid frequency signal of a time-frequency domain with a power grid frequency reference signal of the time-frequency domain by adopting a dynamic matching algorithm with a compressed Fourier transform frequency resolution as a threshold, wherein the optimal matching position corresponds to timestamp information of audio to be detected, and the method specifically comprises the following steps:
let x(n) be the time-frequency-domain power grid frequency signal, of length N; let $r_k(n) = r(n + k)$ be the time-frequency-domain power grid frequency reference signal, of length L, with L > N, where k is the matching coefficient, i.e. the serial number of the matching region in the power grid frequency reference signal; n = 0, 1, 2, ..., N-1;
S11, initializing n = 0, k = 0, P(k) = 0;
S12, setting a frequency resolution threshold, and correcting the power grid frequency signal by using a correction signal c(n, k):
Figure FDA0003121115650000011
the corrected grid frequency signal is: x(n) + c(n, k);
S13, judging whether n is equal to N-1;
if not, setting n = n + 1 and returning to step S12 to continue correcting the grid frequency signal of the current matching region;
if yes, calculating the penalty coefficient P(k) of the power grid frequency signal in the matching region and the corrected mean square error
Figure FDA0003121115650000013
$P(k) = \sum_{n} c^{2}(n, k)$
Figure FDA0003121115650000012
and proceeding to step S14;
S14, judging whether k is equal to L-N;
if not, setting k = k + 1 and n = 0, and returning to step S12 to evaluate the next matching region;
if yes, proceeding to step S15;
S15, judging, for the mean square error
Figure FDA0003121115650000021
whether its minimum is unique;
if yes, outputting the value of k that minimizes the mean square error;
if not, outputting the value of k that minimizes the penalty coefficient.
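Steps S11 to S15 can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the formula images for c(n, k) and for the mean square error are not reproduced in this text, so the code assumes that c(n, k) equals the mismatch r(n + k) - x(n) when its magnitude does not exceed the threshold and is zero otherwise, and that the mean square error is the average squared residual after correction. The name `dynamic_match` and the choice to break ties only among the minimum-error candidates are likewise assumptions.

```python
from typing import List, Sequence, Tuple


def dynamic_match(
    x: Sequence[float], r: Sequence[float], threshold: float
) -> Tuple[int, List[float], List[float]]:
    """Return (best k, mean square error per k, penalty P(k) per k)."""
    N, L = len(x), len(r)
    if not 0 < N <= L:
        raise ValueError("need 0 < len(x) <= len(r)")
    mse: List[float] = []
    penalty: List[float] = []
    for k in range(L - N + 1):          # S14: k runs from 0 to L-N
        p = 0.0
        sq_err = 0.0
        for n in range(N):              # S12/S13: n runs from 0 to N-1
            d = r[n + k] - x[n]         # mismatch against r_k(n) = r(n + k)
            # ASSUMED correction rule: absorb mismatches within the threshold.
            c = d if abs(d) <= threshold else 0.0
            p += c * c                  # P(k) = sum over n of c^2(n, k)
            resid = d - c               # r_k(n) - (x(n) + c(n, k))
            sq_err += resid * resid
        penalty.append(p)
        mse.append(sq_err / N)          # ASSUMED form of the corrected error
    best = min(mse)
    ties = [k for k, e in enumerate(mse) if e == best]
    if len(ties) == 1:                  # S15: unique minimum of the error
        return ties[0], mse, penalty
    # Otherwise choose, among the tied candidates, the smallest penalty.
    return min(ties, key=lambda k: penalty[k]), mse, penalty
```

For example, with `r = [50.00, 50.02, 50.01, 49.98, 49.97, 50.03, 50.05, 50.00]`, `x = [49.981, 49.968, 50.031]` and a threshold of 0.005, only the region starting at k = 3 has every mismatch within the threshold, so it is the unique minimum. The penalty acts as a tie-breaker when several regions are corrected to zero error.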
2. The audio file generation time tracing dynamic matching method according to claim 1, wherein, when extracting the power grid frequency time-domain signal from the audio to be tested, a band-pass filter is used to filter out the noise and the recorded content in the audio to be tested.
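A narrow band-pass stage of this kind can be sketched with a single second-order (biquad) filter. The claim does not specify the filter design, so the coefficient formulas below (the widely used "audio EQ cookbook" band-pass with unity gain at the centre frequency), the function name `bandpass_biquad`, and the parameters `f0` and `q` are all assumptions made for illustration; a practical system may well use a higher-order or zero-phase design.

```python
import math
from typing import List, Sequence


def bandpass_biquad(signal: Sequence[float], fs: float, f0: float, q: float) -> List[float]:
    """Filter `signal` with a 2nd-order band-pass centred at f0 Hz (gain 1 at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)          # bandwidth is roughly f0 / q
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0                   # filter state (previous samples)
    out: List[float] = []
    for x0 in signal:
        y0 = b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

With `fs = 1000`, `f0 = 50` and `q = 25` the pass band is about 2 Hz wide: a 50 Hz tone passes essentially unchanged once the start-up transient has decayed, while a 200 Hz tone is strongly attenuated.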
3. The audio file generation time tracing dynamic matching method according to claim 1, wherein converting the time-domain power grid frequency signal into a time-frequency-domain power grid frequency signal by short-time Fourier transform followed by extremum picking specifically comprises:
dividing the long power grid frequency time-domain signal into a plurality of equal-length short-time signals;
calculating the Fourier transform of each short-time signal, and arranging the resulting spectra as a function of time;
and taking the frequency at which the spectral energy of each short-time signal is maximal as the instantaneous power grid frequency at that moment, thereby obtaining the time-frequency-domain power grid frequency signal.
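The three steps of this claim can be sketched as follows. The sketch assumes non-overlapping frames with a rectangular window and evaluates the discrete Fourier transform only on the bins inside a search band around the nominal grid frequency; the function name `enf_by_stft_peak` and the band limits `f_lo` and `f_hi` are illustrative assumptions, not taken from the patent.

```python
import cmath
import math
from typing import List, Sequence


def enf_by_stft_peak(
    signal: Sequence[float], fs: float, frame_len: int, f_lo: float, f_hi: float
) -> List[float]:
    """Return one peak-frequency estimate (Hz) per non-overlapping frame."""
    delta_f = fs / frame_len                      # frequency resolution of a frame
    k_lo = max(0, math.ceil(f_lo / delta_f))
    k_hi = min(frame_len // 2, math.floor(f_hi / delta_f))
    if k_lo > k_hi:
        raise ValueError("search band contains no frequency bin")
    enf: List[float] = []
    # Step 1: split into equal-length short-time signals.
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        best_k, best_energy = k_lo, -1.0
        # Step 2: Fourier transform of the frame, restricted to the band.
        for k in range(k_lo, k_hi + 1):
            w = -2j * math.pi * k / frame_len
            coeff = sum(s * cmath.exp(w * n) for n, s in enumerate(frame))
            energy = abs(coeff) ** 2
            # Step 3: keep the bin with maximal spectral energy.
            if energy > best_energy:
                best_k, best_energy = k, energy
        enf.append(best_k * delta_f)
    return enf
```

The estimates are quantised to multiples of `delta_f`, which is why the matching stage uses a threshold tied to the frequency resolution. A production implementation would normally use an FFT library rather than this direct summation.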
4. The audio file generation time tracing dynamic matching method according to claim 1, wherein regions of low signal-to-noise ratio in the audio to be tested are identified by setting thresholds on the signal variance and on the signal difference.
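One possible reading of this claim is sketched below; the claim itself does not say how the variance and the difference are computed, so every detail here is an assumption. The sketch operates on the extracted grid frequency sequence and flags a sample when the variance in a short window around it, or the jump from the previous sample, exceeds its threshold. The name `flag_low_snr` and the parameters `window`, `var_threshold` and `diff_threshold` are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def flag_low_snr(
    enf: Sequence[float], window: int, var_threshold: float, diff_threshold: float
) -> List[bool]:
    """Flag samples whose local variance or step size exceeds a threshold."""
    half = window // 2
    flags: List[bool] = []
    for n in range(len(enf)):
        lo = max(0, n - half)
        hi = min(len(enf), n + half + 1)
        local_var = pvariance(enf[lo:hi])          # variance in the local window
        jump = abs(enf[n] - enf[n - 1]) if n > 0 else 0.0
        flags.append(local_var > var_threshold or jump > diff_threshold)
    return flags
```

Runs of flagged samples would then mark the regions that are re-estimated with a locally longer short-time Fourier transform window, as described in claim 1.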
5. The audio file generation time tracing dynamic matching method of claim 1, wherein the corresponding grid frequency reference signal is determined by searching a time range.
6. The audio file generation time tracing dynamic matching method of claim 1, wherein the threshold of the dynamic matching algorithm is 0.5Δf, where Δf is the frequency resolution of the short-time Fourier transform.
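As a small worked example of this threshold, the sketch below assumes that Δf is the bin spacing of the transform, i.e. the sampling rate divided by the transform length; the patent text does not spell out this relation, and the names `matching_threshold`, `fs` and `nfft` are illustrative.

```python
def matching_threshold(fs: float, nfft: int) -> float:
    """Return 0.5 * delta_f, with delta_f = fs / nfft (assumed bin spacing)."""
    if nfft <= 0:
        raise ValueError("nfft must be positive")
    delta_f = fs / nfft
    return 0.5 * delta_f
```

For a signal sampled at 1000 Hz and a 16000-point transform, Δf is 0.0625 Hz, so mismatches of up to 0.03125 Hz are treated as quantisation error and corrected.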
7. An audio file generation time tracing dynamic matching system, comprising:
the signal extraction module, used for extracting the power grid frequency time-domain signal from the audio to be tested by using a narrow band-pass filter;
the power grid signal module, used for converting the time-domain power grid frequency signal into a time-frequency-domain power grid frequency signal by short-time Fourier transform followed by extremum (peak) picking, and, at the same time, identifying regions of low signal-to-noise ratio in the audio to be tested and denoising them with a short-time Fourier transform whose window length is locally increased;
the reference signal module, used for acquiring a power grid frequency reference signal and converting it from the time domain into the time-frequency domain by the same short-time Fourier transform and extremum picking method;
the dynamic matching module, used for matching the time-frequency-domain power grid frequency signal against the time-frequency-domain power grid frequency reference signal by a dynamic matching algorithm that takes the frequency resolution of the short-time Fourier transform as its threshold, wherein the optimal matching position corresponds to the timestamp information of the audio to be tested, and wherein said matching specifically comprises the following steps:
let x(n) be the time-frequency-domain power grid frequency signal, of length N; let $r_k(n) = r(n + k)$ be the time-frequency-domain power grid frequency reference signal, of length L, with L > N, where k is the matching coefficient, i.e. the serial number of the matching region in the power grid frequency reference signal; n = 0, 1, 2, ..., N-1;
S11, initializing n = 0, k = 0, P(k) = 0;
S12, setting a frequency resolution threshold, and correcting the power grid frequency signal by using a correction signal c(n, k):
Figure FDA0003121115650000031
the corrected grid frequency signal is: x(n) + c(n, k);
S13, judging whether n is equal to N-1;
if not, setting n = n + 1 and returning to step S12 to continue correcting the grid frequency signal of the current matching region;
if yes, calculating the penalty coefficient P(k) of the power grid frequency signal in the matching region and the corrected mean square error
Figure FDA0003121115650000032
$P(k) = \sum_{n} c^{2}(n, k)$
Figure FDA0003121115650000033
and proceeding to step S14;
S14, judging whether k is equal to L-N;
if not, setting k = k + 1 and n = 0, and returning to step S12 to evaluate the next matching region;
if yes, proceeding to step S15;
S15, judging, for the mean square error
Figure FDA0003121115650000034
whether its minimum is unique;
if yes, outputting the value of k that minimizes the mean square error;
if not, outputting the value of k that minimizes the penalty coefficient.
CN202011217274.8A 2020-11-04 2020-11-04 Audio file generation time tracing dynamic matching method and system Active CN112394224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011217274.8A CN112394224B (en) 2020-11-04 2020-11-04 Audio file generation time tracing dynamic matching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011217274.8A CN112394224B (en) 2020-11-04 2020-11-04 Audio file generation time tracing dynamic matching method and system

Publications (2)

Publication Number Publication Date
CN112394224A CN112394224A (en) 2021-02-23
CN112394224B true CN112394224B (en) 2021-08-10

Family

ID=74598793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011217274.8A Active CN112394224B (en) 2020-11-04 2020-11-04 Audio file generation time tracing dynamic matching method and system

Country Status (1)

Country Link
CN (1) CN112394224B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611314A (en) * 2021-08-03 2021-11-05 成都理工大学 Speaker identification method and system
CN116545860B (en) * 2023-07-07 2023-10-03 Tcl通讯科技(成都)有限公司 Calibration data reading method and device, storage medium and electronic equipment
CN116664154B (en) * 2023-07-31 2023-10-24 山东瑞升智慧医疗科技有限公司 Medical disinfection supply-based full-flow information tracing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598541A (en) * 2014-12-29 2015-05-06 乐视网信息技术(北京)股份有限公司 Identification method and device for multimedia file
CN108447503A (en) * 2018-01-23 2018-08-24 浙江大学山东工业技术研究院 Motor abnormal sound detection method based on Hilbert-Huang transformation
CN109065067A (en) * 2018-08-16 2018-12-21 福建星网智慧科技股份有限公司 A kind of conference terminal voice de-noising method based on neural network model
CN111613239A (en) * 2020-05-29 2020-09-01 北京达佳互联信息技术有限公司 Audio denoising method and device, server and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10446165B2 (en) * 2017-09-27 2019-10-15 Sonos, Inc. Robust short-time fourier transform acoustic echo cancellation during audio playback


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
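Extraction of target sound segments in strong-noise industrial field environments; Ye Shuliang et al.; Chinese Journal of Scientific Instrument (《仪器仪表学报》); 2009-08-15; Vol. 30, No. 8; pp. 1630-1633 *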

Also Published As

Publication number Publication date
CN112394224A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
CN112394224B (en) Audio file generation time tracing dynamic matching method and system
Gerdes et al. Device identification via analog signal fingerprinting: A matched filter approach
Huijbregtse et al. Using the ENF criterion for determining the time of recording of short digital audio recordings
TWI480855B (en) Extraction and matching of characteristic fingerprints from audio signals
Gupta et al. Current developments and future trends in audio authentication
US20060229878A1 (en) Waveform recognition method and apparatus
Liu et al. Application of power system frequency for digital audio authentication
KR20010111057A (en) Watermark embedding and extracting method for protecting digital audio contents copyright and preventing duplication and apparatus using thereof
US20060287785A1 (en) Decoding an alternator output signal
CN107274915A (en) A kind of DAB of feature based fusion distorts automatic testing method
KR20070037579A (en) Searching for a scaling factor for watermark detection
Yang et al. Exposing MP3 audio forgeries using frame offsets
JP4267463B2 (en) Method for identifying audio content, method and system for forming a feature for identifying a portion of a recording of an audio signal, a method for determining whether an audio stream includes at least a portion of a known recording of an audio signal, a computer program , A system for identifying the recording of audio signals
Zeng et al. Audio tampering forensics based on representation learning of enf phase sequence
Lin et al. Exposing speech tampering via spectral phase analysis
Grigoras et al. Analytical framework for digital audio authentication
Nicolalde-Rodríguez et al. Audio authenticity based on the discontinuity of ENF higher harmonics
US7171302B2 (en) Determining engine cylinder contribution from indexed engine data
Lin et al. Robust electric network frequency estimation with rank reduction and linear prediction
Doets et al. Distortion estimation in compressed music using only audio fingerprints
CN112033656A (en) Mechanical system fault detection method based on broadband spectrum processing
Liao et al. ENF detection in audio recordings via multi-harmonic combining
Wang et al. Speech Resampling Detection Based on Inconsistency of Band Energy.
Hajj-Ahmad et al. Power signature for multimedia forensics
CN107548007B (en) Detection method and device of audio signal acquisition equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220106

Address after: 430223 1-602, North building, Xingye building, Wuda Science Park, Garden Road, Donghu High Tech University, Wuhan City, Hubei Province

Patentee after: WUHAN DASHENGJI TECHNOLOGY Co.,Ltd.

Address before: 430072 No. 299 Bayi Road, Wuchang District, Hubei, Wuhan

Patentee before: WUHAN University
