CN108510994A - Audio homologous tampering detection method using byte inter-frame amplitude spectrum correlation
- Publication number
- CN108510994A (application CN201810072583.7A)
- Authority
- CN
- China
- Prior art keywords
- frame
- byte
- audio
- homologous
- amplitude spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention discloses an audio homologous tampering detection method using the inter-frame amplitude spectrum correlation between bytes. The method comprises audio pre-emphasis, framing and windowing, computing the zero-crossing rate of each frame, separating bytes, rejecting short bytes, computing the amplitude spectrum similarity of the frames between each pair of bytes, determining copy-paste relationships between bytes, and locating the tampering. The method has high detection accuracy, high localization precision, and low computational complexity.
Description
Technical field
The present invention relates to the technical field of audio forensics, and in particular to an audio homologous tampering detection method using the inter-frame amplitude spectrum correlation between bytes.
Background
With the widespread use and growing maturity of multimedia technology, information has become easier to obtain, which raises the problem of how to verify whether multimedia information is complete and reliable. Effective tampering detection for multimedia data has become an important topic in the field of information security. Compared with images and video, tampering detection for digital audio has received less research attention. Among audio forgeries, homologous copy-paste tampering is the easiest to carry out and the most common: the tamperer copies some segments of an audio recording and pastes them at other positions in the same recording, thereby changing its true meaning. If a criminal's tampered audio is used as court evidence, departmental confidential information, and the like, serious consequences can follow. Because homologous copy-paste tampering operates only within a single recording, it is both highly concealed and easy to implement. Research on detecting homologous copy-paste tampering in audio is therefore of great significance for guaranteeing the originality, authenticity, and integrity of digital media information.
Summary of the invention
To overcome the shortcomings and deficiencies of the prior art, the present invention provides an audio homologous tampering detection method using the inter-frame amplitude spectrum correlation between bytes.
The present invention adopts the following technical scheme:
An audio homologous tampering detection method using the inter-frame amplitude spectrum correlation between bytes comprises the following steps:
S1: pre-emphasize the audio signal under test;
S2: window and frame the pre-emphasized audio, with frame duration m and frame shift n; the time-domain audio signal after framing and windowing is denoted y_l, where the frame index l = 1, 2, ..., N_frame and N_frame is the number of audio frames;
S3: compute the zero-crossing rate zcr(l) of each frame after windowing and framing;
S4: separate the bytes in the audio under test according to the low-frequency spectral energy;
S5: reject short bytes: set a minimum byte duration threshold t_m and reject every byte whose duration is less than t_m, giving the effective byte set X = {x_1, x_2, x_3, ..., x_M}, where x_i is the i-th byte and M is the number of effective bytes;
S6: after rejecting short bytes, compute the amplitude spectrum similarity between the frames of two bytes x_i and x_j of the audio under test;
S7: set a similarity threshold Th; if two or more pairs of frames from the two bytes have an amplitude spectrum similarity greater than the threshold, determine that bytes x_i and x_j have a copy-paste relationship;
S8: repeat S6 and S7 for all byte pairs i ≠ j ∈ {1, 2, ..., M} to obtain all byte pairs that have a copy-paste relationship, from which the copy-paste regions in the audio under test can be located.
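The zero-crossing rate is calculated, in its standard short-time form, as:
$$\mathrm{zcr}(l) = \frac{1}{2}\sum_{k=2}^{K}\bigl|\operatorname{sgn}[y_l(k)] - \operatorname{sgn}[y_l(k-1)]\bigr|$$
where y_l(k) is the k-th data point of frame l, K is the number of data points per frame, and sgn[·] is the sign function:
$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & \text{otherwise} \end{cases}$$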
In S4, the bytes in the audio under test are separated according to the low-frequency spectral energy as follows. An N_fft-point Fourier transform is applied to each frame y_l of the audio signal under test to obtain the corresponding amplitude spectrum S(l, f), where f is the frequency bin index.
The average low-frequency energy over all frames of the audio signal under test is then computed, and for each frame y_l the ratio NLFER of its low-frequency energy to that average is calculated.
With the low-frequency energy of a frame taken as the sum of S(l, f) over the low-frequency bins, the NLFER is:
$$\mathrm{NLFER}(l) = \frac{\sum_{f=F_{0\_min}}^{F_{0\_max}} S(l,f)}{\frac{1}{N_{frame}}\sum_{l'=1}^{N_{frame}}\sum_{f=F_{0\_min}}^{F_{0\_max}} S(l',f)}$$
where, if the lower frequency limit of the low-frequency band is f0_min Hz, the upper limit is f0_max Hz, and the sampling frequency is fs, the corresponding FFT bin limits are F0_min = (f0_min × 2 / fs) × N_fft and F0_max = (f0_max / fs) × N_fft.
An energy threshold is set; frames whose NLFER value exceeds the threshold are judged to be speech frames and the others noise frames. Consecutive speech frames form a byte, so that each byte in the audio under test is separated.
The window function in S2 is a Hamming window.
In S6, the amplitude spectrum similarity of two frames is computed only when the absolute difference of their zero-crossing rates is less than a given threshold T_zcr.
The frame duration m is chosen between 16 ms and 128 ms, and the frame shift n is 1/2 to 2/3 of the frame duration.
The amplitude spectrum similarity between two frames is measured with the Pearson correlation coefficient.
Beneficial effects of the present invention
(1) Existing algorithms do not distinguish speech segments from noise segments when detecting copy-paste regions. In practical applications, only speech bytes usually carry actual semantic information, so the present invention first extracts the effective bytes in the audio and then performs similarity matching on those bytes. This greatly reduces the computation time and also improves the detection accuracy.
(2) Because computing correlation coefficients is expensive, the present invention first uses the zero-crossing rate to make a preliminary judgment of the similarity of two frames and computes the amplitude spectrum correlation coefficient only when the zero-crossing rates are close, which further reduces the computation time.
Description of the drawings
Fig. 1 is the work flow diagram of the present invention;
Fig. 2 is the waveform of the original audio in the embodiment of the present invention;
Fig. 3 is the waveform of the copy-paste tampered audio in the embodiment of the present invention;
Fig. 4 is a schematic diagram of the per-frame zero-crossing rate of the tampered audio in the embodiment of the present invention;
Fig. 5 shows the byte segmentation result in the embodiment of the present invention;
Fig. 6 shows the tampering detection result in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to an embodiment and the drawings, but embodiments of the present invention are not limited thereto.
Embodiment
Fig. 1 shows the flow of the present invention, which comprises eight steps: audio pre-emphasis, framing and windowing, computing the zero-crossing rate of each frame, separating bytes, rejecting short bytes, computing the amplitude spectrum similarity of the frames between two bytes, determining copy-paste relationships between bytes, and locating the tampering.
This embodiment analyzes a section of audio in WAV format to illustrate the detection process of the invention. Fig. 2 shows the waveform of the original audio, in which a human speaker says "one two three four, three four". Fig. 3 shows the waveform of the tampered audio, whose content is "one two three four, one two": the 5th and 6th bytes were copied and pasted from the 1st and 2nd bytes, so the 1st and 5th bytes, and the 2nd and 6th bytes, each have a copy relationship. Both audio sections have a sampling rate of 8 kHz. In this embodiment, the copy-paste positions in the tampered audio are detected and located by the method of the invention.
The method comprises the following steps:
S1: Pre-emphasize the audio under test with a first-order high-pass digital filter whose transfer function is:
$$H(z) = 1 - u z^{-1}$$
The purpose of pre-emphasis is to boost the high-frequency part and facilitate spectral analysis. It removes the effects of the vocal cords and lips during phonation, compensates for the high-frequency components of the speech signal that are suppressed by the articulatory system, and emphasizes the high-frequency formants. The pre-emphasis coefficient u is 0.97 in this embodiment.
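In the time domain, the filter H(z) = 1 - u z^-1 corresponds to the difference equation y[k] = x[k] - u·x[k-1]. The following is a minimal sketch of that step; the function name `pre_emphasis` and the choice to pass the first sample through unchanged are assumptions made for illustration, not details given in the patent.

```python
def pre_emphasis(signal, u=0.97):
    """Apply y[k] = x[k] - u * x[k-1], the time-domain form of H(z) = 1 - u*z^-1."""
    if not signal:
        return []
    # Assumption: the first sample has no predecessor and is passed through unchanged.
    out = [float(signal[0])]
    for k in range(1, len(signal)):
        out.append(signal[k] - u * signal[k - 1])
    return out


emphasized = pre_emphasis([1.0, 1.0, 1.0, 2.0])
```

A constant (low-frequency) input is attenuated to about 0.03 of its value, while the jump at the last sample is largely preserved, which is the high-frequency boost the text describes.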
S2: Frame and window the pre-emphasized audio, with frame duration m and frame shift n; a Hamming window can be used as the window function. The time-domain audio signal after framing and windowing is denoted y_l, where the frame index l = 1, 2, ..., N_frame and N_frame is the number of audio frames.
The total number of frames N_frame of the pre-emphasized audio is given by:
$$N_{frame} = \left\lfloor \frac{t_s - m}{n} \right\rfloor + 1 \tag{3}$$
where ⌊·⌋ denotes rounding down to an integer, t_s is the duration of the audio under test, m is the frame duration (t_s > m > 0), and n is the frame shift duration (m > n > 0). The frame duration m is generally chosen between 16 ms and 128 ms. The frame shift n determines how much adjacent frames overlap and is generally 1/2 to 2/3 of the frame duration, so that the transition between frames is smooth. Trailing data shorter than one frame is discarded. In this embodiment, the tampered audio lasts 5984 ms, the frame duration is 128 ms, and the frame shift is 1/2 of the frame duration, so each frame contains 128 ms × 8 kHz = 1024 data points, and formula (3) gives 92 frames in total. Each frame is windowed with a Hamming window.
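The framing arithmetic of this step can be checked with a short sketch. It assumes the frame count is floor((t_s - m) / n) + 1 and uses the common Hamming definition 0.54 - 0.46·cos(2πk/(K-1)); the function names are illustrative and not taken from the patent.

```python
import math


def frame_count(ts_ms, m_ms, n_ms):
    """Number of full frames: floor((ts - m) / n) + 1."""
    return (ts_ms - m_ms) // n_ms + 1


def hamming(K):
    """Hamming window of K points: 0.54 - 0.46*cos(2*pi*k/(K-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (K - 1)) for k in range(K)]


def frame_and_window(signal, frame_len, hop):
    """Split `signal` into overlapping frames and apply a Hamming window.
    Trailing samples that do not fill a whole frame are discarded."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + k] * window[k] for k in range(frame_len)])
    return frames


fs = 8000                      # sampling rate from the embodiment (Hz)
frame_len = 128 * fs // 1000   # 128 ms -> 1024 samples
hop = frame_len // 2           # frame shift of 1/2 frame -> 512 samples
n_frames = frame_count(5984, 128, 64)
```

With the embodiment's values (5984 ms of audio, 128 ms frames, 64 ms shift), both the closed-form count and the actual framing loop give 92 frames.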
S3: Compute the zero-crossing rate zcr(l) of each frame after framing and windowing, in its standard short-time form:
$$\mathrm{zcr}(l) = \frac{1}{2}\sum_{k=2}^{K}\bigl|\operatorname{sgn}[y_l(k)] - \operatorname{sgn}[y_l(k-1)]\bigr|$$
where y_l(k) is the k-th data point of frame l, K is the number of data points per frame, and sgn[·] is the sign function given by formula (5):
$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & \text{otherwise} \end{cases} \tag{5}$$
Fig. 4 shows the per-frame zero-crossing rate of the tampered audio. The zero-crossing rates of the frames of the 1st and 5th bytes, and of the 2nd and 6th bytes, which have a copy relationship, are close to each other.
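A minimal sketch of a per-frame zero-crossing count follows. It assumes the standard short-time definition (half the summed absolute difference of consecutive signs) and an unnormalized count, which is consistent with a threshold such as T_zcr = 60 on 1024-sample frames; the exact form used in the patent is not reproduced in this text.

```python
def sgn(x):
    """Sign function used in the text: 1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Count sign changes between consecutive samples of one frame."""
    return sum(abs(sgn(frame[k]) - sgn(frame[k - 1])) for k in range(1, len(frame))) // 2


zcr_example = zero_crossing_rate([1.0, -1.0, -2.0, 3.0, 4.0, -5.0])
```

The example frame changes sign three times, so the count is 3.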
S4: Separate the bytes in the audio under test according to the low-frequency spectral energy. An N_fft-point Fourier transform is applied to each frame y_l of the audio under test to obtain the corresponding amplitude spectrum S(l, f), where f is the frequency bin index. The average low-frequency energy over all frames of the audio is computed, and for each audio frame y_l the ratio of its low-frequency energy to that average, the NLFER (Normalized Low Frequency Energy Ratio), is calculated. With the low-frequency energy of a frame taken as the sum of S(l, f) over the low-frequency bins:
$$\mathrm{NLFER}(l) = \frac{\sum_{f=F_{0\_min}}^{F_{0\_max}} S(l,f)}{\frac{1}{N_{frame}}\sum_{l'=1}^{N_{frame}}\sum_{f=F_{0\_min}}^{F_{0\_max}} S(l',f)}$$
where, if the lower frequency limit of the low-frequency band is f0_min Hz, the upper limit is f0_max Hz, and the sampling frequency is fs, the corresponding FFT bin limits in the NLFER formula are F0_min = (f0_min × 2 / fs) × N_fft and F0_max = (f0_max / fs) × N_fft. Because silent segments are dominated by high-frequency noise, a suitable threshold can be set on the NLFER value: if the NLFER value of a frame is above the threshold, the frame is judged to be a voiced (speech) frame; otherwise it is a silent (noise) frame. Consecutive speech frames form a byte, so that each byte in the audio under test is separated.
In this embodiment, the total number of frames N_frame is 92, the lower limit of the low-frequency band f0_min is 60 Hz, the upper limit f0_max is 400 Hz, and the Fourier transform length N_fft is 8192, so the lower FFT bin limit F0_min = (f0_min × 2 / fs) × N_fft is approximately 123 and the upper FFT bin limit F0_max = (f0_max / fs) × N_fft is approximately 410.
The energy threshold in this embodiment is 0.75.
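The bin limits and the speech/noise decision can be sketched as follows. This is an illustration under stated assumptions: the low-band energy of a frame is taken as the plain sum of the amplitude spectrum over the band, bin limits are rounded to the nearest integer, and the amplitude spectra are supplied as precomputed lists (the FFT itself is omitted). The function names and the toy spectra are hypothetical.

```python
def low_band_bins(f0_min, f0_max, fs, n_fft):
    """FFT bin limits of the low-frequency band, rounded to the nearest bin."""
    lo = round(f0_min * 2 / fs * n_fft)
    hi = round(f0_max / fs * n_fft)
    return lo, hi


def nlfer(spectra, lo, hi):
    """Per-frame low-band energy divided by its mean over all frames.
    `spectra[l][f]` is the amplitude spectrum S(l, f)."""
    band = [sum(frame[lo:hi + 1]) for frame in spectra]
    mean_band = sum(band) / len(band)
    return [e / mean_band for e in band]


def speech_mask(nlfer_values, threshold=0.75):
    """True for frames judged to be speech (NLFER above the threshold)."""
    return [v > threshold for v in nlfer_values]


lo, hi = low_band_bins(60, 400, 8000, 8192)
# Toy spectra (hypothetical values): 4 frames, 5 bins each; the band here is bins 1..3.
toy = [[0, 4, 4, 4, 0], [0, 0.1, 0.1, 0.1, 9], [0, 2, 2, 2, 0], [0, 0, 0, 0, 9]]
mask = speech_mask(nlfer(toy, 1, 3))
```

With the embodiment's parameters the limits come out as bins 123 and 410, matching the values quoted in the text. In the toy data, the two frames whose energy sits in the high bin are classified as noise.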
S5: Reject bytes that are too short.
Environmental noise can produce bytes that are too short in the audio. A minimum byte duration threshold t_m is set, and bytes whose duration is less than t_m are rejected. In this embodiment, t_m is the duration of one frame, i.e. 128 ms, and 8 effective bytes are obtained; the byte set is denoted X = {x_1, x_2, x_3, ..., x_8}. Fig. 5 shows the final byte segmentation result for the audio under test in this embodiment; the parts of the figure with amplitude value 1 indicate effective bytes.
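Grouping consecutive speech frames into bytes and discarding short runs can be sketched as below. The parameter `min_frames` is a hypothetical stand-in for the duration threshold t_m expressed as a number of frames; how the patent converts a run of frames into a duration is not specified in this text.

```python
def split_bytes(mask, min_frames=2):
    """Return (start, end) frame numbers (1-based, inclusive) of each run of
    consecutive speech frames that is at least `min_frames` long."""
    segments = []
    start = None
    for idx, is_speech in enumerate(mask, start=1):
        if is_speech and start is None:
            start = idx                        # a new run begins
        elif not is_speech and start is not None:
            segments.append((start, idx - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                      # run reaches the end of the audio
        segments.append((start, len(mask)))
    return [(a, b) for a, b in segments if b - a + 1 >= min_frames]


mask = [False, True, True, True, False, True, False, False, True, True]
segments = split_bytes(mask)
```

In the example, the isolated speech frame at position 6 is rejected, leaving the runs at frames 2 to 4 and 9 to 10.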
S6: After rejecting short bytes, compute the amplitude spectrum similarity between the frames of two bytes of the audio signal under test.
The amplitude spectrum similarity of two frames is measured with the Pearson correlation coefficient:
$$\rho(l,k) = \frac{(S_l - \bar{S}_l)\cdot(S_k - \bar{S}_k)}{\lVert S_l - \bar{S}_l\rVert\,\lVert S_k - \bar{S}_k\rVert}$$
where S_l and S_k are the amplitude spectra of frames y_l and y_k from bytes x_i and x_j respectively, · denotes the inner product, and the overbar denotes the mean of the vector.
Two bytes x_i and x_j are chosen from X, and the amplitude spectrum similarity between each frame of x_i and each frame of x_j is computed one by one. Byte x_i consists of the frame set I = {y_l | l = α_i ... β_i} and byte x_j of the frame set J = {y_k | k = α_j ... β_j}. To reduce the amount of computation, the zero-crossing rates of the two frames are checked first, and the amplitude spectrum similarity is computed only when the absolute difference of the zero-crossing rates is less than a given threshold T_zcr.
In this embodiment, the start and end frame numbers of the 8 bytes are shown in Table 1.
Table 1. Start frame number α_i and end frame number β_i of the 8 bytes
Byte | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
α_i | 5 | 18 | 30 | 42 | 50 | 63 | 73 | 83 |
β_i | 8 | 22 | 34 | 44 | 53 | 67 | 76 | 86 |
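The frame-by-frame comparison with zero-crossing-rate pre-screening can be sketched as follows. The names `pearson` and `byte_similarity`, the dictionary return type, and the toy spectra are illustrative assumptions; returning 0.0 for a constant vector is likewise a choice made here, not something the patent specifies.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    # Assumption: a constant vector has no defined correlation; report 0.0.
    return num / den if den else 0.0


def byte_similarity(spectra_i, spectra_j, zcr_i, zcr_j, t_zcr=60):
    """Correlation of every frame pair whose zero-crossing rates are close.
    Returns {(l, k): rho} with l, k as positions inside each byte."""
    result = {}
    for l, (s_l, z_l) in enumerate(zip(spectra_i, zcr_i)):
        for k, (s_k, z_k) in enumerate(zip(spectra_j, zcr_j)):
            if abs(z_l - z_k) < t_zcr:       # cheap pre-screen first
                result[(l, k)] = pearson(s_l, s_k)
    return result


# Hypothetical toy data: two bytes of two frames each.
rho = byte_similarity([[1, 2, 3], [3, 1, 2]], [[2, 4, 6], [1, 1, 5]], [100, 300], [110, 120])
```

Only the first frame of the first byte passes the pre-screen here, so two of the four possible correlations are computed; the pair of proportional spectra correlates at 1.0.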
In this embodiment, the zero-crossing-rate threshold T_zcr used for this pre-screening is 60. As shown in Table 2, pre-screening with the short-time zero-crossing rate markedly reduces the number of amplitude spectrum correlation coefficient computations and thereby the run time of the byte comparison stage of the detection algorithm.
Table 2. Computation cost with and without zero-crossing-rate pre-screening
Method | Correlation coefficient computations | Run time of comparison stage (s) |
---|---|---|
With zero-crossing-rate pre-screening | 247 | 0.045 |
Without zero-crossing-rate pre-screening | 504 | 0.085 |
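The count of 504 computations without pre-screening follows directly from the byte boundaries in Table 1: every frame of one byte is compared with every frame of the other, for each unordered pair of bytes. A short check, with variable names chosen here for illustration:

```python
from itertools import combinations

# Start and end frame numbers of the 8 bytes, copied from Table 1.
alpha = [5, 18, 30, 42, 50, 63, 73, 83]
beta = [8, 22, 34, 44, 53, 67, 76, 86]

frames_per_byte = [b - a + 1 for a, b in zip(alpha, beta)]
# Every frame of byte i is compared with every frame of byte j, for each unordered pair.
total_pairs = sum(p * q for p, q in combinations(frames_per_byte, 2))
```

The bytes contain 4, 5, 5, 3, 4, 5, 4, and 4 frames, and the sum of products over all 28 byte pairs is 504.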
Table 3 gives the amplitude spectrum correlation coefficients between the frames of the 1st and 2nd bytes in this embodiment, and Table 4 gives those between the frames of the 1st and 5th bytes.
Table 3. Amplitude spectrum correlation coefficients between the frames of the 1st and 2nd bytes
ρ(l,k) | l=5 | l=6 | l=7 | l=8 |
---|---|---|---|---|
k=18 | -0.1714 | -0.0982 | -0.1675 | -0.2620 |
k=19 | -0.0258 | -0.0604 | -0.0635 | 0.0603 |
k=20 | 0.3999 | 0.1888 | 0.1817 | 0.1821 |
k=21 | 0.6535 | 0.1008 | 0.0198 | 0.2024 |
k=22 | 0.3120 | 0.0654 | -0.0458 | 0.0818 |
Table 4. Amplitude spectrum correlation coefficients between the frames of the 1st and 5th bytes
ρ(l,k) | l=5 | l=6 | l=7 | l=8 |
---|---|---|---|---|
k=50 | 0.9090 | 0.3784 | 0.0654 | 0.2240 |
k=51 | 0.0979 | 0.9654 | 0.5834 | 0.0851 |
k=52 | -0.0275 | 0.3679 | 0.9603 | 0.5527 |
k=53 | 0.3039 | 0.1110 | 0.2994 | 0.9417 |
Comparing Table 3 and Table 4 shows that the inter-frame correlation coefficients between two bytes without a copy relationship are small, whereas those between two bytes with a copy relationship are large; in particular, the correlation coefficients on the diagonal of the table are close to 1.
S7: Set a similarity threshold Th; if two or more pairs of frames from the two bytes have an amplitude spectrum similarity greater than the threshold, determine that bytes x_i and x_j have a copy-paste relationship.
Specifically, if two or more pairs of frames have an amplitude spectrum correlation coefficient, computed in S6, greater than the threshold Th, the bytes x_i and x_j to which the frames belong are judged to have a copy-paste relationship. The threshold Th is 0.94 in this embodiment. Table 3 shows that all inter-frame amplitude spectrum correlation coefficients between the 1st and 2nd bytes are below the threshold, so these two bytes are judged to have no copy-paste relationship. Table 4 shows that, when the 1st and 5th bytes are compared, 3 pairs of audio frames have an amplitude spectrum correlation coefficient greater than Th, so these two bytes are judged to have a copy-paste relationship.
S8: Repeat S6 and S7 for all i ≠ j ∈ {1, 2, ..., M} to obtain all byte pairs that have a copy-paste relationship, from which the copy-paste regions in the audio under test can be located.
There are 8 bytes in this embodiment, so 28 pairwise comparisons are required. The result is that the 1st and 5th bytes, and the 2nd and 6th bytes, each have a copy-paste relationship, which locates the copy-paste regions in the audio under test. Fig. 6 gives the detection result of this embodiment; the result is consistent with the actual situation, which demonstrates the effectiveness of the invention.
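The outer loop over byte pairs can be sketched as below. The predicate passed in stands for the S6/S7 comparison of two bytes; in this illustration it is replaced by a hypothetical lookup that hard-codes the pairs reported in the embodiment, so the sketch shows only the pair enumeration, not the detection itself.

```python
from itertools import combinations


def find_copy_paste_pairs(num_bytes, is_copy_paste):
    """Test every unordered byte pair (i, j) with the given predicate."""
    matches = []
    comparisons = 0
    for i, j in combinations(range(1, num_bytes + 1), 2):
        comparisons += 1
        if is_copy_paste(i, j):
            matches.append((i, j))
    return matches, comparisons


# Stand-in predicate (hypothetical): hard-codes the pairs reported in the embodiment
# in place of the S6/S7 frame comparison.
reported = {(1, 5), (2, 6)}
matches, comparisons = find_copy_paste_pairs(8, lambda i, j: (i, j) in reported)
```

For 8 bytes the loop performs 8 × 7 / 2 = 28 comparisons, which agrees with the number stated in the text.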
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the scope of protection of the present invention.
Claims (8)
1. An audio homologous tampering detection method using the inter-frame amplitude spectrum correlation between bytes, characterized by comprising the following steps:
S1: pre-emphasize the audio signal under test;
S2: window and frame the pre-emphasized audio, with frame duration m and frame shift n; the time-domain audio signal after framing and windowing is denoted y_l, where the frame index l = 1, 2, ..., N_frame and N_frame is the number of audio frames;
S3: compute the zero-crossing rate zcr(l) of each frame after windowing and framing;
S4: separate the bytes in the audio under test according to the low-frequency spectral energy;
S5: reject short bytes: set a minimum byte duration threshold t_m and reject every byte whose duration is less than t_m, giving the effective byte set X = {x_1, x_2, x_3, ..., x_M}, where x_i is the i-th byte and M is the number of effective bytes;
S6: after rejecting short bytes, compute the amplitude spectrum similarity between the frames of two bytes of the audio signal under test;
S7: set a similarity threshold Th; if two or more pairs of frames from the two bytes have an amplitude spectrum similarity greater than the threshold, determine that bytes x_i and x_j have a copy-paste relationship;
S8: repeat S6 and S7 for all byte pairs i ≠ j ∈ {1, 2, ..., M} to obtain all byte pairs that have a copy-paste relationship, from which the copy-paste regions in the audio under test can be located.
2. The audio homologous tampering detection method according to claim 1, characterized in that the zero-crossing rate is calculated, in its standard short-time form, as:
$$\mathrm{zcr}(l) = \frac{1}{2}\sum_{k=2}^{K}\bigl|\operatorname{sgn}[y_l(k)] - \operatorname{sgn}[y_l(k-1)]\bigr|$$
where y_l(k) is the k-th data point of frame l, K is the number of data points per frame, and sgn[·] is the sign function:
$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & \text{otherwise} \end{cases}$$
3. The audio homologous tampering detection method according to claim 1, characterized in that, in S4, the bytes in the audio under test are separated according to the low-frequency spectral energy as follows: an N_fft-point Fourier transform is applied to each frame y_l of the audio signal under test to obtain the corresponding amplitude spectrum S(l, f), where f is the frequency bin index;
the average low-frequency energy over all frames of the audio signal under test is then computed, and for each frame y_l the ratio NLFER of its low-frequency energy to that average is calculated.
4. The audio homologous tampering detection method according to claim 3, characterized in that, with the low-frequency energy of a frame taken as the sum of S(l, f) over the low-frequency bins, the NLFER is:
$$\mathrm{NLFER}(l) = \frac{\sum_{f=F_{0\_min}}^{F_{0\_max}} S(l,f)}{\frac{1}{N_{frame}}\sum_{l'=1}^{N_{frame}}\sum_{f=F_{0\_min}}^{F_{0\_max}} S(l',f)}$$
where, if the lower frequency limit of the low-frequency band is f0_min Hz, the upper limit is f0_max Hz, and the sampling frequency is fs, the corresponding FFT bin limits are F0_min = (f0_min × 2 / fs) × N_fft and F0_max = (f0_max / fs) × N_fft;
an energy threshold is set, frames whose NLFER value exceeds the threshold are judged to be speech frames and the others noise frames, and consecutive speech frames form a byte, so that each byte in the audio under test is separated.
5. The audio homologous tampering detection method according to claim 1, characterized in that the window function in S2 is a Hamming window.
6. The audio homologous tampering detection method according to claim 1, characterized in that, in S6, the amplitude spectrum similarity of two frames is computed only when the absolute difference of their zero-crossing rates is less than a given threshold T_zcr.
7. The audio homologous tampering detection method according to claim 1, characterized in that the frame duration m is chosen between 16 ms and 128 ms, and the frame shift n is 1/2 to 2/3 of the frame duration.
8. The audio homologous tampering detection method according to claim 1, characterized in that the amplitude spectrum similarity between two frames is measured with the Pearson correlation coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810072583.7A CN108510994B (en) | 2018-01-25 | 2018-01-25 | Audio homologous tampering detection method utilizing byte interframe amplitude spectral correlation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108510994A (en) | 2018-09-07 |
CN108510994B (en) | 2020-09-22 |
Family
ID=63374843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810072583.7A Expired - Fee Related CN108510994B (en) | 2018-01-25 | 2018-01-25 | Audio homologous tampering detection method utilizing byte interframe amplitude spectral correlation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108510994B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111863023A (en) * | 2020-09-22 | 2020-10-30 | 深圳市声扬科技有限公司 | Voice detection method and device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120294452A1 (en) * | 2011-05-20 | 2012-11-22 | Andrew John Macdonald | Method and apparatus for reducing noise pumping due to noise suppression and echo control interaction |
CN103854646A (en) * | 2014-03-27 | 2014-06-11 | 成都康赛信息技术有限公司 | Method for classifying digital audio automatically |
CN104616663A (en) * | 2014-11-25 | 2015-05-13 | 重庆邮电大学 | Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation) |
CN106878704A (en) * | 2017-02-14 | 2017-06-20 | 福建师范大学 | Turn altering detecting method on video frame rate based on light stream cyclophysis |
CN106941008A (en) * | 2017-04-05 | 2017-07-11 | 华南理工大学 | It is a kind of that blind checking method is distorted based on Jing Yin section of heterologous audio splicing |
Non-Patent Citations (1)
Title |
---|
郭伟、于凤芹: "基于改进时频比的语音音乐信号分离", 《计算机工程》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111863023A (en) * | 2020-09-22 | 2020-10-30 | 深圳市声扬科技有限公司 | Voice detection method and device, computer equipment and storage medium |
CN111863023B (en) * | 2020-09-22 | 2021-01-08 | 深圳市声扬科技有限公司 | Voice detection method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108510994B (en) | 2020-09-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200922 |