CN108510994A - Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables - Google Patents

Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables

Info

Publication number
CN108510994A
CN108510994A (application number CN201810072583.7A; granted publication CN108510994B)
Authority
CN
China
Prior art keywords
frame
syllable
audio
homologous
amplitude spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810072583.7A
Other languages
Chinese (zh)
Other versions
CN108510994B (en)
Inventor
胡永健
余颖娟
刘琲贝
贺前华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201810072583.7A priority Critical patent/CN108510994B/en
Publication of CN108510994A publication Critical patent/CN108510994A/en
Application granted granted Critical
Publication of CN108510994B publication Critical patent/CN108510994B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Abstract

The invention discloses an audio homologous tampering detection method that uses the inter-frame amplitude spectrum correlation between syllables. The method comprises: pre-emphasizing the audio, framing and windowing it, computing the zero-crossing rate of each frame, separating syllables, rejecting syllables that are too short, computing the amplitude spectrum similarity between the frames of every pair of syllables, and judging which syllables are in a copy-paste relationship so as to locate the tampered regions. The method achieves high detection accuracy, good localization accuracy, and low computational complexity.

Description

Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables
Technical field
The present invention relates to the technical field of audio forensics, and in particular to an audio homologous tampering detection method that uses the inter-frame amplitude spectrum correlation between syllables.
Background technology
With the widespread use and increasing maturity of multimedia technology, information has become easier to obtain, which raises the question of how to verify that multimedia information is complete and reliable. Effective tampering detection for multimedia data has become an important research topic in the field of information security. Compared with images and video, tampering detection for digital audio has received less attention. Among audio forgeries, homologous copy-paste tampering is the easiest to carry out and also the most common: the forger copies a segment of an audio recording and pastes it at another position in the same recording, thereby changing the true semantics of the audio. If a criminal uses such distorted audio as court evidence or as departmental confidential information, serious consequences may follow. Because homologous copy-paste tampering operates only within a single recording, it is highly concealed and easy to perform. Studying detection methods for homologous copy-paste tampering is therefore of great significance for guaranteeing the originality, authenticity, and integrity of digital media information.
Summary of the invention
In order to overcome the shortcomings and deficiencies of the prior art, the present invention provides an audio homologous tampering detection method that uses the inter-frame amplitude spectrum correlation between syllables.
The present invention adopts the following technical scheme.
An audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables comprises the following steps:
S1: pre-emphasize the audio signal to be examined;
S2: apply windowed framing to the pre-emphasized audio, with frame length m and frame shift n; the time-domain audio signal of frame l after framing and windowing is denoted y_l, where l = 1, 2, ..., N_frame and N_frame is the number of audio frames;
S3: compute the zero-crossing rate zcr(l) of each windowed audio frame;
S4: separate the syllables in the audio under test according to the low-frequency spectral energy;
S5: reject syllables that are too short; specifically, set a minimum syllable duration threshold t_m and discard every syllable shorter than t_m, obtaining the set of valid syllables X = {x_1, x_2, x_3, ..., x_M}, where x_i is the i-th syllable and M is the number of valid syllables;
S6: for every pair of syllables in the audio under test after rejecting short syllables, compute the amplitude spectrum similarity between each frame of one syllable and each frame of the other;
S7: set a similarity threshold Th; if two or more frame pairs between two syllables have an amplitude spectrum similarity exceeding the threshold, judge that syllables x_i and x_j are in a copy-paste relationship;
S8: repeat steps S6 and S7 for all syllable pairs i ≠ j ∈ {1, 2, ..., M} to obtain all syllable pairs in a copy-paste relationship, thereby locating the copy-paste regions in the audio under test.
The zero-crossing rate is computed as

zcr(l) = (1/2) Σ_{k=1}^{K-1} | sgn[y_l(k+1)] - sgn[y_l(k)] |,

where y_l(k) is the k-th data point of frame l, K is the number of data points per frame, and sgn[·] is the sign function:

sgn[x] = 1 if x ≥ 0, and sgn[x] = -1 if x < 0.
In S4, the syllables in the audio under test are separated according to the low-frequency spectral energy. Specifically, an N_fft-point Fourier transform is applied to each frame y_l of the audio signal under test to obtain the corresponding amplitude spectrum S(l, f), where f is the frequency-bin index.
The average low-frequency energy over all frames of the audio signal under test is then computed, and for each frame y_l the ratio NLFER of its low-frequency energy to this average is calculated.
The NLFER of frame l is

NLFER(l) = E_low(l) / ( (1/N_frame) Σ_{l'=1}^{N_frame} E_low(l') ),  where  E_low(l) = Σ_{f=F0_min}^{F0_max} S(l, f) is the low-frequency energy of frame l.

Here the lower and upper limits of the low-frequency band are f0_min Hz and f0_max Hz, the sampling frequency is fs, and the corresponding lower and upper FFT bin limits are F0_min = (f0_min×2/fs)×N_fft and F0_max = (f0_max/fs)×N_fft.
An energy threshold is set; frames whose NLFER value exceeds the threshold are classified as speech frames, otherwise as noise frames. Consecutive speech frames form a syllable, so the syllables in the audio under test are separated.
In S2, a Hamming window is selected as the window function.
In S6, the amplitude spectrum similarity of two frames is computed only when the absolute difference of their zero-crossing rates is smaller than a preset threshold T_zcr.
The frame length m is chosen between 16 ms and 128 ms, and the frame shift n is 1/2 to 2/3 of the frame length.
The amplitude spectrum similarity between two frames is measured by the Pearson correlation coefficient.
Beneficial effects of the present invention
(1) Existing algorithms do not distinguish between speech segments and noise segments when detecting copy-paste regions. Considering that in practical application scenarios only speech syllables carry actual semantic information, the present invention first extracts the valid syllables in the audio and then performs similarity matching only on these syllables; this greatly reduces the computation time and also improves the detection accuracy.
(2) Because computing correlation coefficients is relatively expensive, the present invention first makes a preliminary judgment on the similarity of two frames using their zero-crossing rates, and computes the amplitude spectrum correlation coefficient only when the zero-crossing rates are close; this further reduces the computation time.
Description of the drawings
Fig. 1 is the work flow chart of the present invention;
Fig. 2 shows the waveform of the original audio in the embodiment of the present invention;
Fig. 3 shows the waveform of the copy-paste tampered audio in the embodiment of the present invention;
Fig. 4 shows the zero-crossing rate of each frame of the tampered audio in the embodiment of the present invention;
Fig. 5 shows the syllable segmentation result in the embodiment of the present invention;
Fig. 6 shows the tampering detection result in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
Fig. 1 shows the flow chart of the present invention, which comprises eight steps: audio pre-emphasis, framing and windowing, computing the zero-crossing rate of each frame, separating syllables, rejecting short syllables, computing the amplitude spectrum similarity of each frame pair between two syllables, judging the copy-paste relationship between syllables, and locating the tampered regions.
In this embodiment, a WAV-format audio clip is used as the object of analysis to illustrate the detection procedure of the present invention. Fig. 2 shows the waveform of the original audio, whose spoken content is "one two three four, three four". Fig. 3 shows the waveform of the tampered audio, whose content is "one two three four, one two": the 5th and 6th syllables were copied and pasted from the 1st and 2nd syllables, i.e., the 1st and 5th syllables, and the 2nd and 6th syllables, are each in a copy relationship. The sampling rate of both clips is 8 kHz. In this embodiment, the copy-paste regions in the tampered audio are detected and located by the method of the present invention.
The method comprises the following steps.
S1: Pre-emphasize the audio under test. Pre-emphasis is realized with a first-order high-pass digital filter whose transfer function is

H(z) = 1 - u·z^(-1).

The purpose of pre-emphasis is to boost the high-frequency part of the signal, which facilitates spectrum analysis, compensates for the suppression of high frequencies by the articulatory system (the effect of the vocal cords and lips during voicing), and emphasizes the high-frequency formants. In this embodiment the pre-emphasis coefficient u is 0.97.
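For illustration, a minimal Python/NumPy sketch of this pre-emphasis step (the function name pre_emphasis and the use of scipy.signal.lfilter are illustrative choices of this sketch, not prescribed by the patent):

    import numpy as np
    from scipy.signal import lfilter

    def pre_emphasis(signal: np.ndarray, u: float = 0.97) -> np.ndarray:
        # First-order high-pass pre-emphasis filter H(z) = 1 - u * z^(-1)
        return lfilter([1.0, -u], [1.0], signal)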
S2: Apply framing and windowing to the pre-emphasized audio, with frame length m and frame shift n; a Hamming window may be chosen as the window function. The time-domain signal of frame l after framing and windowing is denoted y_l, where l = 1, 2, ..., N_frame and N_frame is the number of audio frames.

The total number of frames of the pre-emphasized audio is

N_frame = ⌊(t_s - m)/n⌋ + 1,

where ⌊·⌋ denotes rounding down, t_s is the duration of the audio under test, m is the frame length (t_s > m > 0), and n is the frame shift (m > n > 0). The frame length m is usually chosen between 16 ms and 128 ms; the frame shift n determines the overlap between adjacent frames and is usually 1/2 to 2/3 of the frame length, so that the transition between frames is smooth. The trailing samples that do not fill a complete frame are discarded. In this embodiment the tampered audio is 5984 ms long, the frame length is 128 ms, and the frame shift is 1/2 of the frame length; each frame therefore contains 128 ms × 8 kHz = 1024 data points, and by the formula above the audio contains 92 frames. Each frame is windowed with a Hamming window.
S3: Compute the zero-crossing rate zcr(l) of each framed and windowed audio frame:

zcr(l) = (1/2) Σ_{k=1}^{K-1} | sgn[y_l(k+1)] - sgn[y_l(k)] |,

where y_l(k) is the k-th data point of frame l, K is the number of data points per frame, and sgn[·] is the sign function:

sgn[x] = 1 if x ≥ 0, and sgn[x] = -1 if x < 0.
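A minimal NumPy sketch of this per-frame zero-crossing count, operating on the matrix of windowed frames produced above (illustrative only):

    import numpy as np

    def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
        # zcr(l) = 1/2 * sum_k | sgn(y_l(k+1)) - sgn(y_l(k)) |
        sgn = np.where(frames >= 0, 1, -1)       # sgn[x] = 1 if x >= 0, else -1
        return 0.5 * np.abs(np.diff(sgn, axis=1)).sum(axis=1)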
Fig. 4 shows the zero-crossing rate of each frame of the tampered audio. It can be seen that the per-frame zero-crossing rates of the 1st and 5th syllables, and of the 2nd and 6th syllables, which are in a copy relationship, are close to each other.
S4: Separate the syllables in the audio under test according to the low-frequency spectral energy. An N_fft-point Fourier transform is applied to each frame y_l of the audio to obtain the corresponding amplitude spectrum S(l, f), where f is the frequency-bin index. The average low-frequency energy over all frames is computed, and for each frame y_l the ratio NLFER (Normalized Low Frequency Energy Ratio) of its low-frequency energy to this average is calculated:

NLFER(l) = E_low(l) / ( (1/N_frame) Σ_{l'=1}^{N_frame} E_low(l') ),  where  E_low(l) = Σ_{f=F0_min}^{F0_max} S(l, f).

Here the lower and upper limits of the low-frequency band are f0_min Hz and f0_max Hz, the sampling frequency is fs, and the corresponding FFT bin limits are F0_min = (f0_min×2/fs)×N_fft and F0_max = (f0_max/fs)×N_fft. Since silent segments are dominated by high-frequency noise, a suitable threshold can be set on the NLFER value: a frame whose NLFER exceeds the threshold is judged to be a voiced frame, otherwise a silent frame, and consecutive voiced frames form a syllable, so the syllables in the audio under test are separated.

In this embodiment the total number of frames N_frame is 92, the lower limit of the low-frequency band f0_min is 60 Hz, the upper limit f0_max is 400 Hz, and the Fourier transform length N_fft is 8192; the FFT lower bin limit F0_min = (f0_min×2/fs)×N_fft ≈ 123 and the upper bin limit F0_max = (f0_max/fs)×N_fft ≈ 410.
An energy threshold is set; frames whose NLFER value exceeds the threshold are classified as speech frames, otherwise as noise frames, and consecutive speech frames form a syllable. In this embodiment the energy threshold is 0.75.
S5: Reject syllables that are too short.

Because of environmental noise, invalid syllables that are too short may appear in the audio. A minimum syllable duration threshold t_m is set, and every syllable shorter than t_m is discarded. In this embodiment t_m is the duration of one frame, i.e., 128 ms; 8 valid syllables are obtained, and the syllable set is denoted X = {x_1, x_2, x_3, ..., x_8}. Fig. 5 shows the final syllable segmentation result for the audio under test; the parts with amplitude value 1 in the figure indicate valid syllables.
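For illustration, the syllable separation of step S4 and the short-syllable rejection of step S5 could be sketched as follows. The helper name is illustrative; in particular, defining the low-frequency energy as the sum of amplitude-spectrum values over the low-frequency bins and mapping t_m to a minimum number of frames are assumptions of this sketch:

    import numpy as np

    def segment_syllables(frames, fs, nfft=8192, f0_min=60.0, f0_max=400.0,
                          energy_thr=0.75, min_len_frames=2):
        # Amplitude spectrum S(l, f) of every windowed frame (zero-padded N_fft-point FFT).
        spectrum = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
        lo = int(f0_min * 2 / fs * nfft)          # F0_min bin, as defined in the patent
        hi = int(f0_max / fs * nfft)              # F0_max bin
        e_low = spectrum[:, lo:hi + 1].sum(axis=1)   # low-frequency energy per frame
        nlfer = e_low / e_low.mean()                 # ratio to the all-frame average
        speech = nlfer > energy_thr                  # speech / noise decision per frame

        # Group consecutive speech frames into syllables; drop runs shorter than
        # min_len_frames (which plays the role of the minimum-duration threshold t_m).
        syllables, start = [], None
        for l, is_speech in enumerate(speech):
            if is_speech and start is None:
                start = l
            elif not is_speech and start is not None:
                if l - start >= min_len_frames:
                    syllables.append((start, l - 1))
                start = None
        if start is not None and len(speech) - start >= min_len_frames:
            syllables.append((start, len(speech) - 1))
        return syllables, spectrum, nlfer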
S6: For every pair of syllables in the audio under test after rejecting short syllables, compute the amplitude spectrum similarity between each frame of one syllable and each frame of the other.
The Pearson correlation coefficient between the amplitude spectra of two frames is

ρ(l, k) = ⟨S_l - mean(S_l), S_k - mean(S_k)⟩ / ( ‖S_l - mean(S_l)‖ · ‖S_k - mean(S_k)‖ ),

where S_l and S_k are the amplitude spectra of frame y_l in syllable x_i and frame y_k in syllable x_j, ⟨·,·⟩ denotes the inner product, ‖·‖ the Euclidean norm, and mean(·) the mean of the vector.

Two syllables x_i and x_j are chosen from X, where syllable x_i consists of the frame set I = {y_l | l = α_i ... β_i} and syllable x_j consists of the frame set J = {y_k | k = α_j ... β_j}; the amplitude spectrum similarity is computed one by one between every frame in I and every frame in J. In this embodiment, the start and end frame numbers of the 8 syllables are listed in Table 1.

Table 1. Start frame number α_i and end frame number β_i of the 8 syllables

Syllable   1    2    3    4    5    6    7    8
α_i        5   18   30   42   50   63   73   83
β_i        8   22   34   44   53   67   76   86
To reduce the amount of computation, the zero-crossing rates of two frames are first checked for closeness, and the amplitude spectrum similarity is computed only when the absolute difference of their zero-crossing rates is smaller than the preset threshold T_zcr. In this embodiment T_zcr is set to 60. As shown in Table 2, using the short-time zero-crossing rate as a pre-check significantly reduces the number of amplitude spectrum correlation computations and, correspondingly, the running time of the two-syllable comparison part of the detection algorithm; an illustrative implementation sketch is given after Table 2.
Table 2. Amount of computation with and without the zero-crossing-rate pre-check

                                          Correlation coefficient computations   Comparison-part running time (s)
With zero-crossing-rate pre-check         247                                     0.045
Without zero-crossing-rate pre-check      504                                     0.085
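For illustration, the zero-crossing-rate pre-check and the Pearson correlation of amplitude spectra of step S6 could be sketched as follows, reusing the spectrum and zero-crossing-rate arrays computed above (the function name and the choice to mark skipped frame pairs with a correlation of -1 are illustrative choices of this sketch):

    import numpy as np

    def frame_spectrum_correlation(spectrum: np.ndarray, zcr: np.ndarray,
                                   syll_a: tuple, syll_b: tuple,
                                   t_zcr: float = 60.0) -> np.ndarray:
        # Pearson correlation of amplitude spectra between every frame of syllable A
        # and every frame of syllable B, pre-screened by the zero-crossing rate.
        a0, a1 = syll_a
        b0, b1 = syll_b
        rho = np.full((a1 - a0 + 1, b1 - b0 + 1), -1.0)   # skipped pairs keep the value -1
        for i, l in enumerate(range(a0, a1 + 1)):
            for j, k in enumerate(range(b0, b1 + 1)):
                if abs(zcr[l] - zcr[k]) >= t_zcr:         # zero-crossing rates too far apart
                    continue
                sl = spectrum[l] - spectrum[l].mean()
                sk = spectrum[k] - spectrum[k].mean()
                rho[i, j] = sl.dot(sk) / (np.linalg.norm(sl) * np.linalg.norm(sk))
        return rho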
Table 3 lists the amplitude spectrum correlation coefficients between the frames of the 1st and 2nd syllables in this embodiment, and Table 4 lists those between the frames of the 1st and 5th syllables.
Table 3. Amplitude spectrum correlation coefficients between the frames of the 1st and 2nd syllables

ρ(l,k)    l=5        l=6        l=7        l=8
k=18     -0.1714    -0.0982    -0.1675    -0.2620
k=19     -0.0258    -0.0604    -0.0635     0.0603
k=20      0.3999     0.1888     0.1817     0.1821
k=21      0.6535     0.1008     0.0198     0.2024
k=22      0.3120     0.0654    -0.0458     0.0818
Table 4. Amplitude spectrum correlation coefficients between the frames of the 1st and 5th syllables

ρ(l,k)    l=5        l=6        l=7        l=8
k=50      0.9090     0.3784     0.0654     0.2240
k=51      0.0979     0.9654     0.5834     0.0851
k=52     -0.0275     0.3679     0.9603     0.5527
k=53      0.3039     0.1110     0.2994     0.9417
Comparing Table 3 and Table 4 shows that the inter-frame correlation coefficients between two syllables that are not in a copy relationship are very small, whereas those between two syllables that are in a copy relationship are large; in particular, the correlation coefficients on the diagonal positions of Table 4 are close to 1.
S7: Set the similarity threshold Th; if two or more frame pairs between two syllables have an amplitude spectrum similarity exceeding the threshold, judge that syllables x_i and x_j are in a copy-paste relationship.

Specifically, a similarity threshold Th is set; if in the previous step two or more frame pairs have an amplitude spectrum correlation coefficient exceeding the threshold, the syllables x_i and x_j to which they belong are judged to be in a copy-paste relationship. In this embodiment Th is 0.94. As can be seen from Table 3, all inter-frame amplitude spectrum correlation coefficients between the 1st and 2nd syllables are below the threshold, so these two syllables are judged not to be in a copy-paste relationship. As can be seen from Table 4, when the 1st and 5th syllables are compared, 3 frame pairs have an amplitude spectrum correlation coefficient exceeding Th, so these two syllables are judged to be in a copy-paste relationship.
S8: Repeat S6 and S7 for all i ≠ j ∈ {1, 2, ..., M} to obtain all syllable pairs that are in a copy-paste relationship, thereby locating the copy-paste regions in the audio under test.

There are 8 syllables in this embodiment, so 28 pairwise comparisons are carried out. The result is that the 1st and 5th syllables, and the 2nd and 6th syllables, are each in a copy-paste relationship, from which the copy-paste regions in the audio under test can be located. Fig. 6 shows the detection result of this embodiment, which is consistent with the actual situation and demonstrates the effectiveness of the invention.
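For illustration, the pairwise decision of steps S7 and S8 could be sketched as follows, reusing the segment_syllables and frame_spectrum_correlation helpers sketched earlier (the function name is illustrative; the defaults Th = 0.94, T_zcr = 60, and the two-pair rule follow the embodiment):

    import numpy as np
    from itertools import combinations

    def find_copy_paste_pairs(spectrum: np.ndarray, zcr: np.ndarray, syllables: list,
                              th: float = 0.94, t_zcr: float = 60.0,
                              min_matches: int = 2) -> list:
        # Return all syllable index pairs (i, j) judged to be in a copy-paste relationship:
        # at least min_matches frame pairs whose spectral correlation exceeds th.
        pairs = []
        for i, j in combinations(range(len(syllables)), 2):   # 28 comparisons for 8 syllables
            rho = frame_spectrum_correlation(spectrum, zcr, syllables[i], syllables[j], t_zcr)
            if np.sum(rho > th) >= min_matches:
                pairs.append((i, j))
        return pairs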
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (8)

1. An audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables, characterized by comprising the following steps:
S1: pre-emphasizing the audio signal to be examined;
S2: applying windowed framing to the pre-emphasized audio, with frame length m and frame shift n; the time-domain audio signal of frame l after framing and windowing is denoted y_l, where l = 1, 2, ..., N_frame and N_frame is the number of audio frames;
S3: computing the zero-crossing rate zcr(l) of each windowed audio frame;
S4: separating the syllables in the audio under test according to the low-frequency spectral energy;
S5: rejecting syllables that are too short; specifically, setting a minimum syllable duration threshold t_m and discarding every syllable shorter than t_m to obtain the set of valid syllables X = {x_1, x_2, x_3, ..., x_M}, where x_i is the i-th syllable and M is the number of valid syllables;
S6: for every pair of syllables in the audio under test after rejecting short syllables, computing the amplitude spectrum similarity between each frame of one syllable and each frame of the other;
S7: setting a similarity threshold Th; if two or more frame pairs between two syllables have an amplitude spectrum similarity exceeding the threshold, judging that syllables x_i and x_j are in a copy-paste relationship;
S8: repeating S6 and S7 for all syllable pairs i ≠ j ∈ {1, 2, ..., M} to obtain all syllable pairs in a copy-paste relationship, thereby locating the copy-paste regions in the audio under test.
2. The audio homologous tampering detection method according to claim 1, characterized in that the zero-crossing rate is computed as
zcr(l) = (1/2) Σ_{k=1}^{K-1} | sgn[y_l(k+1)] - sgn[y_l(k)] |,
where y_l(k) is the k-th data point of frame l, K is the number of data points per frame, and sgn[·] is the sign function: sgn[x] = 1 if x ≥ 0, and sgn[x] = -1 if x < 0.
3. The audio homologous tampering detection method according to claim 1, characterized in that, in S4, the syllables in the audio under test are separated according to the low-frequency spectral energy as follows: an N_fft-point Fourier transform is applied to each frame y_l of the audio signal under test to obtain the corresponding amplitude spectrum S(l, f), where f is the frequency-bin index;
then the average low-frequency energy over all frames of the audio signal under test is computed, and for each frame y_l the ratio NLFER of its low-frequency energy to the average low-frequency energy is calculated.
4. The audio homologous tampering detection method according to claim 3, characterized in that the NLFER is
NLFER(l) = E_low(l) / ( (1/N_frame) Σ_{l'=1}^{N_frame} E_low(l') ),  where  E_low(l) = Σ_{f=F0_min}^{F0_max} S(l, f),
and where the lower and upper limits of the low-frequency band are f0_min Hz and f0_max Hz, the sampling frequency is fs, and the corresponding FFT bin limits are F0_min = (f0_min×2/fs)×N_fft and F0_max = (f0_max/fs)×N_fft;
an energy threshold is set, frames whose NLFER value exceeds the threshold are classified as speech frames and otherwise as noise frames, and consecutive speech frames form a syllable, so that the syllables in the audio under test are separated.
5. The audio homologous tampering detection method according to claim 1, characterized in that, in S2, a Hamming window is selected as the window function.
6. The audio homologous tampering detection method according to claim 1, characterized in that, in S6, the amplitude spectrum similarity of two frames is computed only when the absolute difference of their zero-crossing rates is smaller than a preset threshold T_zcr.
7. The audio homologous tampering detection method according to claim 1, characterized in that the frame length m is chosen between 16 ms and 128 ms, and the frame shift n is 1/2 to 2/3 of the frame length.
8. The audio homologous tampering detection method according to claim 1, characterized in that the amplitude spectrum similarity between two frames is measured by the Pearson correlation coefficient.
CN201810072583.7A 2018-01-25 2018-01-25 Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables Expired - Fee Related CN108510994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810072583.7A CN108510994B (en) 2018-01-25 2018-01-25 Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810072583.7A CN108510994B (en) 2018-01-25 2018-01-25 Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables

Publications (2)

Publication Number Publication Date
CN108510994A true CN108510994A (en) 2018-09-07
CN108510994B CN108510994B (en) 2020-09-22

Family

ID=63374843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810072583.7A Expired - Fee Related CN108510994B (en) 2018-01-25 2018-01-25 Audio homologous tampering detection method using inter-frame amplitude spectrum correlation between syllables

Country Status (1)

Country Link
CN (1) CN108510994B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120294452A1 (en) * 2011-05-20 2012-11-22 Andrew John Macdonald Method and apparatus for reducing noise pumping due to noise suppression and echo control interaction
CN103854646A (en) * 2014-03-27 2014-06-11 成都康赛信息技术有限公司 Method for classifying digital audio automatically
CN104616663A (en) * 2014-11-25 2015-05-13 重庆邮电大学 Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)
CN106878704A (en) * 2017-02-14 2017-06-20 福建师范大学 Turn altering detecting method on video frame rate based on light stream cyclophysis
CN106941008A (en) * 2017-04-05 2017-07-11 华南理工大学 It is a kind of that blind checking method is distorted based on Jing Yin section of heterologous audio splicing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭伟、于凤芹: "Speech and music signal separation based on an improved time-frequency ratio", 《计算机工程》 (Computer Engineering) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111863023A (en) * 2020-09-22 2020-10-30 深圳市声扬科技有限公司 Voice detection method and device, computer equipment and storage medium
CN111863023B (en) * 2020-09-22 2021-01-08 深圳市声扬科技有限公司 Voice detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108510994B (en) 2020-09-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200922