CN107274911A - A similarity analysis method based on sound features
- Publication number: CN107274911A
- Application number: CN201710305251.4A
- Authority: CN (China)
- Keywords: audio, sequence, similarity, signal, zero
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination
- G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
Abstract
The invention relates to a similarity analysis method based on sound features and belongs to the technical field of audio signal processing. To compare the similarity of two audio signals under test, the method takes physical features, namely amplitude and zero-crossing rate, as basic parameters and compares the audio with three physical-feature algorithms: waveform comparison, envelope comparison, and zero-crossing rate comparison. The similarity value is calculated with a correlation function, a similarity threshold is set, and the similarity value is compared with the threshold to make the similarity decision. The invention can be used to compare the similarity of audio signals and can be applied to the monitoring of broadcast and television signals. Compared with the prior art, the algorithm is simple, its theory is clear, and it is easy to implement.
Description
Technical field
The invention relates to a similarity analysis method based on sound features and belongs to the technical field of audio signal processing.
Background
Monitoring broadcast audio safely, quickly, and effectively is a problem that urgently needs to be solved. Most existing research on audio content concentrates on audio classification, audio retrieval, and speech recognition. These content-based algorithms are complex in both computation and theory, and they are often difficult to implement and apply when audio similarity actually has to be compared.
Summary of the invention
The technical problem to be solved by the invention is to provide a similarity analysis method based on sound features that calculates the similarity of audio signals by extracting characteristic parameters such as waveform, envelope, and zero-crossing rate, and then makes a similarity decision on the calculated result.
The technical solution of the invention is a similarity analysis method based on sound features, comprising the following steps:
(1) Audio acquisition: the audio under test is received through a microphone, which requires converting the analog signal into a digital signal. The number of channels on which the microphone receives audio is set, together with the sampling rate and the quantization precision. To recover the original continuous signal without distortion, the sampling rate must satisfy the Nyquist sampling theorem;
(2) Preprocessing: preprocessing comprises filtering, pre-emphasis, and windowing and framing;
(3) Writing data to a WAV file: the preprocessed sequence is written to a WAV file; this step can be implemented by writing a MATLAB program;
(4) Reading data from the WAV file: the data values in the WAV file are read; this step can be implemented by writing a MATLAB program;
(5) Characteristic parameter extraction: the characteristic parameters, namely the waveform sequence, the envelope sequence, and the zero-crossing rate sequence, are extracted from the audio sequence under test;
(6) Audio comparison: a similarity value is calculated with a correlation function for each of the three feature sequences of the audio under test;
(7) Similarity threshold setting: a similarity threshold is set for judging the similarity of the audio under test;
(8) Similarity decision: the similarity result is compared with the set threshold; if it is greater than or equal to the threshold, the two audio signals under test are judged similar; otherwise they are judged dissimilar.
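Steps (3) and (4) name MATLAB as one possible implementation. As a rough illustration in Python instead, the sketch below writes a sequence to a mono 16-bit WAV file and reads it back using only the standard library. The function names `write_wav` and `read_wav`, the scaling of samples to the range [-1, 1], and the 440 Hz test tone are assumptions made for this example and are not taken from the patent.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate=44100):
    """Write floats in [-1, 1] as mono 16-bit PCM (hypothetical helper)."""
    pcm = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16 bit = 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


def read_wav(path):
    """Return (sample_rate, floats in [-1, 1]) from a mono 16-bit file."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [v / 32767 for v in pcm]


# Round trip with a 0.1 s, 440 Hz test tone (assumed demo data).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
wav_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_wav(wav_path, tone)
rate, restored = read_wav(wav_path)
```

The restored samples differ from the originals only by the 16-bit quantization error, which is why the later feature sequences can be computed from the file contents.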
In step (1) of the above method, the number of receive channels must be set when the audio under test is received through the microphone: mono when receiving a speech signal, and stereo when receiving a music signal. The sampling rate must satisfy the Nyquist sampling theorem, f_s ≥ 2 f_h, where f_h is the highest frequency of the signal. In this method the number of receive channels is set to mono, the sampling rate to 44.1 kHz, and the quantization precision to 16 bit;
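The acquisition settings can be checked with a few lines of arithmetic. The sketch below is a minimal illustration: the helper names are hypothetical, and the 20 kHz figure for the upper limit of audible sound is an assumption of this example, not a value stated in the patent.

```python
def satisfies_nyquist(sample_rate_hz, highest_freq_hz):
    """True when f_s >= 2 * f_h, the condition quoted in the text."""
    return sample_rate_hz >= 2 * highest_freq_hz


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# 44.1 kHz covers an assumed 20 kHz audio band and the 3400 Hz speech band.
ok_audio = satisfies_nyquist(44100, 20000)
ok_speech = satisfies_nyquist(44100, 3400)
too_slow = satisfies_nyquist(8000, 5000)

# Mono, 16 bit, 44.1 kHz as chosen in the method.
data_rate = pcm_bytes_per_second(44100, 16, 1)
```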
In step (2) of the above method, preprocessing comprises the following steps:
(1) Filtering: filtering has two purposes: (a) to suppress all components of the input signal whose frequency exceeds f_s/2 (f_s being the sampling frequency), in order to prevent aliasing; (b) to suppress 50 Hz mains interference. The filter must therefore be a band-pass filter. Let its upper and lower cutoff frequencies be f_H and f_L; typically f_H = 3400 Hz and f_L = 60 to 100 Hz;
(2) Pre-emphasis: the purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flat and stays flat over the whole band from low to high frequency, allowing the spectrum to be computed with the same signal-to-noise ratio. Pre-emphasis is usually applied after the speech signal has been digitized and before parameter analysis, using a digital pre-emphasis filter in the computer that boosts high frequencies at 6 dB/octave. This is usually a first-order digital filter, H(z) = 1 - u z^(-1), where u is close to 1; a typical value is 0.94;
(3) Windowing and framing: an audio sequence is a one-dimensional signal on the time axis. To analyze it, the audio signal must be assumed to be stationary over short intervals on the order of milliseconds; on this basis the signal is windowed and divided into frames. Framing can use contiguous segments, but to make the transition between frames smooth and preserve continuity, overlapping segments are generally used. Framing is implemented by weighting with a movable window of finite length, that is, by multiplying s(n) by a window function w(n) to form the windowed audio signal s_w(n) = s(n) × w(n);
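The pre-emphasis filter H(z) = 1 - u z^(-1) corresponds to the difference equation y(n) = x(n) - u x(n-1), and windowing multiplies each frame by w(n). The sketch below illustrates both steps. The text only says "a window function", so the Hamming window used here, the frame length, the hop size, and all function names are assumptions of this example.

```python
import math


def pre_emphasis(x, u=0.94):
    """Apply y[n] = x[n] - u * x[n-1]; the first sample is passed through."""
    if not x:
        return []
    return [x[0]] + [x[n] - u * x[n - 1] for n in range(1, len(x))]


def hamming(length):
    """Hamming window, one possible choice for w(n)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(s, frame_len, hop):
    """Split s into overlapping frames (hop < frame_len) and weight each
    frame by the window: s_w(n) = s(n) * w(n)."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(s) - frame_len + 1, hop):
        frames.append([s[start + n] * w[n] for n in range(frame_len)])
    return frames


emphasized = pre_emphasis([1.0, 1.0, 1.0])
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)
```

A constant input is almost removed by the pre-emphasis filter after the first sample, which shows its high-pass character; with a hop of half the frame length, neighbouring frames overlap by 50%.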
In step (5) of the above method, characteristic parameter extraction comprises the following steps:
(1) Waveform sequence extraction: the waveform of an audio signal is an irregular waveform with a fairly rich frequency distribution and contains all the time-domain features of the signal. Comparing the time-domain waveforms of two audio signals compares all their time-domain details, so similarity can be calculated from waveform amplitude values. An audio signal is a one-dimensional analog signal that varies continuously in both time and amplitude; to process it in a computer it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of sequence points. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t). Quantizing f_s(t) gives the digital signal f(n). Let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem requires 1/T_s ≥ 2 f_h. The two audio signals under test are compared over the same duration T. Let their time-domain functions be x_1(t) and x_2(t), and let N = T × (1/T_s) be the number of samples. With T_s normalized to 1, x_1(nT_s) and x_2(nT_s) can be written x_1(n) and x_2(n). Quantizing the amplitudes of x_1(n) and x_2(n) then gives the waveform sequences to be extracted, x_1'(n) and x_2'(n);
(2) Envelope sequence extraction: the signal envelope is a curve that reflects how the waveform amplitude changes; it describes how the local maxima of the signal vary. Whereas the time-domain waveform allows all detail components of the audio signals to be compared, the envelope compares the outline of the signal waveform. Let the time-domain functions of the two audio signals under test be x_1(t) and x_2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x_1'(n) and x_2'(n). Following the envelope extraction flow, the absolute values |x_1'(n)| and |x_2'(n)| are taken, low-pass filtered, and the DC component is subtracted, finally giving the envelope sequences x_1''(n) and x_2''(n) of the audio signals under test;
(3) Zero-crossing rate sequence extraction: the zero-crossing rate is a simple feature in the time-domain analysis of audio signals and is the number of times the signal passes through zero. For a continuous audio signal, it corresponds to the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample value changes. Let the time-domain functions of the two audio signals under test be x_1(t) and x_2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x_1'(n) and x_2'(n). The zero-crossing rates of x_1'(n) and x_2'(n) are calculated with the formulas for Z_1 and Z_2 (formula images not reproduced in this text), in which L_eff is the length of the segment of x_1'(n) and x_2'(n) over which one zero-crossing rate value is calculated within the set time interval (one value is calculated every 50 ms), sgn is the sign function, and Z_1 and Z_2 are the zero-crossing rate values of x_1'(n) and x_2'(n) over length L_eff. This process gives the zero-crossing rate sequences x_1'''(n) and x_2'''(n).
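The envelope and zero-crossing rate steps above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the low-pass filter, so a causal moving average stands in for it; the zero-crossing count uses the common textbook form (half the sum of absolute sign differences), because the patent's own formulas are not reproduced in this text; the function names are hypothetical.

```python
def envelope(x, smooth_len):
    """|x| -> moving-average low-pass (assumed filter) -> subtract DC."""
    if not x:
        return []
    rectified = [abs(v) for v in x]
    smoothed = []
    acc = 0.0
    for n, v in enumerate(rectified):
        acc += v
        if n >= smooth_len:
            acc -= rectified[n - smooth_len]   # drop the sample leaving the window
        smoothed.append(acc / min(n + 1, smooth_len))
    dc = sum(smoothed) / len(smoothed)
    return [v - dc for v in smoothed]


def sgn(v):
    """Sign function with sgn(0) = +1 (a common convention, assumed here)."""
    return 1 if v >= 0 else -1


def zero_crossings(block):
    """Number of sign changes between neighbouring samples in one block."""
    return sum(abs(sgn(block[n]) - sgn(block[n - 1]))
               for n in range(1, len(block))) // 2


def zcr_sequence(x, sample_rate, block_seconds=0.05):
    """One zero-crossing value per 50 ms block of L_eff samples."""
    l_eff = int(round(sample_rate * block_seconds))
    return [zero_crossings(x[i:i + l_eff])
            for i in range(0, len(x) - l_eff + 1, l_eff)]


alternating = [1, -1] * 50                     # 100 samples, sign flips every sample
env = envelope([1, -1, 1, -1], smooth_len=2)   # constant magnitude -> flat envelope
zcr = zcr_sequence(alternating, sample_rate=1000)
```

Because the zero-crossing rate sequence holds one value per 50 ms, it is far shorter than the waveform sequence, which makes the later correlation step cheaper.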
In step (6) of the above method, audio comparison comprises the following steps:
(1) If the extracted audio feature is the waveform sequence, audio comparison calculates the degree of similarity of the waveform sequences with the cross-correlation function (definition formula not reproduced in this text);
(2) If the extracted audio feature is the envelope sequence, audio comparison calculates the degree of similarity of the envelope sequences with the cross-correlation function (definition formula not reproduced in this text);
(3) If the extracted audio feature is the zero-crossing rate sequence, audio comparison calculates the degree of similarity of the zero-crossing rate sequences with the cross-correlation function (definition formula not reproduced in this text).
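Since the definition of the cross-correlation function is not reproduced in this text, the sketch below assumes one common form, the cross-correlation normalized by the energies of both sequences, so that its peak lies between -1 and 1 and can be compared with a percentage threshold. The function name `xcorr_peak` and the `max_lag` parameter are hypothetical, and the patent's actual normalization may differ.

```python
import math


def xcorr_peak(a, b, max_lag):
    """Return (peak, lag) of the normalized cross-correlation
    R(m) = sum_i a[i] * b[i + m] / sqrt(sum a^2 * sum b^2)
    searched over lags m = -max_lag .. max_lag."""
    n = len(a)
    if len(b) != n:
        raise ValueError("sequences must have the same length")
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0:
        return 0.0, 0
    best, best_lag = -1.0, 0
    for m in range(-max_lag, max_lag + 1):
        # Only sum over indices where both a[i] and b[i + m] exist.
        r = sum(a[i] * b[i + m] for i in range(max(0, -m), min(n, n - m)))
        r /= norm
        if r > best:
            best, best_lag = r, m
    return best, best_lag


pulse = [0, 0, 1, 2, 1, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 2, 1, 0]   # same pulse, two samples later
peak, lag = xcorr_peak(pulse, delayed, max_lag=3)
```

Searching over lags is what makes the peak useful for monitoring: a delayed copy of the same audio still reaches a peak of 1 at the lag that matches the delay.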
In step (7) of the above method, the similarity threshold is set as follows: the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to decide whether the compared audio signals are similar. For the waveform sequence comparison algorithm the threshold is set to 60%; for the envelope and zero-crossing rate sequence comparison algorithms it is set to 80%;
In step (8) of the above method, the similarity decision is made as follows: for waveform sequences, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.
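The decision rule reduces to a threshold lookup per feature type. A minimal sketch, with the thresholds taken from the text and the dictionary keys and function name chosen for this example:

```python
# Thresholds from the text: 60% for waveform, 80% for envelope and
# zero-crossing rate. The key names are hypothetical labels.
THRESHOLDS = {"waveform": 0.60, "envelope": 0.80, "zcr": 0.80}


def is_similar(feature, peak):
    """Judge similarity from the cross-correlation peak (0.0 to 1.0)."""
    return peak >= THRESHOLDS[feature]


at_waveform_threshold = is_similar("waveform", 0.60)
below_envelope_threshold = is_similar("envelope", 0.79)
at_zcr_threshold = is_similar("zcr", 0.80)
```

Note that a value exactly equal to the threshold counts as similar, matching the "greater than or equal to" wording.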
The beneficial effects of the invention are as follows: the invention can be used to compare the similarity of audio signals and can be applied to the monitoring of broadcast and television signals. Compared with the prior art, the algorithm is simple, its theory is clear, and it is easy to implement.
Brief description of the drawings
Fig. 1 is the similarity comparison flow chart of the invention;
Fig. 2 is the audio signal waveform sequence extraction flow chart of the invention;
Fig. 3 is the audio signal envelope sequence extraction flow chart of the invention.
Detailed description
The invention is further described below with reference to the drawings and specific embodiments.
A similarity analysis method based on sound features, with the following specific steps:
(1) Audio acquisition: the audio under test is received through a microphone, and the analog signal is converted into a digital signal;
(2) Characteristic parameter extraction: the characteristic parameters, comprising the waveform sequence, the envelope sequence, and the zero-crossing rate sequence, are extracted from the audio sequence under test;
(3) Audio comparison: a similarity value is calculated with a correlation function for each of the three feature sequences of the audio under test;
(4) Similarity threshold setting: a similarity threshold is set for judging the similarity of the audio under test;
(5) Similarity decision: the similarity result is compared with the set threshold; if it is greater than or equal to the threshold, the two audio signals under test are judged similar; otherwise they are judged dissimilar.
For audio acquisition, the number of receive channels must be set when the audio under test is received through the microphone: mono when receiving a speech signal, and stereo when receiving a music signal. The sampling rate must satisfy the Nyquist sampling theorem, f_s ≥ 2 f_h, where f_h is the highest frequency of the signal. The number of receive channels is set to mono, the sampling rate to 44.1 kHz, and the quantization precision to 16 bit;
Characteristic parameter extraction comprises the following steps:
(1) Waveform sequence extraction: the waveform of an audio signal is an irregular waveform with a fairly rich frequency distribution and contains all the time-domain features of the signal. Comparing the time-domain waveforms of two audio signals compares all their time-domain details, so similarity can be calculated from waveform amplitude values. An audio signal is a one-dimensional analog signal that varies continuously in both time and amplitude; to process it in a computer it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of sequence points. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t); quantizing f_s(t) gives the digital signal f(n). Let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem requires 1/T_s ≥ 2 f_h. The two audio signals under test are compared over the same duration T. Let their time-domain functions be x_1(t) and x_2(t), and let N = T × (1/T_s) be the number of samples. With T_s normalized to 1, x_1(nT_s) and x_2(nT_s) are written x_1(n) and x_2(n). Quantizing the amplitudes of x_1(n) and x_2(n) then gives the waveform sequences to be extracted, x_1'(n) and x_2'(n);
(2) Envelope sequence extraction: the signal envelope is a curve that reflects how the waveform amplitude changes; it describes how the local maxima of the signal vary. Whereas the time-domain waveform allows all detail components of the audio signals to be compared, the envelope compares the outline of the signal waveform. Let the time-domain functions of the two audio signals under test be x_1(t) and x_2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x_1'(n) and x_2'(n). Following the envelope extraction flow, the absolute values |x_1'(n)| and |x_2'(n)| are taken, low-pass filtered, and the DC component is subtracted, finally giving the envelope sequences x_1''(n) and x_2''(n) of the audio signals under test;
(3) Zero-crossing rate sequence extraction: the zero-crossing rate is a simple feature in the time-domain analysis of audio signals and is the number of times the signal passes through zero. For a continuous audio signal, it corresponds to the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample value changes. Let the time-domain functions of the two audio signals under test be x_1(t) and x_2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x_1'(n) and x_2'(n). The zero-crossing rates of x_1'(n) and x_2'(n) are calculated with formulas (1) and (2) (formula images not reproduced in this text). In these formulas, L_eff is the length of the segment of x_1'(n) and x_2'(n) over which one zero-crossing rate value is calculated within the set time interval, sgn is the sign function, and Z_1 and Z_2 are the zero-crossing rate values of x_1'(n) and x_2'(n) over length L_eff. This gives the zero-crossing rate sequences x_1'''(n) and x_2'''(n);
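For reference, a common textbook form of the short-time zero-crossing count that is consistent with the symbols named above (L_eff, sgn, Z_1, Z_2) is given below. It is offered as an assumption, since formulas (1) and (2) themselves are not reproduced in this text; the patent may, for example, use a different normalization or a different value of sgn at zero.

```latex
% Requires amsmath for \operatorname and cases.
\[
Z_i \;=\; \frac{1}{2}\sum_{n=1}^{L_{\mathrm{eff}}-1}
      \bigl|\operatorname{sgn}[x_i'(n)] - \operatorname{sgn}[x_i'(n-1)]\bigr|,
\qquad i = 1, 2,
\]
\[
\operatorname{sgn}[x] =
\begin{cases}
 \phantom{-}1, & x \ge 0,\\
 -1, & x < 0.
\end{cases}
\]
```

Each sign change contributes |1 - (-1)| = 2 to the sum, so the factor 1/2 makes Z_i equal to the number of sign changes within one segment of L_eff samples.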
Audio comparison comprises the following steps:
(1) If the extracted audio feature is the waveform sequence, audio comparison calculates the degree of similarity of the waveform sequences with the cross-correlation function (definition formula not reproduced in this text);
(2) If the extracted audio feature is the envelope sequence, audio comparison calculates the degree of similarity of the envelope sequences with the cross-correlation function (definition formula not reproduced in this text);
(3) If the extracted audio feature is the zero-crossing rate sequence, audio comparison calculates the degree of similarity of the zero-crossing rate sequences with the cross-correlation function (definition formula not reproduced in this text).
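Because the definition formula is missing here, the following normalized form is given only as a plausible reading, chosen because its peak can be compared directly with a percentage threshold. The symbols R_{12}(m) and \rho_{\max} are introduced for this illustration; x_1 and x_2 stand for whichever pair of feature sequences (waveform, envelope, or zero-crossing rate) is being compared.

```latex
\[
R_{12}(m) \;=\;
\frac{\sum_{n} x_1(n)\,x_2(n+m)}
     {\sqrt{\sum_{n} x_1^2(n)\;\sum_{n} x_2^2(n)}},
\qquad
\rho_{\max} \;=\; \max_{m} R_{12}(m).
\]
```

By the Cauchy-Schwarz inequality, |R_{12}(m)| \le 1 for every lag m, so \rho_{\max} can be read as a percentage, with 100% reached when one sequence is a shifted, positively scaled copy of the other.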
The similarity threshold is set as follows: the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to decide whether the compared audio signals are similar. For the waveform sequence comparison algorithm the threshold is set to 60%; for the envelope and zero-crossing rate sequence comparison algorithms it is set to 80%.
The similarity decision is made as follows:
For waveform sequences, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.
Embodiment 1: the audio similarity analysis of the invention comprises the following steps:
(1) Audio acquisition: the audio under test is received through a microphone, which requires converting the analog signal into a digital signal. The number of channels on which the microphone receives audio is set, together with the sampling rate and the quantization precision. To recover the original continuous signal without distortion, the sampling rate must satisfy the Nyquist sampling theorem;
(2) Channel and sampling settings: the number of receive channels must be set when the audio under test is received through the microphone: mono when receiving a speech signal, and stereo when receiving a music signal. The sampling rate must satisfy the Nyquist sampling theorem, f_s ≥ 2 f_h, where f_h is the highest frequency of the signal. The number of receive channels is set to mono, the sampling rate to 44.1 kHz, and the quantization precision to 16 bit.
(3) Preprocessing: preprocessing comprises filtering, pre-emphasis, and windowing and framing;
(4) Filtering: filtering has two purposes: (a) to suppress all components of the input signal whose frequency exceeds f_s/2 (f_s being the sampling frequency), in order to prevent aliasing; (b) to suppress 50 Hz mains interference. The filter must therefore be a band-pass filter. Let its upper and lower cutoff frequencies be f_H and f_L; typically f_H = 3400 Hz and f_L = 60 to 100 Hz;
(5) Pre-emphasis: the purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flat and stays flat over the whole band from low to high frequency, allowing the spectrum to be computed with the same signal-to-noise ratio. Pre-emphasis is usually applied after the speech signal has been digitized and before parameter analysis, using a digital pre-emphasis filter in the computer that boosts high frequencies at 6 dB/octave. This is usually a first-order digital filter, H(z) = 1 - u z^(-1), where u is close to 1; a typical value is 0.94;
(6) Windowing and framing: an audio sequence is a one-dimensional signal on the time axis. To analyze it, the audio signal must be assumed to be stationary over short intervals on the order of milliseconds; on this basis the signal is windowed and divided into frames. Framing can use contiguous segments, but to make the transition between frames smooth and preserve continuity, overlapping segments are generally used. Framing is implemented by weighting with a movable window of finite length, that is, by multiplying s(n) by a window function w(n) to form the windowed audio signal s_w(n) = s(n) × w(n).
(7) Writing data to a WAV file: the preprocessed sequence is written to a WAV file; this step can be implemented by writing a MATLAB program;
(8) Reading data from the WAV file: the data values in the WAV file are read; this step can be implemented by writing a MATLAB program;
(9) Characteristic parameter extraction: the characteristic parameters, namely the waveform sequence, the envelope sequence, and the zero-crossing rate sequence, are extracted from the audio sequence under test;
(10) Waveform sequence extraction: the waveform of an audio signal is an irregular waveform with a fairly rich frequency distribution and contains all the time-domain features of the signal. Comparing the time-domain waveforms of two audio signals compares all their time-domain details, so similarity can be calculated from waveform amplitude values. An audio signal is a one-dimensional analog signal that varies continuously in both time and amplitude; to process it in a computer it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of sequence points. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t). Quantizing f_s(t) gives the digital signal f(n). Let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem requires 1/T_s ≥ 2 f_h. The two audio signals under test are compared over the same duration T. Let their time-domain functions be x_1(t) and x_2(t), and let N = T × (1/T_s) be the number of samples. With T_s normalized to 1, x_1(nT_s) and x_2(nT_s) can be written x_1(n) and x_2(n). Quantizing the amplitudes of x_1(n) and x_2(n) then gives the waveform sequences to be extracted, x_1'(n) and x_2'(n);
(11) Envelope sequence extraction: the signal envelope is a curve that reflects how the waveform amplitude changes; it describes how the local maxima of the signal vary. Whereas the time-domain waveform allows all detail components of the audio signals to be compared, the envelope compares the outline of the signal waveform. Let the time-domain functions of the two audio signals under test be x_1(t) and x_2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x_1'(n) and x_2'(n). Following the envelope extraction flow, the absolute values |x_1'(n)| and |x_2'(n)| are taken, low-pass filtered, and the DC component is subtracted, finally giving the envelope sequences x_1''(n) and x_2''(n) of the audio signals under test;
(12) Zero-crossing rate sequence extraction: the zero-crossing rate is a simple feature in the time-domain analysis of audio signals and is the number of times the signal passes through zero. For a continuous audio signal, it corresponds to the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample value changes. Let the time-domain functions of the two audio signals under test be x_1(t) and x_2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x_1'(n) and x_2'(n). The zero-crossing rates of x_1'(n) and x_2'(n) are calculated with the formulas for Z_1 and Z_2 (formula images not reproduced in this text), in which L_eff is the length of the segment of x_1'(n) and x_2'(n) over which one zero-crossing rate value is calculated within the set time interval, sgn is the sign function, and Z_1 and Z_2 are the zero-crossing rate values of x_1'(n) and x_2'(n) over length L_eff. This process gives the zero-crossing rate sequences x_1'''(n) and x_2'''(n).
(13) audio is compared:Three kinds of characteristic sequences of audio to be measured calculate Similarity value by correlation function respectively;Point
Not Ji Suan R (m) draw corresponding correlation, step is as follows:
If (a) the audio frequency characteristics parameter extracted is wave sequence, it is to calculate waveform sequence by cross-correlation function that audio, which is compared,
The similarity degree of row, the definition of cross-correlation function:
If (b) the audio frequency characteristics parameter extracted is envelope sequence, it is to calculate envelope sequence by cross-correlation function that audio, which is compared,
The similarity degree of row, the definition of cross-correlation function:
If (c) the audio frequency characteristics parameter extracted is zero-crossing rate sequence, it is to calculate zero passage by cross-correlation function that audio, which is compared,
The similarity degree of rate sequence, the definition of cross-correlation function:
(14) Similarity threshold setting: a similarity threshold is set for judging whether the audio under test is similar. The peak
of the cross-correlation function, i.e. the maximum cross-correlation coefficient, determines whether the compared audio signals
are similar. The threshold is set to 60% in the waveform sequence comparison algorithm and to 80% in the envelope and
zero-crossing rate sequence comparison algorithms.
(15) Similarity judgment: the similarity calculation result is compared with the set threshold; if it is greater than or equal
to the similarity threshold, the two audio signals under test are judged similar, and otherwise dissimilar. For the waveform
sequence, a cross-correlation peak of 60% or more is judged similar and a peak below 60% dissimilar; for the envelope and
zero-crossing rate sequence comparison algorithms, a peak of 80% or more is judged similar and a peak below 80% dissimilar.
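Steps (13) to (15) can be sketched as follows. The 60% and 80% thresholds come from the text; the normalized form of the cross-correlation, the search over all lags and every function and variable name are assumptions made for illustration.

```python
import math

# Thresholds stated in the text: 60% for the waveform sequence,
# 80% for the envelope and zero-crossing rate sequences.
THRESHOLDS = {"waveform": 0.60, "envelope": 0.80, "zero_crossing_rate": 0.80}


def cross_correlation_peak(a, b):
    """Peak over all lags of an assumed normalized cross-correlation R(m)."""
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0:
        return 0.0  # an all-zero (or empty) sequence has no shape to compare
    peak = 0.0
    for m in range(-(len(a) - 1), len(b)):
        # Sum a[n] * b[n + m] over the indices where both samples exist.
        lo, hi = max(0, -m), min(len(a), len(b) - m)
        r = sum(a[n] * b[n + m] for n in range(lo, hi)) / norm
        peak = max(peak, r)
    return peak


def is_similar(a, b, feature):
    """Apply the rule 'peak >= threshold means similar' for the given feature."""
    return cross_correlation_peak(a, b) >= THRESHOLDS[feature]
```

With this sketch a sequence and a shifted copy of it, such as [0, 1, 0, -1, 0] and [1, 0, -1, 0, 0], reach a peak of 1.0 and are judged similar, whereas [1, 1, 1, 1] and [1, -1, 1, -1] peak at 0.25 and are judged dissimilar under the 60% waveform threshold.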
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but
the present invention is not limited to these embodiments; within the knowledge possessed by those of ordinary skill in the art,
various changes can also be made without departing from the concept of the present invention.
Claims (6)
1. A similarity analysis method based on sound characteristics, characterized in that the specific steps are:
(1) Audio collection: the audio under test is received by a microphone, and the analog signal is converted into a digital signal;
(2) Feature parameter extraction: feature parameters, including the waveform sequence, the envelope sequence and the zero-crossing
rate sequence, are extracted from the audio sequence under test;
(3) Audio comparison: for each of the three feature sequences of the audio under test, a similarity value is calculated with the
correlation function;
(4) Similarity threshold setting: a similarity threshold is set for judging whether the audio under test is similar;
(5) Similarity judgment: the similarity calculation result is compared with the set threshold; if it is greater than or equal to
the similarity threshold, the two audio signals under test are judged similar, and otherwise dissimilar.
2. The similarity analysis method based on sound characteristics according to claim 1, characterized in that: when the audio
collection receives the audio under test by microphone, the number of receiving channels must be set; mono is set when receiving
a speech signal, and two channels (stereo) are set when receiving a music signal; the sampling rate satisfies the Nyquist
sampling theorem, fs ≥ 2fh, where fs is the sampling rate and fh is the highest frequency of the signal.
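The channel rule and the sampling condition of claim 2 can be written as two small checks. This is a sketch only; the function names and the "speech"/"music" labels are hypothetical, and the frequencies mentioned afterwards are illustrative values rather than figures from the patent.

```python
def channel_count(signal_type):
    """Mono for speech, two channels for music, as the claim describes."""
    channels = {"speech": 1, "music": 2}
    return channels[signal_type]


def satisfies_nyquist(sample_rate_hz, highest_freq_hz):
    """Check the sampling condition fs >= 2 * fh."""
    return sample_rate_hz >= 2 * highest_freq_hz


def minimum_sample_rate(highest_freq_hz):
    """Smallest sampling rate allowed by fs >= 2 * fh."""
    return 2 * highest_freq_hz
```

For a signal whose highest frequency is 4 kHz, a sampling rate of 8 kHz satisfies the condition, while the same rate does not satisfy it for a signal that extends to 5 kHz.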
3. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the feature
parameter extraction comprises the following steps:
(1) Waveform sequence extraction: the audio signal is sampled and quantized, turning it into a digital signal that is discrete in
both time and amplitude; t is a continuous variable defined on the time axis, and n is an integer indexing the sequence points;
sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t),
giving the sampled signal fs(t), which is then quantized in preprocessing to give the digital signal f(n); let Ts be the sampling
period and fh the highest frequency of the audio signal under test, so that the sampling theorem 1/Ts ≥ 2fh is satisfied; the two
audio signals under test are compared over the same duration, denoted T; assume the time-domain functions of the two audio
signals under test are x1(t) and x2(t), where t is a continuous variable defined on the time axis;
let N = T × (1/Ts); with Ts normalized to 1, x1(nTs) and x2(nTs) are written x1(n) and x2(n); quantizing the amplitudes of x1(n)
and x2(n) then gives the waveform sequences to be extracted, x1'(n) and x2'(n);
(2) Envelope sequence extraction: assume the time-domain functions of the two audio signals under test are x1(t) and x2(t), where
t is a continuous variable defined on the time axis; the waveform extraction method gives the audio waveform sequences x1'(n) and
x2'(n); the envelope extraction flow is then applied: the absolute values |x1'(n)| and |x2'(n)| of the audio waveform sequences
are taken, low-pass filtered, and the DC component is subtracted, finally giving the envelope sequences x1''(n) and x2''(n) of the
audio signals under test;
(3) Zero-crossing rate sequence extraction: assume the time-domain functions of the two audio signals under test are x1(t) and
x2(t), where t is a continuous variable defined on the time axis; the waveform extraction method gives the waveform sequences
x1'(n) and x2'(n);
the zero-crossing rates of x1'(n) and x2'(n) are calculated by formulas (1) and (2);
in these formulas, Leff is the length of the segment of x1'(n) and x2'(n), within the set time interval, over which one
zero-crossing rate value is calculated, sgn is the sign function, and Z1 and Z2 are the zero-crossing rate values of x1'(n) and
x2'(n), respectively, over the length Leff; this gives the zero-crossing rate sequences x1'''(n) and x2'''(n).
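The waveform and envelope extraction steps of claim 3 can be sketched as below. The claim does not specify the quantizer or the low-pass filter, so the uniform quantizer, the trailing moving-average filter, the use of the mean as the DC component and all names and default values are assumptions made for illustration.

```python
def sample(x_t, duration, ts):
    """Evaluate the continuous-time function x_t at t = n * ts for
    n = 0..N-1, with N = duration * (1 / ts)."""
    n_points = round(duration / ts)
    return [x_t(n * ts) for n in range(n_points)]


def quantize(samples, levels=256, full_scale=1.0):
    """Uniform quantizer (assumed): clip to [-full_scale, full_scale] and
    snap each sample to one of `levels` (>= 2) evenly spaced amplitudes."""
    step = 2 * full_scale / (levels - 1)
    out = []
    for s in samples:
        clipped = max(-full_scale, min(full_scale, s))
        out.append(round((clipped + full_scale) / step) * step - full_scale)
    return out


def moving_average(samples, width):
    """Simple low-pass filter (assumed): trailing mean over `width` samples."""
    out = []
    acc = 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= width:
            acc -= samples[i - width]  # drop the sample leaving the window
        out.append(acc / min(i + 1, width))
    return out


def envelope(waveform, width=4):
    """Absolute value -> low-pass filter -> subtract the DC component (mean)."""
    if not waveform:
        return []
    smoothed = moving_average([abs(v) for v in waveform], width)
    dc = sum(smoothed) / len(smoothed)
    return [v - dc for v in smoothed]
```

A waveform of constant magnitude, such as [1, -1, 1, -1], has a flat rectified signal, so its envelope after DC removal is zero everywhere; any variation in loudness shows up as deviations around zero.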
4. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the audio
comparison comprises the following steps:
(1) if the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity
of the waveform sequences with the cross-correlation function, which is defined as:
(2) if the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity
of the envelope sequences with the cross-correlation function, which is defined as:
(3) if the extracted audio feature parameter is the zero-crossing rate sequence, the audio comparison calculates the degree of
similarity of the zero-crossing rate sequences with the cross-correlation function, which is defined as:
5. The similarity analysis method based on sound characteristics according to claim 1, characterized in that: the similarity
threshold setting uses the peak of the cross-correlation function, i.e. the maximum cross-correlation coefficient, to determine
whether the compared audio signals are similar; the threshold is set to 60% in the waveform sequence comparison algorithm and to
80% in the envelope and zero-crossing rate sequence comparison algorithms.
6. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the similarity
judgment is:
for the waveform sequence, a cross-correlation function peak of 60% or more is judged similar, and a peak below 60% is judged
dissimilar;
for the envelope and zero-crossing rate sequence comparison algorithms, a cross-correlation function peak of 80% or more is
judged similar, and a peak below 80% is judged dissimilar.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710305251.4A CN107274911A (en) | 2017-05-03 | 2017-05-03 | A kind of similarity analysis method based on sound characteristic |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107274911A true CN107274911A (en) | 2017-10-20 |
Family
ID=60073693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710305251.4A Pending CN107274911A (en) | 2017-05-03 | 2017-05-03 | A kind of similarity analysis method based on sound characteristic |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107274911A (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171020 |