CN107274911A

CN107274911A - A similarity analysis method based on sound characteristics

Info

Publication number: CN107274911A
Application number: CN201710305251.4A
Authority: CN
Inventors: 龙华; 张琳; 邵玉斌; 杜庆治
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2017-05-03
Filing date: 2017-05-03
Publication date: 2017-10-20

Abstract

The present invention relates to a similarity analysis method based on sound characteristics and belongs to the technical field of audio signal processing. To compare the similarity of two audio signals under test, the invention takes two physical features, amplitude and zero-crossing rate, as basic parameters and applies three comparison algorithms built on them: waveform comparison, envelope comparison, and zero-crossing-rate comparison. A similarity value is calculated with a correlation function; a similarity threshold is set; the similarity value is compared with the threshold to make the similarity decision. The invention can be used to compare the similarity of audio signals and can be applied to the monitoring of broadcast and television signals. Compared with the prior art, the algorithm is simple, its theory is clear, and it is easy to implement.

Description

A similarity analysis method based on sound characteristics

Technical field

The present invention relates to a similarity analysis method based on sound characteristics and belongs to the technical field of audio signal processing.

Background art

Monitoring broadcast audio safely, quickly, and effectively is a problem that urgently needs to be solved. Most existing research on audio content concentrates on audio classification, audio retrieval, and speech recognition. The algorithms used in that research are highly complex and theoretically involved, so they are often difficult to implement and apply when actual audio similarity comparison is required.

Summary of the invention

The technical problem to be solved by the present invention is to provide a similarity analysis method based on sound characteristics that extracts the waveform, envelope, and zero-crossing-rate feature parameters, calculates the similarity of audio signals from each of them, and makes a similarity decision on the calculated result.

The technical solution of the present invention is a similarity analysis method based on sound characteristics, comprising the following steps:

(1) Audio collection: the audio under test is received through a microphone, which requires converting the analog signal into a digital signal. The number of channels with which the microphone receives audio is set, together with the sampling rate and the quantization precision. To recover the original continuous signal without distortion, the sampling rate must satisfy the Nyquist sampling theorem;

(2) Preprocessing: preprocessing comprises filtering, pre-emphasis, and windowing and framing;

(3) Writing data to a WAV file: the preprocessed sequence is written to a WAV file; this step can be implemented with a MATLAB program;

(4) Reading WAV file data: the data values in the WAV file are read; this step can be implemented with a MATLAB program;

(5) Feature parameter extraction: the feature parameters, namely the waveform sequence, the envelope sequence, and the zero-crossing-rate sequence, are extracted from the audio sequence under test;

(6) Audio comparison: a similarity value is calculated with a correlation function for each of the three feature sequences of the audio under test;

(7) Similarity threshold setting: a similarity threshold is set for judging the similarity of the audio under test;

(8) Similarity decision: the similarity calculation result is compared with the set threshold; if it is greater than or equal to the similarity threshold, the two audio signals under test are judged similar; otherwise, they are judged dissimilar.
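Steps (3) and (4) can be sketched as follows. The method itself names MATLAB for these steps; this sketch swaps in Python's standard `wave` module instead, and the helper names `write_wav` and `read_wav`, as well as the scaling of samples to the range [-1, 1], are assumptions made only for illustration.

```python
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate=44100):
    """Write floats in [-1, 1] as a mono, 16-bit PCM WAV file."""
    codes = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16-bit quantization
        wf.setframerate(sample_rate)  # e.g. 44.1 kHz
        wf.writeframes(struct.pack("<%dh" % len(codes), *codes))


def read_wav(path):
    """Read a mono, 16-bit PCM WAV file back as (sample_rate, floats)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        count = wf.getnframes()
        raw = wf.readframes(count)
    codes = struct.unpack("<%dh" % count, raw)
    return rate, [c / 32767 for c in codes]


# Round trip through a temporary file.
wav_path = os.path.join(tempfile.mkdtemp(), "preprocessed.wav")
write_wav(wav_path, [0.0, 0.5, -0.5, 1.0])
rate, data = read_wav(wav_path)
```

The values read back differ from the originals only by the 16-bit quantization error.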

In the above similarity analysis method based on sound characteristics, for the audio collection in step (1), the number of receiving channels must be set when the audio under test is received through the microphone: mono when receiving speech signals and stereo (two channels) when receiving music signals. The sampling rate satisfies the Nyquist sampling theorem, f_s ≥ 2f_h, where f_h is the highest frequency of the signal. Here the number of receiving channels is set to mono, the sampling rate to 44.1 kHz, and the quantization precision to 16 bits;
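As a small illustration of these capture settings, the sketch below checks the Nyquist condition f_s ≥ 2f_h and maps a sample to a signed 16-bit code. The function names and the assumption that samples are scaled to [-1, 1] are hypothetical and not part of the method as described.

```python
def satisfies_nyquist(sample_rate_hz, highest_freq_hz):
    """True when f_s >= 2 * f_h, the condition for distortion-free recovery."""
    return sample_rate_hz >= 2 * highest_freq_hz


def quantize_16bit(sample):
    """Map a float in [-1, 1] to a signed 16-bit integer code (clipping outliers)."""
    clipped = max(-1.0, min(1.0, sample))
    return int(round(clipped * 32767))


# 44.1 kHz covers signals whose highest frequency is up to 22.05 kHz.
ok_for_audio = satisfies_nyquist(44100, 20000)
too_slow = satisfies_nyquist(8000, 5000)
```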

In the above similarity analysis method based on sound characteristics, the preprocessing in step (2) comprises the following steps:

(1) Filtering: filtering has two purposes. The first is to suppress all components of the input signal whose frequency exceeds f_s/2 (f_s being the sampling frequency), to prevent aliasing interference. The second is to suppress 50 Hz power-line interference. The filter must therefore be a band-pass filter; with upper and lower cutoff frequencies f_H and f_L respectively, typical values are f_H = 3400 Hz and f_L = 60 to 100 Hz;

(2) Pre-emphasis: the purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flat and can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies. Pre-emphasis is usually performed after the speech signal has been digitized and before parameter analysis, using a digital pre-emphasis filter in the computer that boosts high frequencies at 6 dB/octave. This is usually a first-order digital filter, H(z) = 1 - u z^-1, where u is close to 1, with a typical value of 0.94;

(3) Windowing and framing: the audio sequence is a one-dimensional signal on the time axis. To analyze it, the audio signal is assumed to be stationary over short intervals on the order of milliseconds, and on this basis the signal is windowed and divided into frames. Framing can use contiguous segmentation, but to make the transition between frames smooth and preserve continuity, overlapping segmentation is generally used. Framing is implemented by weighting with a movable finite-length window, that is, by multiplying s(n) by a window function w(n) to form the windowed audio signal s_w(n) = s(n) × w(n);
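The pre-emphasis and windowing steps above can be sketched as follows. The filter H(z) = 1 - u z^-1 corresponds to the difference equation y(n) = s(n) - u·s(n-1). The text only says "a window function w(n)"; the Hamming window used here, the frame length, and the hop size are assumptions chosen for illustration.

```python
import math


def pre_emphasis(signal, u=0.94):
    """Apply y(n) = s(n) - u * s(n-1), i.e. H(z) = 1 - u z^-1."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - u * signal[n - 1])
    return out


def hamming(length):
    """Hamming window w(n); an assumed choice of window function."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(signal, frame_len, hop):
    """Split into frames of frame_len samples, advancing by hop samples.

    hop < frame_len gives overlapping segmentation; each frame is
    multiplied sample by sample with the window: s_w(n) = s(n) * w(n).
    """
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * w[i] for i in range(frame_len)])
    return frames


emphasized = pre_emphasis([1.0, 1.0, 1.0])
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)  # 50% overlap
```

A constant input is almost cancelled by the pre-emphasis filter after the first sample, which shows how it attenuates low frequencies relative to high ones.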

In the above similarity analysis method based on sound characteristics, the feature parameter extraction in step (5) comprises the following steps:

(1) Waveform sequence extraction: the waveform of an audio signal is an irregular waveform with a relatively rich frequency distribution and contains all time-domain features of the signal. Comparing the time-domain waveforms of two audio signals compares all of their time-domain details, so the similarity can be calculated from the waveform amplitude values. An audio signal is a one-dimensional analog signal that varies continuously in both time and amplitude; to process it in a computer, it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of the sequence points. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t). The sampled signal f_s(t) is then quantized to obtain the digital signal f(n). Let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem requires 1/T_s ≥ 2f_h. The two audio signals under test are compared over the same duration, denoted T. Let their time-domain functions be x₁(t) and x₂(t), and let N = T × (1/T_s), the number of samples in the comparison duration. With T_s normalized to 1, x₁(nT_s) and x₂(nT_s) can be abbreviated as x₁(n) and x₂(n). Quantizing the amplitudes of x₁(n) and x₂(n) then gives the waveform sequences to be extracted, x₁'(n) and x₂'(n);

(2) Envelope sequence extraction: the signal envelope is the curve that reflects the variation of the waveform amplitude and describes how the local maxima of the signal change. Whereas the time-domain waveform compares all detailed components of the audio signal, the envelope compares the outline of the signal waveform. Let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x₁'(n) and x₂'(n). The envelope extraction flow is: take the absolute values |x₁'(n)| and |x₂'(n)| of the waveform sequences, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x₁''(n) and x₂''(n) of the audio signals under test;

(3) Zero-crossing-rate sequence extraction: the zero-crossing rate is a simple feature in the time-domain analysis of audio signals and refers to the number of times the signal passes through zero. For a continuous audio signal, it can be observed as the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample values changes. Let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x₁'(n) and x₂'(n). The zero-crossing rates of x₁'(n) and x₂'(n) are calculated by formulas (1) and (2), where L_eff is the length of the portion of x₁'(n) and x₂'(n) over which one zero-crossing-rate value is calculated within the set time segment (one value is calculated every 50 ms), sgn is the sign function, and Z₁ and Z₂ are the zero-crossing-rate values of x₁'(n) and x₂'(n) over the length L_eff. This process gives the zero-crossing-rate sequences x₁'''(n) and x₂'''(n).
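The envelope and zero-crossing-rate extraction steps can be sketched as below. Several details are assumptions rather than the patent's own choices: the low-pass filter is a simple moving average of hypothetical length `smooth_len`, the sign function treats zero as positive, and the zero-crossing count uses the common textbook form (half the sum of absolute sign differences), which may differ from formulas (1) and (2). At 44.1 kHz, a 50 ms segment corresponds to L_eff = 0.05 × 44100 = 2205 samples.

```python
def envelope(waveform, smooth_len=5):
    """Rectify, low-pass filter (moving average), then subtract the DC component."""
    rectified = [abs(v) for v in waveform]
    if not rectified:
        return []
    half = smooth_len // 2
    smoothed = []
    for n in range(len(rectified)):
        lo = max(0, n - half)
        hi = min(len(rectified), n + half + 1)
        smoothed.append(sum(rectified[lo:hi]) / (hi - lo))
    dc = sum(smoothed) / len(smoothed)
    return [v - dc for v in smoothed]


def sgn(v):
    """Sign function; zero is treated as positive (an assumed convention)."""
    return 1 if v >= 0 else -1


def zero_crossings(segment):
    """Number of sign changes between consecutive samples of one segment."""
    total = sum(abs(sgn(segment[n]) - sgn(segment[n - 1]))
                for n in range(1, len(segment)))
    return total // 2  # each sign change contributes 2 to the sum


def zero_crossing_sequence(waveform, l_eff):
    """One zero-crossing value per consecutive block of l_eff samples."""
    return [zero_crossings(waveform[s:s + l_eff])
            for s in range(0, len(waveform) - l_eff + 1, l_eff)]


env = envelope([0.0, 2.0, -2.0, 0.0], smooth_len=3)
zc_seq = zero_crossing_sequence([1, -1, 1, -1, 1, 1, 1, 1], l_eff=4)
```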

In the above similarity analysis method based on sound characteristics, the audio comparison in step (6) comprises the following steps:

(1) If the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity of the waveform sequences from the definition of the cross-correlation function;

(2) If the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity of the envelope sequences from the definition of the cross-correlation function;

(3) If the extracted audio feature parameter is the zero-crossing-rate sequence, the audio comparison calculates the degree of similarity of the zero-crossing-rate sequences from the definition of the cross-correlation function.
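The same comparison applies to all three feature sequences. The sketch below uses a commonly used normalized cross-correlation, R(m) = Σ x₁(n)·x₂(n+m) divided by the square root of the product of the two sequence energies, and returns its peak over all lags. This normalization is an assumption adopted so that the peak can be read as a percentage; the patent's own definition may differ.

```python
import math


def cross_correlation(x1, x2, lag):
    """R(m) = sum over n of x1(n) * x2(n + m), restricted to valid indices."""
    total = 0.0
    for n in range(len(x1)):
        k = n + lag
        if 0 <= k < len(x2):
            total += x1[n] * x2[k]
    return total


def peak_similarity(x1, x2):
    """Peak of the energy-normalized cross-correlation over all lags."""
    norm = math.sqrt(sum(v * v for v in x1) * sum(v * v for v in x2))
    if norm == 0.0:
        return 0.0  # an all-zero sequence has no defined correlation
    lags = range(-(len(x1) - 1), len(x2))
    return max(cross_correlation(x1, x2, m) for m in lags) / norm


identical = peak_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = peak_similarity([0.0, 1.0, 2.0, 3.0, 0.0], [1.0, 2.0, 3.0, 0.0, 0.0])
inverted = peak_similarity([1.0, -1.0], [-1.0, 1.0])
```

Because the peak is taken over all lags, a sequence and a time-shifted copy of it still score 1.0. This direct evaluation costs on the order of N² multiplications for sequences of length N; an FFT-based correlation would be the usual choice for long recordings.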

In the above similarity analysis method based on sound characteristics, the similarity threshold in step (7) is set as follows: the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to determine whether the compared audio signals are similar. The threshold is set to 60% in the waveform sequence comparison algorithm and to 80% in the envelope and zero-crossing-rate sequence comparison algorithms;

In the above similarity analysis method based on sound characteristics, the similarity decision in step (8) is made as follows: for the waveform sequence, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.
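The threshold rule in steps (7) and (8) reduces to a lookup and a comparison. In this sketch the feature names used as dictionary keys and the representation of the peak as a fraction between 0 and 1 are assumptions for illustration.

```python
# Thresholds from the text: 60% for waveform, 80% for envelope and zero-crossing rate.
THRESHOLDS = {"waveform": 0.60, "envelope": 0.80, "zero_crossing": 0.80}


def is_similar(feature, peak):
    """Judge similar when the cross-correlation peak reaches the feature's threshold."""
    return peak >= THRESHOLDS[feature]


decisions = {
    "waveform": is_similar("waveform", 0.60),        # exactly at threshold: similar
    "envelope": is_similar("envelope", 0.79),        # just below threshold: dissimilar
    "zero_crossing": is_similar("zero_crossing", 0.92),
}
```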

The beneficial effects of the invention are as follows: the invention can be used to compare the similarity of audio signals and can be applied to the monitoring of broadcast and television signals. Compared with the prior art, the algorithm is simple, its theory is clear, and it is easy to implement.

Brief description of the drawings

Fig. 1 is a flow chart of the similarity comparison of the present invention;

Fig. 2 is a flow chart of the audio signal waveform sequence extraction of the present invention;

Fig. 3 is a flow chart of the audio signal envelope sequence extraction of the present invention.

Detailed description

The invention is further described below with reference to the accompanying drawings and specific embodiments.

A similarity analysis method based on sound characteristics comprises the following specific steps:

(1) Audio collection: the audio under test is received through a microphone, and the analog signal is converted into a digital signal;

(2) Feature parameter extraction: the feature parameters, including the waveform sequence, the envelope sequence, and the zero-crossing-rate sequence, are extracted from the audio sequence under test;

(3) Audio comparison: a similarity value is calculated with a correlation function for each of the three feature sequences of the audio under test;

(4) Similarity threshold setting: a similarity threshold is set for judging the similarity of the audio under test;

(5) Similarity decision: the similarity calculation result is compared with the set threshold; if it is greater than or equal to the similarity threshold, the two audio signals under test are judged similar; otherwise, they are judged dissimilar.

For the audio collection, the number of receiving channels must be set when the audio under test is received through the microphone: mono when receiving speech signals and stereo (two channels) when receiving music signals. The sampling rate satisfies the Nyquist sampling theorem, f_s ≥ 2f_h, where f_h is the highest frequency of the signal. The number of receiving channels is set to mono, the sampling rate to 44.1 kHz, and the quantization precision to 16 bits;

The feature parameter extraction comprises the following steps:

(1) Waveform sequence extraction: the waveform of an audio signal is an irregular waveform with a relatively rich frequency distribution and contains all time-domain features of the signal. Comparing the time-domain waveforms of two audio signals compares all of their time-domain details, so the similarity can be calculated from the waveform amplitude values. An audio signal is a one-dimensional analog signal that varies continuously in both time and amplitude; to process it in a computer, it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of the sequence points. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t), which is then quantized to obtain the digital signal f(n). Let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem requires 1/T_s ≥ 2f_h. The two audio signals under test are compared over the same duration, denoted T, and their time-domain functions are x₁(t) and x₂(t). Let N = T × (1/T_s). With T_s normalized to 1, x₁(nT_s) and x₂(nT_s) are written as x₁(n) and x₂(n). Quantizing the amplitudes of x₁(n) and x₂(n) then gives the waveform sequences to be extracted, x₁'(n) and x₂'(n);

(2) Envelope sequence extraction: the signal envelope is the curve that reflects the variation of the waveform amplitude and describes how the local maxima of the signal change. Whereas the time-domain waveform compares all detailed components of the audio signal, the envelope compares the outline of the signal waveform. Let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x₁'(n) and x₂'(n). The envelope extraction flow is: take the absolute values |x₁'(n)| and |x₂'(n)| of the waveform sequences, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x₁''(n) and x₂''(n) of the audio signals under test;

(3) Zero-crossing-rate sequence extraction: the zero-crossing rate is a simple feature in the time-domain analysis of audio signals and refers to the number of times the signal passes through zero. For a continuous audio signal, it can be observed as the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample values changes. Let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x₁'(n) and x₂'(n). The zero-crossing rates of x₁'(n) and x₂'(n) are calculated by formulas (1) and (2), where L_eff is the length of the portion of x₁'(n) and x₂'(n) over which one zero-crossing-rate value is calculated within the set time segment, sgn is the sign function, and Z₁ and Z₂ are the zero-crossing-rate values of x₁'(n) and x₂'(n) over the length L_eff. This gives the zero-crossing-rate sequences x₁'''(n) and x₂'''(n);
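For reference, a widely used textbook form of the short-time zero-crossing count, written with the symbols defined above, is given below. It is offered only as an assumed form consistent with the definitions of L_eff, sgn, Z₁, and Z₂; the exact expressions of formulas (1) and (2) may differ, for example in the summation limits or a normalization by L_eff.

```latex
Z_i = \frac{1}{2} \sum_{n=1}^{L_{\mathrm{eff}}-1}
      \left| \operatorname{sgn}\!\left[x_i'(n)\right]
           - \operatorname{sgn}\!\left[x_i'(n-1)\right] \right|,
\qquad i = 1, 2,
\qquad
\operatorname{sgn}(x) =
\begin{cases}
  1,  & x \ge 0 \\
  -1, & x < 0
\end{cases}
```

Each sign change between neighbouring samples contributes 2 to the sum, so the factor 1/2 turns the sum into the number of zero crossings within the segment.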

The audio comparison comprises the following steps:

The similarity threshold is set as follows: the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to determine whether the compared audio signals are similar. The threshold is set to 60% in the waveform sequence comparison algorithm and to 80% in the envelope and zero-crossing-rate sequence comparison algorithms.

The similarity decision is made as follows:

For the waveform sequence, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.

Embodiment 1: the audio similarity analysis of the present invention comprises the following steps:

(2) When the audio under test is received through the microphone, the number of receiving channels must be set: mono when receiving speech signals and stereo (two channels) when receiving music signals. The sampling rate satisfies the Nyquist sampling theorem, f_s ≥ 2f_h, where f_h is the highest frequency of the signal. The number of receiving channels is set to mono, the sampling rate to 44.1 kHz, and the quantization precision to 16 bits.

(3) Preprocessing: preprocessing comprises filtering, pre-emphasis, and windowing and framing;

(4) Filtering: filtering has two purposes. The first is to suppress all components of the input signal whose frequency exceeds f_s/2 (f_s being the sampling frequency), to prevent aliasing interference. The second is to suppress 50 Hz power-line interference. The filter must therefore be a band-pass filter; with upper and lower cutoff frequencies f_H and f_L respectively, typical values are f_H = 3400 Hz and f_L = 60 to 100 Hz;

(5) Pre-emphasis: the purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flat and can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies. Pre-emphasis is usually performed after the speech signal has been digitized and before parameter analysis, using a digital pre-emphasis filter in the computer that boosts high frequencies at 6 dB/octave. This is usually a first-order digital filter, H(z) = 1 - u z^-1, where u is close to 1, with a typical value of 0.94;

(6) Windowing and framing: the audio sequence is a one-dimensional signal on the time axis. To analyze it, the audio signal is assumed to be stationary over short intervals on the order of milliseconds, and on this basis the signal is windowed and divided into frames. Framing can use contiguous segmentation, but to make the transition between frames smooth and preserve continuity, overlapping segmentation is generally used. Framing is implemented by weighting with a movable finite-length window, that is, by multiplying s(n) by a window function w(n) to form the windowed audio signal s_w(n) = s(n) × w(n).

(7) Writing data to a WAV file: the preprocessed sequence is written to a WAV file; this step can be implemented with a MATLAB program;

(8) Reading WAV file data: the data values in the WAV file are read; this step can be implemented with a MATLAB program;

(9) Feature parameter extraction: the feature parameters, namely the waveform sequence, the envelope sequence, and the zero-crossing-rate sequence, are extracted from the audio sequence under test;

(10) Waveform sequence extraction: the waveform of an audio signal is an irregular waveform with a relatively rich frequency distribution and contains all time-domain features of the signal. Comparing the time-domain waveforms of two audio signals compares all of their time-domain details, so the similarity can be calculated from the waveform amplitude values. An audio signal is a one-dimensional analog signal that varies continuously in both time and amplitude; to process it in a computer, it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of the sequence points. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t). The sampled signal f_s(t) is then quantized to obtain the digital signal f(n). Let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem requires 1/T_s ≥ 2f_h. The two audio signals under test are compared over the same duration, denoted T. Let their time-domain functions be x₁(t) and x₂(t), and let N = T × (1/T_s). With T_s normalized to 1, x₁(nT_s) and x₂(nT_s) can be abbreviated as x₁(n) and x₂(n). Quantizing the amplitudes of x₁(n) and x₂(n) then gives the waveform sequences to be extracted, x₁'(n) and x₂'(n);

(11) Envelope sequence extraction: the signal envelope is the curve that reflects the variation of the waveform amplitude and describes how the local maxima of the signal change. Whereas the time-domain waveform compares all detailed components of the audio signal, the envelope compares the outline of the signal waveform. Let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x₁'(n) and x₂'(n). The envelope extraction flow is: take the absolute values |x₁'(n)| and |x₂'(n)| of the waveform sequences, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x₁''(n) and x₂''(n) of the audio signals under test;

(12) Zero-crossing-rate sequence extraction: the zero-crossing rate is a simple feature in the time-domain analysis of audio signals and refers to the number of times the signal passes through zero. For a continuous audio signal, it can be observed as the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample values changes. Let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x₁'(n) and x₂'(n). The zero-crossing rates of x₁'(n) and x₂'(n) are calculated by formulas (1) and (2), where L_eff is the length of the portion of x₁'(n) and x₂'(n) over which one zero-crossing-rate value is calculated within the set time segment, sgn is the sign function, and Z₁ and Z₂ are the zero-crossing-rate values of x₁'(n) and x₂'(n) over the length L_eff. This process gives the zero-crossing-rate sequences x₁'''(n) and x₂'''(n).

(13) Audio comparison: a similarity value is calculated with a correlation function for each of the three feature sequences of the audio under test; R(m) is calculated for each sequence to obtain the corresponding correlation value, in the following steps:

(a) If the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity of the waveform sequences from the definition of the cross-correlation function;

(b) If the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity of the envelope sequences from the definition of the cross-correlation function;

(c) If the extracted audio feature parameter is the zero-crossing-rate sequence, the audio comparison calculates the degree of similarity of the zero-crossing-rate sequences from the definition of the cross-correlation function.

(14) Similarity threshold setting: a similarity threshold is set for judging the similarity of the audio under test. The peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to determine whether the compared audio signals are similar. The threshold is set to 60% in the waveform sequence comparison algorithm and to 80% in the envelope and zero-crossing-rate sequence comparison algorithms.

(15) Similarity decision: the similarity calculation result is compared with the set threshold; if it is greater than or equal to the similarity threshold, the two audio signals under test are judged similar; otherwise, they are judged dissimilar. For the waveform sequence, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.

The specific embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments; within the knowledge possessed by those of ordinary skill in the art, various changes can be made without departing from the concept of the invention.

Claims

1. A similarity analysis method based on sound characteristics, characterized by the following specific steps:

(1) Audio collection: the audio under test is received through a microphone, and the analog signal is converted into a digital signal;

(2) Feature parameter extraction: the feature parameters, including the waveform sequence, the envelope sequence, and the zero-crossing-rate sequence, are extracted from the audio sequence under test;

(5) Similarity decision: the similarity calculation result is compared with the set threshold; if it is greater than or equal to the similarity threshold, the two audio signals under test are judged similar; otherwise, they are judged dissimilar.

2. The similarity analysis method based on sound characteristics according to claim 1, characterized in that: for the audio collection, the number of receiving channels must be set when the audio under test is received through the microphone; mono is set when receiving speech signals and stereo (two channels) when receiving music signals; the sampling rate satisfies the Nyquist sampling theorem, f_s ≥ 2f_h, where f_h is the highest frequency of the signal.

3. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the feature parameter extraction comprises the following steps:

(1) Waveform sequence extraction: the audio signal is sampled and quantized into a digital signal that is discrete in both time and amplitude; t is the continuous variable on the time axis and n is the integer index of the sequence points; sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t), which is then quantized to obtain the digital signal f(n); let T_s be the sampling period and f_h the highest frequency of the audio signal under test, the sampling theorem requiring 1/T_s ≥ 2f_h; the two audio signals under test are compared over the same duration, denoted T, and their time-domain functions are x₁(t) and x₂(t); let N = T × (1/T_s); with T_s normalized to 1, x₁(nT_s) and x₂(nT_s) are written as x₁(n) and x₂(n); quantizing the amplitudes of x₁(n) and x₂(n) then gives the waveform sequences to be extracted, x₁'(n) and x₂'(n);

(2) Envelope sequence extraction: let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the audio waveform sequences x₁'(n) and x₂'(n); the envelope extraction flow is: take the absolute values |x₁'(n)| and |x₂'(n)| of the waveform sequences, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x₁''(n) and x₂''(n) of the audio signals under test;

(3) Zero-crossing-rate sequence extraction: let the time-domain functions of the two audio signals under test be x₁(t) and x₂(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x₁'(n) and x₂'(n); the zero-crossing rates of x₁'(n) and x₂'(n) are calculated by formulas (1) and (2), where L_eff is the length of the portion of x₁'(n) and x₂'(n) over which one zero-crossing-rate value is calculated within the set time segment, sgn is the sign function, and Z₁ and Z₂ are the zero-crossing-rate values of x₁'(n) and x₂'(n) over the length L_eff, giving the zero-crossing-rate sequences x₁'''(n) and x₂'''(n).

4. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the audio comparison comprises the following steps:

(1) If the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity of the waveform sequences from the definition of the cross-correlation function;

(2) If the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity of the envelope sequences from the definition of the cross-correlation function;

(3) If the extracted audio feature parameter is the zero-crossing-rate sequence, the audio comparison calculates the degree of similarity of the zero-crossing-rate sequences from the definition of the cross-correlation function.

5. The similarity analysis method based on sound characteristics according to claim 1, characterized in that: the similarity threshold is set by using the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, to determine whether the compared audio signals are similar; the threshold is set to 60% in the waveform sequence comparison algorithm and to 80% in the envelope and zero-crossing-rate sequence comparison algorithms.

6. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the similarity decision is made as follows:

For the waveform sequence, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.