CN107274911A - A similarity analysis method based on sound characteristics - Google Patents

Publication number
CN107274911A
Authority
CN
China
Prior art keywords
audio
sequence
similarity
signal
zero
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710305251.4A
Other languages
Chinese (zh)
Inventor
龙华
张琳
邵玉斌
杜庆治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Application filed by Kunming University of Science and Technology
Priority to CN201710305251.4A
Publication of CN107274911A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to a similarity analysis method based on sound characteristics and belongs to the technical field of audio signal processing. To compare the similarity of two audio signals under test, the invention takes two physical features, amplitude and zero-crossing rate, as basic parameters and compares three algorithms built on them: waveform comparison, envelope comparison, and zero-crossing-rate comparison. A similarity value is calculated with a correlation function, a similarity threshold is set, and the similarity value is compared with the threshold to decide whether the signals are similar. The invention can be used to compare the similarity of audio signals and can be applied to the monitoring of broadcast television signals. Compared with the prior art, the algorithm is simple, its theory is clear, and it is easy to implement.

Description

A similarity analysis method based on sound characteristics
Technical field
The invention relates to a similarity analysis method based on sound characteristics and belongs to the technical field of audio signal processing.
Background
Monitoring broadcast audio safely, quickly, and effectively is an urgent problem. Most existing research on audio content concentrates on audio classification, audio retrieval, and speech recognition. The algorithms produced by this research have high complexity and complicated theory, so they are often difficult to implement and apply when audio similarity actually has to be compared.
Summary of the invention
The technical problem to be solved by the invention is to provide a similarity analysis method based on sound characteristics that calculates the similarity of audio signals by extracting characteristic parameters (waveform, envelope, and zero-crossing rate) and then makes a similarity decision on the calculated result.
The technical solution of the invention is a similarity analysis method based on sound characteristics, comprising the following steps:
(1) Audio acquisition: The audio under test is received through a microphone. In this process the analog signal must be converted into a digital signal, so the number of channels on which the microphone receives audio, the sample rate, and the quantization precision are set. To recover the original continuous signal without distortion, the sample rate must satisfy the Nyquist sampling theorem;
(2) Preprocessing: The preprocessing comprises filtering, pre-emphasis, and windowing and framing;
(3) Writing data to a WAV file: The preprocessed sequence is written to a WAV file. This step can be implemented by writing a MATLAB program;
(4) Reading data from the WAV file: The data values in the WAV file are read. This step can be implemented by writing a MATLAB program;
(5) Characteristic parameter extraction: The characteristic parameters, namely the waveform sequence, envelope sequence, and zero-crossing-rate sequence, are extracted from the audio sequence under test;
(6) Audio comparison: A similarity value is calculated with a correlation function for each of the three kinds of feature sequences of the audio under test;
(7) Similarity threshold setting: A similarity threshold is set for judging the similarity of the audio under test;
(8) Similarity decision: The similarity calculation result is compared with the set threshold. If it is greater than or equal to the similarity threshold, the two audio signals under test are judged to be similar; otherwise they are judged to be dissimilar.
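Steps (3) and (4) mention MATLAB programs. As a hedged illustration of the same write-then-read round trip, the following sketch uses Python's standard `wave` and `struct` modules instead. The function names `write_wav`, `read_wav`, and `demo_roundtrip`, and the 440 Hz test tone, are assumptions made for this example and do not come from the patent.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate=44100):
    """Write mono 16-bit PCM samples (ints in [-32768, 32767]) to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono, as in the settings described in the text
        wf.setsampwidth(2)            # 16-bit quantization -> 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, list of ints)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    return rate, list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def demo_roundtrip():
    """Write a 10 ms, 440 Hz test tone (hypothetical data) and read it back."""
    rate = 44100
    tone = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(441)]
    path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_wav(path, tone, rate)
    return tone, read_wav(path)
```

Calling `demo_roundtrip()` returns the original samples together with the rate and samples read back from disk, so the two sample lists can be compared directly.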
In step (1) of the above similarity analysis method based on sound characteristics, the number of receiving channels must be set when the test audio is received through the microphone: mono when receiving a speech signal and two-channel (stereo) when receiving a music signal. The sample rate satisfies the Nyquist sampling theorem, fs ≥ 2fh, where fh is the highest frequency of the signal. The number of receiving channels is set to mono, the sample rate to 44.1 kHz, and the quantization precision to 16 bits;
In step (2) of the above similarity analysis method based on sound characteristics, the preprocessing comprises the following steps:
(1) Filtering: Filtering has two purposes: (a) to suppress all components of the input signal whose frequency exceeds fs/2 (fs is the sampling frequency), in order to prevent aliasing interference; and (b) to suppress 50 Hz mains power-line interference. The filter must therefore be a band-pass filter. Let its upper and lower cutoff frequencies be fH and fL respectively; typically fH = 3400 Hz and fL = 60 to 100 Hz;
(2) Pre-emphasis: The purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flatter over the whole band from low to high frequency and can be computed with the same signal-to-noise ratio throughout. Pre-emphasis is usually applied after the speech signal has been digitized and before parameter analysis, using a pre-emphasis digital filter in the computer that boosts high frequencies by 6 dB/octave. This is usually a first-order digital filter, H(z) = 1 - u z^(-1), where u is close to 1, with a typical value of 0.94;
(3) Windowing and framing: The audio sequence is a one-dimensional signal on the time axis. To analyze it, the audio signal is assumed to be stationary over short intervals on the order of milliseconds, and on this basis it is windowed and divided into frames. Framing can use contiguous segments, but to make the transition between frames smooth and preserve continuity, overlapping segments are generally used. Framing is implemented by weighting with a movable finite-length window, that is, by multiplying s(n) by a window function w(n) to form the windowed audio signal sw(n) = s(n) × w(n);
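The pre-emphasis and windowing steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the text does not name a window function, so the Hamming window used here is an assumption, as are the function names and the frame length and hop parameters. The band-pass filtering step is not included.

```python
import math


def pre_emphasis(signal, u=0.94):
    """Apply y(n) = s(n) - u * s(n-1), the time-domain form of H(z) = 1 - u z^-1."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - u * signal[n - 1])
    return out


def hamming(length):
    """Hamming window (an assumed choice): w(n) = 0.54 - 0.46 cos(2 pi n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(signal, frame_len, hop):
    """Split into frames (overlapping when hop < frame_len) and weight: s_w(n) = s(n) * w(n)."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Choosing `hop` smaller than `frame_len` gives the overlapping segmentation that the text recommends for smooth transitions between frames.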
In step (5) of the above similarity analysis method based on sound characteristics, the characteristic parameter extraction comprises the following steps:
(1) Waveform sequence extraction: The waveform of an audio signal is irregular and has a relatively rich frequency distribution. It contains all the time-domain features of the signal, so comparing the time-domain waveforms of two audio signals compares all of their time-domain details, and the similarity can therefore be calculated from the waveform amplitude values. An audio signal is a one-dimensional analog signal whose time and amplitude both vary continuously; to process it in a computer, it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of a sequence point. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal fs(t); quantizing fs(t) then gives the digital signal f(n). Let Ts be the sampling period and fh the highest frequency of the audio signal under test; the sampling theorem requires 1/Ts ≥ 2fh. The two audio signals under test are compared over the same duration T. Let their time-domain functions be x1(t) and x2(t), and let N = T × (1/Ts). With Ts normalized to 1, x1(nTs) and x2(nTs) can be abbreviated as x1(n) and x2(n). Quantizing the amplitudes of x1(n) and x2(n) then gives the waveform sequences to be extracted, x1'(n) and x2'(n);
(2) Envelope sequence extraction: The signal envelope is the curve that reflects how the waveform amplitude changes; it describes how the local maxima of the signal vary. The time-domain waveform allows all details of the audio signals to be compared, whereas the envelope compares the outline of the signal waveforms. Let the time-domain functions of the two audio signals under test be x1(t) and x2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n). The envelope extraction flow is then: take the absolute values |x1'(n)| and |x2'(n)|, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x1''(n) and x2''(n) of the audio signals under test;
(3) Zero-crossing-rate sequence extraction: The zero-crossing rate is a simple feature in the time-domain analysis of audio signals; it is the number of times the signal passes through zero. For a continuous audio signal, it can be observed as the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample values changes. Let the time-domain functions of the two audio signals under test be x1(t) and x2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n). The zero-crossing rates Z1 and Z2 of x1'(n) and x2'(n) are then calculated by formula. In the formulas, Leff is the length of the segment of x1'(n) and x2'(n) over which one zero-crossing-rate value is calculated within the set time period (one value is calculated every 50 ms), sgn is the sign function, and Z1 and Z2 are the zero-crossing-rate values of x1'(n) and x2'(n) over the length Leff. This process gives the zero-crossing-rate sequences x1'''(n) and x2'''(n).
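The envelope extraction flow in item (2) above (absolute value, low-pass filtering, DC removal) can be sketched as follows. The text does not specify the low-pass filter, so the centered moving average used here is an assumption, as are the function names and the default window width.

```python
def moving_average(values, width):
    """Simple low-pass filter (assumed): centered moving average, shorter at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def envelope_sequence(waveform, width=5):
    """Rectify, low-pass filter, then subtract the DC component (the mean)."""
    if not waveform:
        return []
    rectified = [abs(v) for v in waveform]
    smoothed = moving_average(rectified, width)
    mean = sum(smoothed) / len(smoothed)
    return [v - mean for v in smoothed]
```

Because the mean is removed at the end, a signal of constant magnitude yields an all-zero envelope sequence, and only changes in the outline of the waveform remain.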
In step (6) of the above similarity analysis method based on sound characteristics, the audio comparison comprises the following steps:
(1) If the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity of the waveform sequences with the cross-correlation function, which is defined as:
(2) If the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity of the envelope sequences with the cross-correlation function, which is defined as:
(3) If the extracted audio feature parameter is the zero-crossing-rate sequence, the audio comparison calculates the degree of similarity of the zero-crossing-rate sequences with the cross-correlation function, which is defined as:
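The cross-correlation definitions that the three items above refer to are not reproduced in this text. As an assumption only, and not the patent's own formula, a commonly used normalized form is shown below for two length-N feature sequences y1(n) and y2(n) (waveform, envelope, or zero-crossing-rate sequences) at lag m ≥ 0. Its peak value is at most 1, which is compatible with the percentage thresholds used in the later steps.

```latex
R_{12}(m) = \sum_{n=0}^{N-1-m} y_1(n)\, y_2(n+m), \qquad
\rho_{12}(m) = \frac{R_{12}(m)}{\sqrt{\sum_{n=0}^{N-1} y_1^2(n)\,\sum_{n=0}^{N-1} y_2^2(n)}}, \qquad
\rho_{\max} = \max_{m}\, \rho_{12}(m)
```

By the Cauchy-Schwarz inequality, |ρ12(m)| ≤ 1 for every lag, and ρ12(0) = 1 when the two sequences are identical.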
In step (7) of the above similarity analysis method based on sound characteristics, the similarity threshold is set as follows: the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to decide whether the compared audio signals are similar. The threshold is set to 60% for the waveform sequence comparison algorithm and to 80% for the envelope and zero-crossing-rate sequence comparison algorithms;
In step (8) of the above similarity analysis method based on sound characteristics, the similarity decision is made as follows: for waveform sequences, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.
The beneficial effects of the invention are as follows: the invention can be used to compare the similarity of audio signals and can be applied to the monitoring of broadcast television signals. Compared with the prior art, the algorithm is simple, its theory is clear, and it is easy to implement.
Brief description of the drawings
Fig. 1 is the similarity comparison flowchart of the invention;
Fig. 2 is the flowchart of audio signal waveform sequence extraction in the invention;
Fig. 3 is the flowchart of audio signal envelope sequence extraction in the invention.
Detailed description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
A similarity analysis method based on sound characteristics, with the following specific steps:
(1) Audio acquisition: The audio under test is received through a microphone, and the analog signal is converted into a digital signal;
(2) Characteristic parameter extraction: Characteristic parameters, including the waveform sequence, envelope sequence, and zero-crossing-rate sequence, are extracted from the audio sequence under test;
(3) Audio comparison: A similarity value is calculated with a correlation function for each of the three kinds of feature sequences of the audio under test;
(4) Similarity threshold setting: A similarity threshold is set for judging the similarity of the audio under test;
(5) Similarity decision: The similarity calculation result is compared with the set threshold. If it is greater than or equal to the similarity threshold, the two audio signals under test are judged to be similar; otherwise they are judged to be dissimilar.
For the audio acquisition, the number of receiving channels must be set when the audio under test is received through the microphone: mono when receiving a speech signal and two-channel (stereo) when receiving a music signal. The sample rate satisfies the Nyquist sampling theorem, fs ≥ 2fh, where fh is the highest frequency of the signal. The number of receiving channels is set to mono, the sample rate to 44.1 kHz, and the quantization precision to 16 bits;
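The acquisition settings above (fs ≥ 2fh, 44.1 kHz, 16 bits) can be illustrated with a short sketch that samples a continuous function and quantizes it. This is an illustration only: a real acquisition would read from a microphone, whereas here a Python callable `f(t)` with values in [-1, 1] stands in for the analog signal, and the function names are assumptions.

```python
import math


def check_nyquist(sample_rate, highest_freq):
    """True when fs >= 2 * fh, the sampling-theorem condition in the text."""
    return sample_rate >= 2 * highest_freq


def sample_and_quantize(f, duration, sample_rate=44100, bits=16):
    """Sample f(t) every Ts = 1/fs for N = T * fs points, then quantize to signed ints."""
    n_samples = int(round(duration * sample_rate))
    full_scale = 2 ** (bits - 1) - 1                        # 32767 for 16 bits
    out = []
    for n in range(n_samples):
        value = max(-1.0, min(1.0, f(n / sample_rate)))     # clip to [-1, 1]
        out.append(int(round(value * full_scale)))
    return out
```

For example, `sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t), 0.01)` samples a 1 kHz tone for 10 ms at the default 44.1 kHz and returns 441 quantized values.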
The characteristic parameter extraction comprises the following steps:
(1) Waveform sequence extraction: The waveform of an audio signal is irregular and has a relatively rich frequency distribution. It contains all the time-domain features of the signal, so comparing the time-domain waveforms of two audio signals compares all of their time-domain details, and the similarity can therefore be calculated from the waveform amplitude values. An audio signal is a one-dimensional analog signal whose time and amplitude both vary continuously, so it is sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of a sequence point. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal fs(t); quantizing fs(t) gives the digital signal f(n). Let Ts be the sampling period and fh the highest frequency of the audio signal under test; the sampling theorem requires 1/Ts ≥ 2fh. The two audio signals under test are compared over the same duration T. Let their time-domain functions be x1(t) and x2(t), and let N = T × (1/Ts). With Ts normalized to 1, x1(nTs) and x2(nTs) are written x1(n) and x2(n). Quantizing the amplitudes of x1(n) and x2(n) then gives the waveform sequences to be extracted, x1'(n) and x2'(n);
(2) Envelope sequence extraction: The signal envelope is the curve that reflects how the waveform amplitude changes; it describes how the local maxima of the signal vary. The time-domain waveform allows all details of the audio signals to be compared, whereas the envelope compares the outline of the signal waveforms. Let the time-domain functions of the two audio signals under test be x1(t) and x2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n). The envelope extraction flow is then: take the absolute values |x1'(n)| and |x2'(n)|, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x1''(n) and x2''(n) of the audio signals under test;
(3) Zero-crossing-rate sequence extraction: The zero-crossing rate is a simple feature in the time-domain analysis of audio signals; it is the number of times the signal passes through zero. For a continuous audio signal, it can be observed as the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample values changes. Let the time-domain functions of the two audio signals under test be x1(t) and x2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n).
The zero-crossing rates of x1'(n) and x2'(n) are calculated by formulas (1) and (2).
In the formulas, Leff is the length of the segment of x1'(n) and x2'(n) over which one zero-crossing-rate value is calculated within the set time period, sgn is the sign function, and Z1 and Z2 are the zero-crossing-rate values of x1'(n) and x2'(n) over the length Leff. This gives the zero-crossing-rate sequences x1'''(n) and x2'''(n);
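Formulas (1) and (2) are not reproduced in this text. The sketch below therefore uses the textbook sign-change count, Z = (1/2) Σ |sgn(x(n)) - sgn(x(n-1))| over a block of Leff samples, which is an assumption about their form. The 50 ms block duration follows the value given in the summary of the invention; the function names and the convention that sgn(0) = +1 are also assumptions.

```python
def sgn(v):
    """Sign function, treating zero as positive (an assumed convention)."""
    return 1 if v >= 0 else -1


def zero_crossings(block):
    """Z = 1/2 * sum of |sgn(x(n)) - sgn(x(n-1))| over one block of samples."""
    total = sum(abs(sgn(block[n]) - sgn(block[n - 1])) for n in range(1, len(block)))
    return total // 2


def zero_crossing_sequence(waveform, sample_rate, block_ms=50):
    """One zero-crossing count per non-overlapping block of block_ms milliseconds."""
    l_eff = max(1, sample_rate * block_ms // 1000)   # block length in samples
    return [zero_crossings(waveform[i:i + l_eff])
            for i in range(0, len(waveform) - l_eff + 1, l_eff)]
```

Each element of the returned list plays the role of one value of the zero-crossing-rate sequence; dividing by the block length would turn the count into a rate if that normalization is wanted.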
The audio comparison comprises the following steps:
(1) If the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity of the waveform sequences with the cross-correlation function, which is defined as:
(2) If the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity of the envelope sequences with the cross-correlation function, which is defined as:
(3) If the extracted audio feature parameter is the zero-crossing-rate sequence, the audio comparison calculates the degree of similarity of the zero-crossing-rate sequences with the cross-correlation function, which is defined as:
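The comparison step can be sketched as a search for the peak of a cross-correlation function over all lags. Since the definitions are not reproduced in this text, the normalization by the square root of the product of the two sequence energies is an assumption, chosen so that the peak is at most 1 and can be read as a percentage. The function names and the `max_lag` parameter are hypothetical.

```python
import math


def cross_correlation(x1, x2, m):
    """R(m) = sum over n of x1(n) * x2(n + m), using only the overlapping samples."""
    total = 0.0
    for n in range(len(x1)):
        k = n + m
        if 0 <= k < len(x2):
            total += x1[n] * x2[k]
    return total


def peak_similarity(x1, x2, max_lag=None):
    """Largest normalized cross-correlation coefficient over lags -max_lag..max_lag."""
    energy = math.sqrt(sum(v * v for v in x1) * sum(v * v for v in x2))
    if energy == 0.0:
        return 0.0                      # at least one sequence is all zeros
    if max_lag is None:
        max_lag = max(len(x1), len(x2)) - 1
    return max(cross_correlation(x1, x2, m) / energy
               for m in range(-max_lag, max_lag + 1))
```

The same function applies unchanged to waveform, envelope, and zero-crossing-rate sequences. This direct form costs O(N²) operations over all lags, so long recordings would normally call for an FFT-based correlation instead.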
The similarity threshold is set as follows: the peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to decide whether the compared audio signals are similar. The threshold is set to 60% for the waveform sequence comparison algorithm and to 80% for the envelope and zero-crossing-rate sequence comparison algorithms.
The similarity decision is made as follows:
For waveform sequences, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar. For the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.
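The decision rule above reduces to a per-feature threshold comparison. A minimal sketch, with hypothetical names `THRESHOLDS` and `is_similar` and with the peak expressed as a fraction between 0 and 1 rather than a percentage:

```python
# Thresholds from the text: 60% for waveform, 80% for envelope and zero-crossing rate.
THRESHOLDS = {"waveform": 0.60, "envelope": 0.80, "zero_crossing": 0.80}


def is_similar(peak, feature):
    """Return True when the cross-correlation peak reaches the feature's threshold."""
    return peak >= THRESHOLDS[feature]
```

The comparison is inclusive, matching the "greater than or equal to" wording of the decision rule.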
Embodiment 1: The audio similarity analysis of the invention comprises the following steps:
(1) Audio acquisition: The audio under test is received through a microphone. In this process the analog signal must be converted into a digital signal, so the number of channels on which the microphone receives audio, the sample rate, and the quantization precision are set. To recover the original continuous signal without distortion, the sample rate must satisfy the Nyquist sampling theorem;
(2) When the test audio is received through the microphone, the number of receiving channels must be set: mono when receiving a speech signal and two-channel (stereo) when receiving a music signal. The sample rate satisfies the Nyquist sampling theorem, fs ≥ 2fh, where fh is the highest frequency of the signal. The number of receiving channels is set to mono, the sample rate to 44.1 kHz, and the quantization precision to 16 bits.
(3) Preprocessing: The preprocessing comprises filtering, pre-emphasis, and windowing and framing;
(4) Filtering has two purposes: (a) to suppress all components of the input signal whose frequency exceeds fs/2 (fs is the sampling frequency), in order to prevent aliasing interference; and (b) to suppress 50 Hz mains power-line interference. The filter must therefore be a band-pass filter. Let its upper and lower cutoff frequencies be fH and fL respectively; typically fH = 3400 Hz and fL = 60 to 100 Hz;
(5) Pre-emphasis: The purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flatter over the whole band from low to high frequency and can be computed with the same signal-to-noise ratio throughout. Pre-emphasis is usually applied after the speech signal has been digitized and before parameter analysis, using a pre-emphasis digital filter in the computer that boosts high frequencies by 6 dB/octave. This is usually a first-order digital filter, H(z) = 1 - u z^(-1), where u is close to 1, with a typical value of 0.94;
(6) Windowing and framing: The audio sequence is a one-dimensional signal on the time axis. To analyze it, the audio signal is assumed to be stationary over short intervals on the order of milliseconds, and on this basis it is windowed and divided into frames. Framing can use contiguous segments, but to make the transition between frames smooth and preserve continuity, overlapping segments are generally used. Framing is implemented by weighting with a movable finite-length window, that is, by multiplying s(n) by a window function w(n) to form the windowed audio signal sw(n) = s(n) × w(n).
(7) Writing data to a WAV file: The preprocessed sequence is written to a WAV file. This step can be implemented by writing a MATLAB program;
(8) Reading data from the WAV file: The data values in the WAV file are read. This step can be implemented by writing a MATLAB program;
(9) Characteristic parameter extraction: The characteristic parameters, namely the waveform sequence, envelope sequence, and zero-crossing-rate sequence, are extracted from the audio sequence under test;
(10) Waveform sequence extraction: The waveform of an audio signal is irregular and has a relatively rich frequency distribution. It contains all the time-domain features of the signal, so comparing the time-domain waveforms of two audio signals compares all of their time-domain details, and the similarity can therefore be calculated from the waveform amplitude values. An audio signal is a one-dimensional analog signal whose time and amplitude both vary continuously; to process it in a computer, it must first be sampled and quantized into a digital signal that is discrete in both time and amplitude. Let t be the continuous variable on the time axis and n the integer index of a sequence point. Sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal fs(t); quantizing fs(t) then gives the digital signal f(n). Let Ts be the sampling period and fh the highest frequency of the audio signal under test; the sampling theorem requires 1/Ts ≥ 2fh. The two audio signals under test are compared over the same duration T. Let their time-domain functions be x1(t) and x2(t), and let N = T × (1/Ts). With Ts normalized to 1, x1(nTs) and x2(nTs) can be abbreviated as x1(n) and x2(n). Quantizing the amplitudes of x1(n) and x2(n) then gives the waveform sequences to be extracted, x1'(n) and x2'(n);
(11) Envelope sequence extraction: The signal envelope is the curve that reflects how the waveform amplitude changes; it describes how the local maxima of the signal vary. The time-domain waveform allows all details of the audio signals to be compared, whereas the envelope compares the outline of the signal waveforms. Let the time-domain functions of the two audio signals under test be x1(t) and x2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n). The envelope extraction flow is then: take the absolute values |x1'(n)| and |x2'(n)|, apply low-pass filtering, and subtract the DC component, which finally gives the envelope sequences x1''(n) and x2''(n) of the audio signals under test;
(12) Zero-crossing-rate sequence extraction: The zero-crossing rate is a simple feature in the time-domain analysis of audio signals; it is the number of times the signal passes through zero. For a continuous audio signal, it can be observed as the time-domain waveform crossing the time axis. For a discrete signal, the number of zero crossings is the number of times the sign of the sample values changes. Let the time-domain functions of the two audio signals under test be x1(t) and x2(t), where t is the continuous variable on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n). The zero-crossing rates Z1 and Z2 of x1'(n) and x2'(n) are then calculated by formula. In the formulas, Leff is the length of the segment of x1'(n) and x2'(n) over which one zero-crossing-rate value is calculated within the set time period, sgn is the sign function, and Z1 and Z2 are the zero-crossing-rate values of x1'(n) and x2'(n) over the length Leff. This process gives the zero-crossing-rate sequences x1'''(n) and x2'''(n).
(13) Audio comparison: A similarity value is calculated with a correlation function for each of the three kinds of feature sequences of the audio under test. R(m) is calculated for each to obtain the corresponding correlation value, as follows:
(a) If the extracted audio feature parameter is the waveform sequence, the audio comparison calculates the degree of similarity of the waveform sequences with the cross-correlation function, which is defined as:
(b) If the extracted audio feature parameter is the envelope sequence, the audio comparison calculates the degree of similarity of the envelope sequences with the cross-correlation function, which is defined as:
(c) If the extracted audio feature parameter is the zero-crossing-rate sequence, the audio comparison calculates the degree of similarity of the zero-crossing-rate sequences with the cross-correlation function, which is defined as:
(14) Similarity threshold setting: A similarity threshold is set for judging the similarity of the audio under test. The peak of the cross-correlation function, that is, the maximum cross-correlation coefficient, is used to decide whether the compared audio signals are similar. The threshold is set to 60% for the waveform sequence comparison algorithm and to 80% for the envelope and zero-crossing-rate sequence comparison algorithms.
(15) Similarity decision: The similarity calculation result is compared with the set threshold. If it is greater than or equal to the similarity threshold, the two audio signals under test are judged to be similar; otherwise they are judged to be dissimilar. For waveform sequences, a cross-correlation peak greater than or equal to 60% is judged similar and a peak below 60% is judged dissimilar; for the envelope and zero-crossing-rate sequence comparison algorithms, a cross-correlation peak greater than or equal to 80% is judged similar and a peak below 80% is judged dissimilar.
The specific embodiments of the invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments. Within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the spirit of the invention.

Claims (6)

1. A similarity analysis method based on sound characteristics, characterized in that it comprises the following specific steps:
(1) Audio collection: the audio under test is received through a microphone, and the analog signal is converted into a digital signal;
(2) Characteristic parameter extraction: characteristic parameters, including the waveform sequence, the envelope sequence and the zero-crossing rate sequence, are extracted from the audio sequence under test;
(3) Audio comparison: a similarity value is calculated by means of a correlation function for each of the three characteristic sequences of the audio under test;
(4) Similarity threshold setting: a similarity threshold is set for judging whether the audio signals under test are similar;
(5) Similarity judgment: the similarity calculation result is compared with the set threshold; if it is greater than or equal to the similarity threshold, the two audio signals under test are judged similar; otherwise they are judged dissimilar.
2. The similarity analysis method based on sound characteristics according to claim 1, characterized in that: when the audio under test is received through the microphone during audio collection, the number of receiving channels must be set; a single channel (mono) is set when a speech signal is received, and two channels (stereo) are set when a music signal is received; the sample rate satisfies the Nyquist sampling theorem, f_s ≥ 2f_h, where f_s is the sample rate and f_h is the highest frequency of the signal.
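The capture settings of claim 2 can be illustrated with a minimal sketch. The function names `capture_settings` and `satisfies_nyquist` and the labels "speech" and "music" are hypothetical and chosen here for illustration; only the channel rule and the condition f_s ≥ 2f_h come from the claim.

```python
def capture_settings(signal_type: str, highest_freq_hz: float) -> dict:
    """Return the channel count and the minimum sample rate for a capture.

    Channel rule from claim 2: speech -> 1 channel, music -> 2 channels.
    The string labels are illustrative assumptions.
    """
    channels = {"speech": 1, "music": 2}
    if signal_type not in channels:
        raise ValueError("signal_type must be 'speech' or 'music'")
    if highest_freq_hz <= 0:
        raise ValueError("highest_freq_hz must be positive")
    return {
        "channels": channels[signal_type],
        # Nyquist condition: f_s >= 2 * f_h
        "min_sample_rate_hz": 2.0 * highest_freq_hz,
    }


def satisfies_nyquist(sample_rate_hz: float, highest_freq_hz: float) -> bool:
    """True when the sample rate is at least twice the highest frequency."""
    return sample_rate_hz >= 2.0 * highest_freq_hz
```

For example, a speech signal band-limited to 4 kHz would need one channel and a sample rate of at least 8 kHz under these assumptions.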
3. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the characteristic parameter extraction comprises the following steps:
(1) Waveform sequence extraction: the audio signal is sampled and quantized, turning it into a digital signal that is discrete in both time and amplitude; t is a continuous variable defined on the time axis, and n is an integer indexing the sequence points; sampling uses a sampling pulse sequence p(t) to extract a series of discrete sample values from the continuous signal f(t), giving the sampled signal f_s(t); the sampled signal f_s(t) is quantized in a preprocessing step to obtain the digital signal f(n); let T_s be the sampling period and f_h the highest frequency of the audio signal under test; the sampling theorem is satisfied, 1/T_s ≥ 2f_h; the audio signals under test are compared over the same duration, denoted T; the time-domain functions of the two audio signals under test are assumed to be x1(t) and x2(t), where t is a continuous variable defined on the time axis;
Let N = T × (1/T_s); with T_s normalized to 1, x1(nT_s) and x2(nT_s) are written x1(n) and x2(n); the amplitudes of x1(n) and x2(n) are then quantized, giving the waveform sequences x1'(n) and x2'(n) to be extracted;
(2) Envelope sequence extraction: the time-domain functions of the two audio signals under test are assumed to be x1(t) and x2(t), where t is a continuous variable defined on the time axis; the waveform extraction method gives the audio waveform sequences x1'(n) and x2'(n); following the envelope extraction procedure, the absolute values |x1'(n)| and |x2'(n)| of the waveform sequences are taken and low-pass filtered, and the DC component is subtracted, finally giving the envelope sequences x1''(n) and x2''(n) of the audio signals under test;
(3) Zero-crossing rate sequence extraction: the time-domain functions of the two audio signals under test are assumed to be x1(t) and x2(t), where t is a continuous variable defined on the time axis; the waveform extraction method gives the waveform sequences x1'(n) and x2'(n),
and the zero-crossing rates of x1'(n) and x2'(n) are calculated by formulas (1) and (2):
In the formulas, L_eff is the length of the sequences x1'(n) and x2'(n) over which one zero-crossing rate value is calculated within the set time segment, sgn is the sign function, and Z1 and Z2 are the zero-crossing rate values of x1'(n) and x2'(n), respectively, over the length L_eff; this yields the zero-crossing rate sequences x1'''(n) and x2'''(n).
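The three extraction steps of claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not specify the quantizer, the low-pass filter, or the exact form of formulas (1) and (2), so a uniform quantizer, a causal moving average, and the textbook short-time zero-crossing rate Z = (1 / (2·L_eff)) · Σ |sgn(x(m)) − sgn(x(m−1))| are assumed here. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def sgn(v: float) -> int:
    """Sign function; zero is treated as non-negative (an assumption)."""
    return 1 if v >= 0 else -1


def quantize(samples: Sequence[float], bits: int = 16) -> List[int]:
    """Uniformly quantize samples in [-1, 1] to signed integer levels."""
    top = 2 ** (bits - 1) - 1
    return [int(round(max(-1.0, min(1.0, s)) * top)) for s in samples]


def envelope(wave: Sequence[float], window: int = 5) -> List[float]:
    """Absolute value -> low-pass filter -> subtract the DC component.

    The low-pass filter is an assumed causal moving average of `window` samples.
    """
    rectified = [abs(v) for v in wave]
    smoothed = []
    for i in range(len(rectified)):
        chunk = rectified[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    mean = sum(smoothed) / len(smoothed) if smoothed else 0.0
    return [v - mean for v in smoothed]


def zero_crossing_rates(wave: Sequence[float], l_eff: int) -> List[float]:
    """One zero-crossing rate value per non-overlapping frame of length l_eff."""
    if l_eff < 2:
        raise ValueError("l_eff must be at least 2")
    rates = []
    for start in range(0, len(wave) - l_eff + 1, l_eff):
        frame = wave[start: start + l_eff]
        changes = sum(abs(sgn(frame[m]) - sgn(frame[m - 1]))
                      for m in range(1, l_eff))
        rates.append(changes / (2 * l_eff))
    return rates
```

Non-overlapping frames are used for simplicity; overlapping frames would give a denser zero-crossing rate sequence at the cost of more computation.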
4. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the audio comparison comprises the following steps:
(1) If the extracted audio feature parameter is the waveform sequence, audio comparison calculates the degree of similarity of the waveform sequences by means of the cross-correlation function, which is defined as:
(2) If the extracted audio feature parameter is the envelope sequence, audio comparison calculates the degree of similarity of the envelope sequences by means of the cross-correlation function, which is defined as:
(3) If the extracted audio feature parameter is the zero-crossing rate sequence, audio comparison calculates the degree of similarity of the zero-crossing rate sequences by means of the cross-correlation function, which is defined as:
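The cross-correlation definitions are not reproduced in this text. As a hedged sketch, the comparison can be illustrated with a common normalized cross-correlation evaluated at every lag, returning its peak. The normalization by the product of the sequence energies and the name `peak_cross_correlation` are assumptions; the same routine would apply to waveform, envelope and zero-crossing rate sequences alike.

```python
import math
from typing import Sequence


def peak_cross_correlation(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Peak of the normalized cross-correlation over all lags.

    Assumed form: r(lag) = sum_i x1[i] * x2[i + lag] / sqrt(E1 * E2),
    where E1 and E2 are the energies of the two sequences and samples
    outside the sequence are treated as zero. Direct O(N^2) evaluation.
    """
    if len(x1) != len(x2) or not x1:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x1)
    norm = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    if norm == 0.0:
        return 0.0  # an all-zero sequence carries no similarity information
    best = -1.0  # a normalized coefficient is never below -1
    for lag in range(-(n - 1), n):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += x1[i] * x2[j]
        best = max(best, acc / norm)
    return best
```

Identical sequences give a peak of 1.0, and a pure time shift between otherwise identical pulses is absorbed by the search over lags. The signed maximum is used, so a polarity-inverted copy does not score as similar; taking the absolute value instead would be a different design choice.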
5. The similarity analysis method based on sound characteristics according to claim 1, characterized in that: the similarity threshold setting uses the peak of the cross-correlation function, i.e. the maximum cross-correlation coefficient, to determine whether the compared audio signals are similar; in the waveform sequence comparison algorithm the threshold is set to 60%, and in the envelope and zero-crossing rate sequence comparison algorithms the threshold is set to 80%.
6. The similarity analysis method based on sound characteristics according to claim 1, characterized in that the similarity judgment is:
for waveform sequences, a cross-correlation function peak greater than or equal to 60% is judged similar, and a peak below 60% is judged dissimilar; in the envelope and zero-crossing rate sequence comparison algorithms, a cross-correlation function peak greater than or equal to 80% is judged similar, and a peak below 80% is judged dissimilar.
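The judgment rule of claims 5 and 6 reduces to a per-feature threshold comparison, sketched below. The dictionary keys and the function name `is_similar` are hypothetical; the threshold values 60% and 80% are taken from the claims. The claims do not state how the three per-feature results are combined, so no combination is attempted here.

```python
# Thresholds from claims 5 and 6, expressed as fractions of 1.
THRESHOLDS = {
    "waveform": 0.60,
    "envelope": 0.80,
    "zero_crossing_rate": 0.80,
}


def is_similar(feature: str, peak: float) -> bool:
    """Judge similarity for one feature from its cross-correlation peak.

    A peak greater than or equal to the feature's threshold is judged
    similar; anything below it is judged dissimilar.
    """
    if feature not in THRESHOLDS:
        raise ValueError("unknown feature: " + feature)
    return peak >= THRESHOLDS[feature]
```

For instance, a peak of 0.72 would be judged similar for waveform sequences but dissimilar for envelope sequences.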
CN201710305251.4A 2017-05-03 2017-05-03 A kind of similarity analysis method based on sound characteristic Pending CN107274911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710305251.4A CN107274911A (en) 2017-05-03 2017-05-03 A kind of similarity analysis method based on sound characteristic

Publications (1)

Publication Number Publication Date
CN107274911A true CN107274911A (en) 2017-10-20

Family

ID=60073693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710305251.4A Pending CN107274911A (en) 2017-05-03 2017-05-03 A kind of similarity analysis method based on sound characteristic

Country Status (1)

Country Link
CN (1) CN107274911A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456346A (en) * 2010-10-19 2012-05-16 盛乐信息技术(上海)有限公司 Concatenated speech detection system and method
CN103440873A (en) * 2013-08-27 2013-12-11 大连理工大学 Music recommendation method based on similarities
CN105244040A (en) * 2015-07-20 2016-01-13 杭州联汇数字科技有限公司 Audio signal consistency comparison method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
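Zhao Huating (赵花婷): "An advertisement detection algorithm based on audio matching", Computer and Modernization (《计算机与现代化》) *
Guo Xingji (郭兴吉): "Feature-based audio comparison technology", Journal of Henan Normal University, Natural Science Edition (《河南师范大学学报自然科学版》) *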

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108231091A (en) * 2018-01-24 2018-06-29 广州酷狗计算机科技有限公司 A kind of whether consistent method and apparatus of left and right acoustic channels for detecting audio
CN108231091B (en) * 2018-01-24 2021-05-25 广州酷狗计算机科技有限公司 Method and device for detecting whether left and right sound channels of audio are consistent
CN108711437A (en) * 2018-03-06 2018-10-26 深圳市沃特沃德股份有限公司 Method of speech processing and device
CN108615006B (en) * 2018-04-23 2020-04-17 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN108615006A (en) * 2018-04-23 2018-10-02 百度在线网络技术(北京)有限公司 Method and apparatus for output information
CN108881652A (en) * 2018-07-11 2018-11-23 北京大米科技有限公司 Echo detection method, storage medium and electronic equipment
CN108881652B (en) * 2018-07-11 2021-02-26 北京大米科技有限公司 Echo detection method, storage medium and electronic device
CN109599104A (en) * 2018-11-20 2019-04-09 北京小米智能科技有限公司 Multi-beam choosing method and device
CN109599104B (en) * 2018-11-20 2022-04-01 北京小米智能科技有限公司 Multi-beam selection method and device
CN109829265A (en) * 2019-01-30 2019-05-31 杭州拾贝知识产权服务有限公司 A kind of the infringement evidence collecting method and system of audio production
CN110134819A (en) * 2019-04-25 2019-08-16 广州智伴人工智能科技有限公司 A kind of speech audio screening system
CN110085259A (en) * 2019-05-07 2019-08-02 国家广播电视总局中央广播电视发射二台 Audio comparison method, device and equipment
CN110085259B (en) * 2019-05-07 2021-09-17 国家广播电视总局中央广播电视发射二台 Audio comparison method, device and equipment
CN110310661A (en) * 2019-07-03 2019-10-08 云南康木信科技有限责任公司 A kind of calculation method of two-way real-time broadcast audio delay and similarity
CN110310661B (en) * 2019-07-03 2021-06-11 云南康木信科技有限责任公司 Method for calculating two-path real-time broadcast audio time delay and similarity
CN110491413A (en) * 2019-08-21 2019-11-22 中国传媒大学 A kind of audio content consistency monitoring method and system based on twin network
CN110910899A (en) * 2019-11-27 2020-03-24 杭州联汇科技股份有限公司 Real-time audio signal consistency comparison detection method
CN110910899B (en) * 2019-11-27 2022-04-08 杭州联汇科技股份有限公司 Real-time audio signal consistency comparison detection method

Similar Documents

Publication Publication Date Title
CN107274911A (en) A kind of similarity analysis method based on sound characteristic
KR101269296B1 (en) Neural network classifier for separating audio sources from a monophonic audio signal
CN109256127B (en) Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter
CN103236260A (en) Voice recognition system
EP3701528A1 (en) Segmentation-based feature extraction for acoustic scene classification
KR20060044629A (en) Isolating speech signals utilizing neural networks
Pillos et al. A Real-Time Environmental Sound Recognition System for the Android OS.
CN108682432B (en) Speech emotion recognition device
WO2017045429A1 (en) Audio data detection method and system and storage medium
CN106024010A (en) Speech signal dynamic characteristic extraction method based on formant curves
CN101625860A (en) Method for self-adaptively adjusting background noise in voice endpoint detection
CN112786059A (en) Voiceprint feature extraction method and device based on artificial intelligence
WO2018095167A1 (en) Voiceprint identification method and voiceprint identification system
CN110782915A (en) Waveform music component separation method based on deep learning
Jaafar et al. Automatic syllables segmentation for frog identification system
Labied et al. An overview of automatic speech recognition preprocessing techniques
Pilia et al. Time scaling detection and estimation in audio recordings
Martin et al. Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification
VH et al. A study on speech recognition technology
CN106653040A (en) Voice audio signal sampling processing method
CN110689875A (en) Language identification method and device and readable storage medium
Zengyuan et al. A speech denoising algorithm based on harmonic regeneration
Nandyala et al. Real time isolated word recognition using adaptive algorithm
Ge et al. Design and Implementation of Intelligent Singer Recognition System
Wei et al. A Survey of Sound-based Biometrics used in Species Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171020