CN116682456A - Musical instrument classification method based on time-frequency fine analysis - Google Patents

Musical instrument classification method based on time-frequency fine analysis

Info

Publication number
CN116682456A
CN116682456A (application CN202310583602.3A)
Authority
CN
China
Prior art keywords
harmonic
value
frequency
peak
maximum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310583602.3A
Other languages
Chinese (zh)
Inventor
王一歌
陈浩文
韦岗
曹燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202310583602.3A
Publication of CN116682456A
Legal status: Pending

Classifications

    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/06 Speech or voice analysis techniques in which the extracted parameters are correlation coefficients
    • G10L 25/18 Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band
    • G10L 25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L 25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L 25/45 Speech or voice analysis techniques characterised by the type of analysis window
    • G06F 18/2131 Feature extraction based on transform-domain processing, e.g. wavelet transform
    • G06F 18/2148 Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G06F 18/24323 Tree-organised classifiers
    • G06F 18/254 Fusion of classification results, e.g. of results related to the same input data
    • G06F 18/259 Fusion by voting
    • G06F 2123/02 Data types in the time domain, e.g. time-series data
    • G06F 2218/08 Feature extraction (pattern recognition specially adapted for signal processing)
    • G06F 2218/12 Classification; matching (pattern recognition specially adapted for signal processing)
    • Y02D 30/70 Reducing energy consumption in wireless communication networks


Abstract

The invention discloses a musical instrument classification method based on time-frequency fine analysis, comprising the following steps: fundamental frequency estimation, frequency-domain harmonic marking, time-domain single-period fundamental-frequency sequence estimation, time-frequency fine feature extraction, and classification with a random forest model. A cyclically iterated fundamental frequency estimation algorithm based on autocorrelation and narrowband spectral energy frequency estimation is designed to estimate the fundamental frequency; harmonic positions are then marked accurately and a single-period fundamental-frequency sequence is obtained with a harmonic marking algorithm based on adaptive-window search and a time-domain single-period fundamental-frequency sequence estimation algorithm. The time-frequency fine features characterize the fine structure of the spectrum and the variation of the fundamental frequency from period to period, and are extracted accordingly. Finally, the time-frequency fine features are fused with 13-dimensional mel-frequency cepstral coefficient features and fed into a random forest model for training; the instrument classification error rate of the resulting model is about 6% lower than that of a model based on mel-frequency cepstral coefficient features alone.

Description

Musical instrument classification method based on time-frequency fine analysis
Technical Field
The invention relates to the technical field of signal processing, in particular to a musical instrument classification method based on time-frequency fine analysis.
Background
By analyzing an audio file, its instrument-related characteristics can be obtained, and instrument recognition can quickly and accurately identify the types of musical instruments contained in the audio, which benefits applications such as intelligent music classification, instrument teaching and music composition.
In previous research, scholars have achieved a certain classification performance with time-frequency features proposed for instrument recognition, but these features are usually large in scale and describe the whole time domain or the whole frequency domain. For example, the amplitude envelope describes the overall variation of the audio amplitude over a period of time, but it cannot accurately describe the fine variation between individual time-domain fundamental periods. Likewise, the spectral centroid only describes the center of gravity of the frequency-domain energy at the macroscopic scale of the whole spectrogram; it cannot finely describe the magnitude relations between harmonics of different orders, or between a given harmonic and the nearby non-harmonic components. In other words, the spectral centroid has no way of accurately conveying information about the spectral structure, yet the spectral structure often plays a decisive role in the timbre of an instrument. The present invention therefore considers that much previous feature research has neglected the fine characterization of the time-frequency characteristics of music, and that this problem needs to be solved.
The basis of time-frequency fine analysis is accurate estimation of the fundamental frequency. However, accurate fundamental frequency estimation and accurate harmonic marking are usually complex processes: they combine involved theory and must account for abnormal situations in real audio data such as spectral aliasing, superimposed noise and spectral offset, which places extremely high demands on the adaptability and robustness of the fundamental frequency estimation and harmonic marking algorithms. It is therefore challenging to implement accurate fundamental frequency estimation and accurate harmonic marking in code that accommodates these real-world anomalies, and thereby lay a foundation for later research.
Disclosure of Invention
The invention aims to solve the defects in the prior art and provides a musical instrument classification method based on time-frequency fine analysis.
The aim of the invention can be achieved by adopting the following technical scheme:
a musical instrument classification method based on time-frequency fine analysis, the musical instrument classification method comprising the steps of:
s1, inputting single-tone audio of a musical instrument, and accurately estimating fundamental frequency of the audio;
s2, carrying out FFT frequency domain harmonic marking according to the size of the fundamental frequency of the audio;
s3, estimating a time domain single-period fundamental frequency sequence according to the fundamental frequency of the audio;
s4, extracting time-frequency fine features and Mel frequency cepstrum coefficient features;
and S5, based on the extracted time-frequency fine features and the mel-frequency cepstral coefficient features, evaluating feature correlation and coefficients of variation, and establishing a random forest model to classify musical instruments.
Further, the step S1 is as follows:
s101, detecting a single-tone audio end point of a musical instrument by adopting a short-time energy method, intercepting effective audio, and calculating the short-time energy by the following formula:
In s [ n ]]Is the input musical instrument single tone audio signal, w [ n ]]Is a window function, n is s [ n ]]And w [ n ]]Time domain index, N f Is the frame length of the signal frame, z is the amount of shift of the window function, E z Is short-time energy corresponding to the translation amount z; e corresponding to different translation amounts z Forming a short-time energy sequence; setting a threshold value for short-time energy of a starting and stopping endpoint of the effective audio of the single-tone audio of the musical instrument, recording the midpoint of the current frame as the starting endpoint of the effective audio when the short-time energy of the current frame exceeds the threshold value of the starting endpoint, and recording the midpoint of the current frame as the ending endpoint of the effective audio when the short-time energy of the current frame is lower than the threshold value of the ending endpoint;
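To make step S101 concrete, the following is a minimal Python/NumPy sketch of short-time-energy endpoint detection. The frame length, hop size and the 0.3 threshold factor mentioned later in the embodiment are assumptions for illustration, not values fixed by the claims.

```python
import numpy as np

def short_time_energy(s, frame_len=1024, hop=512):
    """Short-time energy E_z for each window shift z (Hamming-windowed frames)."""
    if len(s) < frame_len:
        s = np.pad(s, (0, frame_len - len(s)))
    w = np.hamming(frame_len)
    n_frames = 1 + (len(s) - frame_len) // hop
    return np.array([np.sum((s[z * hop:z * hop + frame_len] * w) ** 2)
                     for z in range(n_frames)])

def effective_audio(s, frame_len=1024, hop=512, thr_ratio=0.3):
    """Crop the effective audio between the first and last frame whose short-time
    energy exceeds thr_ratio * max energy (the threshold ratio is an assumption)."""
    e = short_time_energy(s, frame_len, hop)
    active = np.where(e > thr_ratio * e.max())[0]
    if active.size == 0:
        return s
    start = active[0] * hop + frame_len // 2   # midpoint of first active frame
    end = active[-1] * hop + frame_len // 2    # midpoint of last active frame
    return s[start:end]
```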
s102, roughly estimating effective audio x [ n ] through time domain autocorrelation]Fundamental frequency f of (f) 0 The discrete signal autocorrelation formula is as follows:
where τ represents the hysteresis, i.e., the amount of translation, x [ n ]]Shifting tau units to the left, multiplying the shifted tau units by the original signal, and finally accumulating and summing the multiplied results to obtain an autocorrelation value under the current hysteresis value; the autocorrelation value is maximum when the hysteresis value is zero; finding out maximum value point except zero according to the relation between the autocorrelation value and different hysteresis values, wherein the difference between the maximum value point and the hysteresis value of zero is the approximate estimation of fundamental wave period, thereby estimating fundamental frequency f 0
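A minimal sketch of the coarse autocorrelation estimate of f_0 in step S102; the lag search bounds fmin and fmax are illustrative assumptions.

```python
import numpy as np

def coarse_f0_autocorr(x, fs, fmin=50.0, fmax=2000.0):
    """Rough fundamental-frequency estimate from the time-domain autocorrelation.

    The lag of the strongest autocorrelation maximum away from lag zero is taken
    as the fundamental period; fmin/fmax bound the lag search range (assumed)."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[tau] for tau >= 0
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), len(r) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max]))  # strongest non-zero-lag peak
    return fs / lag
```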
S103, performing a fast Fourier transform (FFT) on the audio signal and recording the frequency f_max corresponding to the maximum of the current FFT spectrum; judging from the time-domain autocorrelation result whether f_max is an interfering component; if so, removing that component from the FFT spectrum and re-acquiring the frequency f_max corresponding to the new FFT maximum, and repeating this process until, according to the time-domain autocorrelation result, f_max is not an interfering component;
s104, adopting narrow-band spectrum energy frequencyCalculating the denoising frequency value f of the maximum peak corresponding frequency value of the fast Fourier transform by the rate estimation c Eliminating noise interference; calculating a denoised frequency value f of the FFT spectrum using the narrowband spectral energy frequency estimate with the index of the FFT spectrum array starting at 1 c The formula of (2) is as follows:
wherein k is 0 Is the FFT index corresponding to the maximum peak of the FFT, k is the index of the FFT spectrum, |X k I is the FFT module value corresponding to the kth point in the FFT spectrum, F s Is the sampling rate of the audio signal, N is the FFT point number;
s105, according to the denoising frequency value f c And the fundamental frequency f of the autocorrelation estimation 0 Calculate that the maximum peak is located at about n m Subharmonic; wherein:
obtaining the fundamental frequency estimated value as f 1 :
S106, resampling the audio time-domain signal according to the fundamental frequency estimate f_1, with the new sampling rate

f_s = f_1 × 2^m   formula (6)

where the value of m is chosen so that the new sampling rate is not so large that the N sampling points can no longer cover enough fundamental periods. The resampled time-domain signal is then processed again from step S102, until the change of the fundamental frequency estimate f_1 relative to the previous iteration is smaller than 0.001 or the number of iterations reaches the limit of 50, at which point the last fundamental frequency estimate f_1 is output.
Further, in step S2, the process of FFT frequency-domain harmonic marking according to the fundamental frequency of the audio is as follows:
s201, detecting an audio endpoint of the musical instrument by adopting a short-time energy method, and intercepting an effective audio part;
s202, estimating the value f of the fundamental frequency according to the step S1 1 Resampling the audio time domain signal to make the sampling rate of the audio be f 1 To the power of (a), reduces spectrum leakage and converts the FFT spectrum to a logarithmic spectrum;
s203, determining a left endpoint l and a right endpoint r of the next harmonic peak search range on the FFT spectrum, and if the next harmonic peak search range is the first 3 times harmonic, the calculation formula is as follows:
where P is the index of the position of the last harmonic peak in the FFT spectrum, if the position of the first harmonic peak is marked, p=1;
if the harmonic is marked 4 th harmonic and later, the calculation formulas of the left endpoint l and the right endpoint r are as follows:
l=p+bt-2 formula (10)
r=P+1.09T 1 Formula (11)
Wherein bT is half the distance between the third harmonic peak and the first harmonic peak on the FFT spectrum; t (T) 1 The two-harmonic FFT point number interval is the closest to the current marked harmonic of the FFT spectrum;
s204, all maximum points are obtained in the search range, and the maximum point of all the maximum points is obtained as the obtained peak point of the current harmonic; if no maximum point exists in the search range, the left and right boundaries of the search range are expanded when the first fourth harmonic is marked, and only the right boundary is expanded when other harmonics are marked; when expanding the search boundary of the first two harmonics, the left and right endpoints l and r satisfy the following conditions:
when expanding the search boundaries of the 3 rd and 4 th harmonics, the left and right endpoints l and r satisfy the following conditions:
l-P≥0.7T 1 formula (14)
r-P≤1.3T 1 Formula (15)
When the search boundary of the 5 th order and above harmonics is extended, the right end point r satisfies the following condition:
r-P≤1.3T 1 formula (16)
Then the largest of all maximum points within the new search range is taken as the peak point of the current harmonic. All FFT logarithmic magnitudes and index values of the maximum points within an interval of 0.1 harmonic spacings centered on this peak point are recorded and stored in a matrix HWB and an array HWI respectively; each row of HWB stores the logarithmic magnitude of one harmonic peak together with the nearby logarithmic spectrum values. If no maximum point is found, the search range is expanded further and the procedure above is repeated.
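A simplified sketch of the adaptive-window harmonic search of step S2. It keeps only the core idea (search a window around the expected harmonic position, take the strongest local maximum, store the peak and its neighbourhood); the window constants follow the formulas above only approximately, and the helper name is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

def mark_harmonics(log_mag, T1, n_harm=10, neigh=0.05):
    """Mark harmonic peaks on a logarithmic FFT spectrum (simplified sketch).

    T1 is the expected FFT-bin spacing between adjacent harmonics; neigh is the
    fraction of a harmonic spacing kept around each peak (0.1 spacing in the
    patent corresponds to +/-0.05 here)."""
    HWI, HWB = [], []
    P = 0                                            # index of the previous peak
    for h in range(1, n_harm + 1):
        lo = int(P + 0.7 * T1) if h > 1 else int(0.8 * T1)
        hi = int(P + 1.3 * T1) if h > 1 else int(1.2 * T1)
        hi = min(hi, len(log_mag) - 2)
        if hi <= lo:
            break
        peaks, _ = find_peaks(log_mag[lo:hi])        # local maxima in the window
        if peaks.size == 0:
            hi = min(int(P + 1.6 * T1), len(log_mag) - 2)   # widen to the right
            peaks, _ = find_peaks(log_mag[lo:hi])
            if peaks.size == 0:
                break
        P = lo + peaks[np.argmax(log_mag[lo + peaks])]      # strongest local maximum
        half = max(1, int(neigh * T1))
        HWI.append(P)
        HWB.append(log_mag[P - half:P + half + 1].copy())   # peak and neighbourhood
    return HWI, HWB
```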
Further, in step S3, the process of estimating the time-domain single-period fundamental-frequency sequence according to the fundamental frequency of the audio is as follows:
s301, detecting an audio endpoint of a musical instrument by adopting a short-time energy method, and intercepting an effective audio part;
s302, filtering low-frequency interference by using a high-pass filter;
s303, marking a demarcation point of a single-base wave period by using a cross-correlation operation, and taking a certain point o in the middle of an audio signal as an origin, extending towards two sides at equal intervals, wherein the extending distance is equal to the length of the single-base wave period obtained by the last operation, the signal in the extending range is a shift vector, and the length of the shift vector is twice the length of the single-base wave period obtained by the last operation, so that the self-adaptive adjustment is characterized; if the first cross-correlation operation is performed, the single period length refers to the fundamental frequency estimation result in the step S2; taking the o point as an origin, and taking signals with four single-base wave period lengths to the left or right as fixed vectors, wherein the single-base wave period length is also the single-base wave period length obtained by the last operation; performing cross-correlation operation by taking the shift vector as a sliding window and a fixed vector to obtain a corresponding relation between a cross-correlation value and relative displacement; recording extreme point sequences and corresponding relative displacement sequences of the cross-correlation values from the corresponding relations; selecting a reasonable value from the relative displacement sequence, and calculating a next single-base wave period separation point by taking the o point as a reference, wherein the distance between the separation point and the o is the single-base wave period length; and circularly reciprocating to obtain a time domain single period fundamental frequency sequence, wherein the calculation formula of the cross-correlation r [ tau ] is as follows:
In the formula, τ represents hysteresis value, i.e. translation quantity, g [ n ] and v [ n ] are two different signals, v [ n ] is translated leftwards by a distance of τ units and then multiplied by g [ n ], and the multiplied results are accumulated and summed, i.e. similarity under the current hysteresis value.
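A minimal sketch of the time-domain single-period fundamental-frequency sequence estimation of step S3. It slides a two-period shift vector against a four-period fixed vector, picks the cross-correlation maximum to locate the next period boundary, and adapts the period length as it goes; restricting the displacement search to 0.7 to 1.3 times the previous period stands in for the patent's selection of a "reasonable" displacement and is an assumption.

```python
import numpy as np

def single_period_f0_sequence(x, fs, f1):
    """Estimate the sequence of single-period fundamental frequencies (sketch)."""
    period = int(round(fs / f1))                 # initial period from the f1 estimate
    o = len(x) // 2                              # start near the middle of the signal
    f0_seq = []
    while period > 2 and o - period >= 0 and o + 4 * period < len(x):
        shift_vec = x[o - period:o + period]     # two periods centred on o (shift vector)
        fixed_vec = x[o:o + 4 * period]          # four periods from o (fixed vector)
        r = np.correlate(fixed_vec, shift_vec, mode="valid")
        lo, hi = int(0.7 * period), min(int(1.3 * period) + 1, len(r))
        lag = lo + int(np.argmax(r[lo:hi]))      # displacement of the best match
        period = lag                             # adaptive single-period length
        o += period                              # next single-period boundary point
        f0_seq.append(fs / period)
    return np.array(f0_seq)
```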
Further, in the step S4, the process of extracting the time-frequency fine feature and mel-frequency cepstrum coefficient feature is as follows:
s401, extracting frequency domain fine features, including: the logarithmic value is larger than zero harmonic energy centroid, harmonic peak burr average attenuation ratio, logarithmic value is larger than zero maximum harmonic frequency, slope bandwidth ratio between two harmonics, harmonic peak-valley ratio, harmonic energy drop rate, odd harmonic to even harmonic energy ratio of the first 6 harmonics, first harmonic peak-valley ratio, slope bandwidth ratio drop rate between harmonics, frequency domain logarithmic value is larger than ratio of zero harmonic energy to total energy, harmonic peak-valley difference ratio half harmonic interval, first harmonic peak-valley difference ratio half harmonic interval;
from the HWB obtained in step S2, a log-amplitude series P [ n ] of the maximum value point of each subharmonic peak can be obtained h ]And setting the number less than or equal to zero in the logarithmic magnitude array to zero, and calculating the index from 1. The calculation formula for the logarithmic value larger than zero harmonic energy centroid c is as follows:
Wherein L is P Is P [ n ] h ]Is also the maximum harmonic order, n h Is the current harmonic frequency, also P [ n ] h ]Index of (2);
the maximum harmonic number with the logarithmic value greater than zero is the logarithmic magnitude array Pn h ]The number less than or equal to zero is set to be zero, and the index is started from 1;
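A small sketch of the above-zero-log harmonic energy centroid. It assumes HWB is available as a 2-D array whose rows hold the log magnitudes around each harmonic peak and whose central column is the peak maximum; that layout is an assumption about the data structure described above.

```python
import numpy as np

def harmonic_energy_centroid(HWB):
    """Centroid c over harmonics whose peak log magnitude is greater than zero."""
    HWB = np.asarray(HWB, dtype=float)
    P = HWB[:, HWB.shape[1] // 2]        # peak log magnitude per harmonic (assumed central column)
    P = np.maximum(P, 0.0)               # entries <= 0 are set to zero
    n_h = np.arange(1, len(P) + 1)       # harmonic orders, indexed from 1
    total = np.sum(P)
    return float(np.sum(n_h * P) / total) if total > 0 else 0.0
```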
calculation of the average reduction ratio of harmonic peaks and burrs requires that, in the case where the logarithmic spectrum of the FFT spectrum in step S2 is greater than zero, twenty non-harmonic peak maxima A [ n ] closest to the harmonic peak maximum in a harmonic spacing range centered on the harmonic peak maximum are obtained p ],n p Is A [ n ] p ]If the number of the maximum values in the range is less than twenty, taking all the indexes; if the number of maximum values is less than or equal to 10, directly taking an p ]The mean value of (2) is taken as the average amplitude of burrs around the harmonic wave crest; if the maximum number is greater than 10, then at an p ]Searching a maximum value group with the length of 10 continuous, wherein the FFT index span of the maximum value group comprises an index where the maximum value of the harmonic peak is located or no other maximum value exists between the index and the index where the maximum value of the harmonic peak is located and the sum is maximum; obtaining the average value of the maximum value group as the average amplitude of burrs around the harmonic peaks; let the average amplitude of the burrs around the harmonic peaks be a, the harmonic peak burr attenuation ratio PDR of the single harmonic is as follows:
Wherein p is the maximum amplitude of the current harmonic peak, and the average of the harmonic peak burr weakening ratios of a plurality of harmonics is obtained to obtain the average weakening ratio of the harmonic peak burrs;
let the logarithmic spectrum of the FFT spectrum of step S2 be mag [ k ]]HWI as determined in step S2 is mag [ k ]]The index sequence where the maximum value of the medium harmonic peak is located is called hi n for short h ]All indexes of the series start from 1, and the calculation formula of the slope bandwidth ratio SFWR between the 3 rd harmonic and the 4 th harmonic is as follows:
when calculating the bandwidth ratio decline rate of the inter-harmonic slope, firstly calculating the bandwidth ratio of the direct current component and the first-harmonic slope in the logarithmic spectrum as r 1 Calculating the inter-harmonic slope bandwidth ratio of each harmonic spacing between the 5 th harmonic and the 8 th harmonic and taking the average value as r 2 If the 8 th harmonic wave is less than, the highest harmonic wave is obtained; the calculation formula of the inter-harmonic slope bandwidth ratio degradation rate SFWRD is as follows:
calculating the harmonic peak-to-valley ratio based on the part with the logarithmic value larger than zero in the FFT logarithmic spectrum to make mag [ k ]]The elements less than or equal to zero are zero, and a new number array MAG [ k ] is obtained]The method comprises the steps of carrying out a first treatment on the surface of the For a harmonic peak, which has two left and right troughs, the left trough value is v p1 Its value is MAG [ k ]]A mean value within a trough range of the left harmonic spacing; the right wave valley value is v p2 Its value is MAG [ k ]]A mean value within a trough range of the right harmonic spacing; in the calculation of the peak-to-valley ratio of the M-order harmonic p1 The calculation formula of (2) is as follows:
wherein L is the left boundary of the trough value calculation and R is the right boundary of the trough value calculation;
the calculation formula of v2 in the calculation of the M-order harmonic peak-to-valley ratio is as follows:
the logarithmic magnitude sequence of the maximum point of each subharmonic peak is known as Pn h ]The calculation formula of the Mth harmonic peak-to-valley ratio MR is as follows:
where PM represents the logarithmic magnitude of the maximum point of the peak of the Mth harmonic. Calculating the average value of harmonic peak-to-valley ratios of multiple harmonics as a characteristic parameter of the harmonic peak-to-valley ratios;
the first harmonic peak-to-valley ratio is a special case of calculating the Mth harmonic peak-to-valley ratio, and the steps are similar; in the calculation of the first harmonic peak-to-valley ratio v p1 The calculation formula of (2) is as follows:
in the calculation of the first harmonic peak-to-valley ratio v p2 The calculation formula of (2) is as follows:
the first harmonic peak-to-valley ratio FHPVR is calculated as follows:
the calculation of the harmonic energy reduction rate is divided into two cases that the number of the maximum harmonic times num with the value larger than zero is larger than four and smaller than or equal to four; in the first case, let head be the second largest log amplitude of the first third harmonic, its harmonic order be s, let tail be the mean value of the harmonics with log amplitude greater than zero among num-2, num-1, num subharmonics, in which case the calculation formula of the harmonic energy reduction rate HDR is:
If the logarithmic value is greater than zero and the maximum harmonic number num is less than or equal to four, the head is the maximum logarithmic magnitude value in the previous num-1 harmonic, the corresponding harmonic number is s, and tail is the num harmonic logarithmic magnitude value; the calculation formula of the harmonic energy reduction rate HDR in this case is:
in the calculation of the energy ratio of the odd harmonic and the even harmonic of the first 6 times, let the matrix HWB be HB ij Odd harmonic and even harmonic energy ratio R of the first 6 harmonics oe6 The calculation formula of (2) is as follows:
wherein the row-column indices of the matrix are all 1, wid is HB ij The length of a row;
when the frequency domain logarithmic value is greater than the ratio of zero harmonic energy to total energy, the element less than or equal to zero in the matrix HWB is made to be zero to obtain a new matrix UPHB ij The calculation formula for the frequency domain logarithmic value being greater than the ratio of zero harmonic energy to total energy UHTR is as follows:
where row and col are the matrices UPHB, respectively ij Row and column numbers of (a);
the calculation of the harmonic peak-valley difference ratio frequency spacing requires the calculation of the harmonic trough value v and the harmonic peak maximum logarithmic value PM for the M-th harmonic wave]The absolute value of the difference is the M-order harmonic peak-to-valley difference ratio frequency spacing compared with the frequency spacing of the harmonic trough and peak, and the characteristic value is averaged for a plurality of harmonics to obtain the final harmonic peak-to-valley difference ratio frequency spacing value; calculating and averaging harmonics with harmonic peak log values greater than zero in fifth to eighth harmonics in the logarithmic spectrum; for a harmonic peak, which has two left and right troughs, the left trough value is v p1 The right trough value is v p2 ;v p1 And v p2 Is consistent with the algorithm in the M-order harmonic peak-to-valley ratio; thenThe calculation formula of the M-order harmonic peak-to-valley difference ratio frequency spacing DFR is as follows:
calculating DFR for a plurality of harmonic waves and taking an average value to obtain a harmonic crest valley difference ratio frequency interval value;
the first harmonic peak-to-valley difference ratio frequency spacing FDFR is a special case of the M-order harmonic peak-to-valley difference ratio frequency spacing, and has a left-to-right trough value v p1 And v p2 Calculating the peak-to-valley ratio of the first harmonic; the calculation formula is as follows:
the 1 st and 2 nd harmonic energy ratio is the ratio of the maximum logarithmic magnitude of the first harmonic to the maximum logarithmic magnitude of the second harmonic;
s402, extracting time domain fine features, including single-period fundamental frequency sequence standard deviation, single-period fundamental frequency sequence mean value, single-period fundamental frequency sequence median, single-period fundamental frequency sequence mean value and median distance, single-period fundamental frequency sequence unit time average passing times; let the time domain single period base frequency sequence obtained in step S3 be Fn T ]Length of Len, n T Representing the sequence number of a time domain single period. Single period fundamental frequency sequence mean f mean The calculation formula of (2) is as follows:
standard deviation f of single period fundamental frequency sequence std The calculation formula of (2) is as follows:
for F [ n ] T ]Ordering is performed, the median f of the single-period base frequency sequence median The calculation formula of (2) is as follows:
Single period fundamental frequency sequence mean and median distance f mm The calculation formula of (2) is as follows:
f mm =|f mean -f median i formula (45)
The number Nc of times the single-period fundamental-frequency sequence crosses its mean per unit time is calculated with the formula below, where T_1 is the duration of the time-domain audio signal and u(t) is the indicator function used in that formula.
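A sketch of the time-domain fine features of step S402 computed from the single-period fundamental-frequency sequence F. The mean-crossing count per unit time is implemented as sign changes of F minus its mean divided by the signal duration, which is one interpretation of the unreproduced formulas.

```python
import numpy as np

def time_domain_fine_features(F, duration_s):
    """Mean, std, median, mean-median distance and mean crossings per second of F."""
    F = np.asarray(F, dtype=float)
    f_mean = F.mean()
    f_std = F.std()
    f_median = np.median(F)
    f_mm = abs(f_mean - f_median)
    crossings = np.sum(np.diff(np.sign(F - f_mean)) != 0)   # times F crosses its mean
    Nc = crossings / duration_s                              # crossings per unit time
    return {"f_mean": f_mean, "f_std": f_std, "f_median": f_median,
            "f_mm": f_mm, "Nc": Nc}
```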
s403, extracting static characteristics of 13-dimensional Meier frequency cepstrum coefficients: firstly, preprocessing input audio data, secondly, performing FFT (fast Fourier transform) on the audio, obtaining an energy spectrogram of the audio by square values of FFT modular values, then filtering by using a Mel filter bank to obtain energy output of each filter, and finally, performing logarithmic operation on the filter output and obtaining Mel frequency cepstrum coefficient characteristics by discrete cosine transform DCT.
Further, in the step S5, the characteristic correlation and the coefficient of variation are evaluated, and the process of establishing the random forest model to classify the musical instruments is as follows:
s501, calculating the Pearson correlation coefficient among the elements of the feature vector and evaluating the feature effectiveness;
s502, calculating variation coefficients of elements of the feature vector and evaluating feature effectiveness;
s503, fusing the time-frequency fine features and the mel-frequency cepstrum coefficient features into new feature vectors, and sending the fused features into a random forest model for training, wherein five-fold cross verification is adopted during training;
The random forest model is built as follows: the training set is randomly sampled several times using the bootstrap method, and each sampled training set is denoted θ;
for each training sample θ, forming a corresponding decision tree; in each constructed decision tree, Q features are randomly selected from Q features of each training sample theta for classification, wherein Q is less than or equal to Q,and->Representation pairA downward rounding operation; calculating the non-purity of the selected q characteristics, and considering the characteristic with the minimum non-purity of the base as the optimal classification characteristic; according to the selected optimal classification characteristics, dividing the node where the optimal classification characteristics are located into two parts, and selecting sub-optimal classification characteristics from the residual characteristics; repeating the steps until the theta residual feature quantity is 0 or the classification effect of the current decision tree is optimal;
combining all 30 decision trees to obtain a random forest model, and obtaining a final classification result through a voting method.
The random forest model can process input samples with high-dimensional characteristics, and dimension reduction is not needed; the basic unit of the random forest model is a decision tree, and the final result is obtained through voting or averaging by combining a plurality of weak classifiers, so that the result of the whole model has higher accuracy and generalization performance.
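A sketch of step S503 with scikit-learn: the fused feature vectors are fed to a random forest of 30 trees (the number stated above) and evaluated with five-fold cross-validation. Setting max_features="sqrt" for the per-node random feature subset is an assumption consistent with the floor operation described earlier, not a value fixed by the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_instrument_classifier(X_fine, X_mfcc, y):
    """Fuse time-frequency fine features with 13-dim MFCCs and train a random forest."""
    X = np.hstack([X_fine, X_mfcc])                  # feature fusion by concatenation
    clf = RandomForestClassifier(n_estimators=30,    # 30 decision trees, combined by voting
                                 max_features="sqrt",
                                 random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)        # five-fold cross-validation
    clf.fit(X, y)
    return clf, scores.mean()
```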
Compared with the prior art, the invention has the following advantages and effects:
(1) Compared with traditional large-scale descriptions of frequency-domain features, the invention focuses on describing the fine structure of the spectrum and proposes several indices characterizing fine features such as harmonic peaks, burrs and harmonic troughs, which represent more accurately the spectral structure that determines timbre. The frequency-domain fine features can further capture spectral nuances caused by the playing and excitation characteristics of an instrument, the characteristics of its resonant cavity, and so on. For example, the logarithmic spectrograms of the trumpet and the violin show burrs around the harmonic-peak maxima whose ratio to the peak maxima is comparatively large, whereas the harmonic peaks in the logarithmic spectrogram of a piano are very sharp and show no such burrs. The frequency-domain fine features can describe this spectral nuance more accurately than traditional frequency-domain features.
(2) Compared with traditional time-domain features, the time-domain fine features proposed by the invention accurately mark the intervals of single fundamental periods in the time domain, record the sequence of fundamental-frequency changes, and emphasize the fine variation of the fundamental frequency. Different instruments differ in the stability of the fundamental frequency of a single tone: for example, the fundamental frequency of a trumpet, which is controlled by lip vibration, is less stable than that of a piano string vibrating freely after being struck. The proposed time-domain fine features can represent this difference accurately, whereas traditional time-domain features do not attend to it.
(3) The musical instrument classification error rate of the random forest model input with the fusion features is reduced by 5.7 percent compared with that of a random forest model only using 13-dimensional mel frequency cepstrum coefficient static features.
(4) The application designs a cyclically iterated fundamental frequency estimation algorithm based on autocorrelation and narrowband spectral energy frequency estimation, which achieves accurate fundamental frequency estimation even at low signal-to-noise ratio. Based on the estimated fundamental frequency, a harmonic marking algorithm based on adaptive-window search and a time-domain single-period fundamental-frequency sequence estimation algorithm are designed to mark the harmonic positions of the spectrum accurately and to obtain the single-period fundamental-frequency sequence. This overcomes the problem of harmonic deviation during harmonic marking and achieves accurate marking of the harmonics; moreover, because the single-period fundamental-frequency sequence is based on consecutive single periods in the time domain, it better represents the continuous variation of the time-domain fundamental frequency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a musical instrument classification method based on time-frequency fine analysis in an embodiment of the present application;
FIG. 2 is a flowchart of a cyclic iterative fundamental frequency estimation algorithm based on an autocorrelation algorithm and a narrowband spectral energy frequency estimation algorithm in an embodiment of the invention;
FIG. 3 is a flow chart of a harmonic tagging algorithm based on adaptive window searching in an embodiment of the present invention;
fig. 4 is a flowchart of a time domain single period baseband sequence estimation algorithm in an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Fig. 1 is a flowchart of a musical instrument classification method based on time-frequency fine analysis according to an embodiment of the present invention.
S1, inputting single-tone audio of a musical instrument, and accurately estimating fundamental frequency of the audio;
FIG. 2 is a flowchart of a cyclic iterative fundamental frequency estimation algorithm based on an autocorrelation algorithm and a narrowband spectral energy frequency estimation algorithm in an embodiment of the invention;
S101, detecting the end points of the instrument monophonic audio with a short-time energy method and intercepting the effective audio. The short-time energy is calculated as

E_z = Σ_{n=z}^{z+N_f-1} ( s[n]·w[n-z] )²

where s[n] is the input instrument monophonic audio signal, w[n] is a window function, n is the time-domain index of s[n] and w[n], N_f is the frame length, z is the shift of the window function, and E_z is the short-time energy corresponding to the shift z. The values E_z obtained for different shifts form a short-time energy sequence. Thresholds are set on the short-time energy for the start and end points of the effective audio: when the short-time energy of the current frame exceeds the start threshold, the midpoint of that frame is recorded as the start point of the effective audio, and when it falls below the end threshold, the midpoint of that frame is recorded as the end point of the effective audio;
in this example, the detection of the starting point of the monophonic audio signal is mainly performed, and the short-time energy threshold value adopted is 0.3 times of the maximum value of the short-time energy of all the audio frames;
s102, roughly estimating effective audio x [ n ] through time domain autocorrelation]Fundamental frequency f of (f) 0 The discrete signal autocorrelation formula is as follows:
where τ represents the hysteresis, i.e., the amount of translation, x [ n ]]Shifting tau units to the left, multiplying the shifted tau units by the original signal, and finally accumulating and summing the multiplied results to obtain an autocorrelation value under the current hysteresis value; the autocorrelation value is maximum when the hysteresis value is zero. Finding out maximum value point except zero according to the relation between the autocorrelation value and different hysteresis values, wherein the difference between the maximum value point and the hysteresis value of the zero hysteresis value point is the approximate estimation of fundamental wave period, thereby estimating fundamental frequency f 0
S103, performing a fast Fourier transform (FFT) on the audio signal and recording the frequency f_max corresponding to the maximum of the current FFT spectrum; judging from the time-domain autocorrelation result whether f_max is an interfering component; if so, removing that component from the FFT spectrum and re-acquiring the frequency f_max corresponding to the new FFT maximum, and repeating this process until f_max is not considered an interfering component;
In this example, the FFT size is 4096 points when the fundamental frequency f_0 estimated by autocorrelation is below 400 Hz, and 2048 points when it is 400 Hz or above;
s104, calculating a denoising frequency value f of a maximum peak corresponding frequency value of the fast Fourier transform by adopting narrowband spectrum energy frequency estimation c Eliminating noise interference; calculating a denoised frequency value f of the FFT spectrum using the narrowband spectral energy frequency estimate with the index of the FFT spectrum array starting at 1 c The formula of (2) is as follows:
wherein k is 0 Is the FFT index corresponding to the maximum peak of the FFT, k is the index of the FFT spectrum, |X k I is the FFT module value corresponding to the kth point in the FFT spectrum, F s Is the sampling rate of the audio signal, N is the FFT point number;
s105, according to the denoising frequency value f c And the fundamental frequency f of the autocorrelation estimation 0 Calculate that the maximum peak is located at about n m Subharmonic; wherein:
obtaining the fundamental frequency estimated value as f 1 :
S106, resampling the audio time-domain signal according to the fundamental frequency estimate f_1, with the new sampling rate

f_s = f_1 × 2^m   formula (6)

where the value of m is chosen so that the new sampling rate is not so large that the N sampling points can no longer cover enough fundamental periods. The resampled time-domain signal is then processed again from step S102, until the change of the fundamental frequency estimate f_1 relative to the previous iteration is smaller than 0.001 or the number of iterations reaches the limit of 50, at which point the last fundamental frequency estimate f_1 is output.
S2, carrying out FFT frequency domain harmonic marking according to the size of the fundamental frequency of the audio;
FIG. 3 is a flow of an adaptive window search based harmonic tagging algorithm in an embodiment of the invention;
s201, detecting an audio endpoint of the musical instrument by adopting a short-time energy method, and intercepting an effective audio part; in this example, the detection of the starting point of the monophonic audio signal is mainly performed, and the short-time energy threshold value adopted is 0.3 times of the maximum value of the short-time energy of all the audio frames;
s202, estimating the value f of the fundamental frequency according to the step S1 1 Resampling the audio time domain signal to make the sampling rate of the audio be f 1 To the power of (a), reduces spectrum leakage and converts the FFT spectrum to a logarithmic spectrum; in this embodiment, since a fine spectrum analysis is required, the FFT point number at the time of harmonic labeling is 16384 points.
S203, determining the left end point l and right end point r of the search range for the next harmonic peak on the FFT spectrum. For the first 3 harmonics they are calculated by the following formulas, where P is the index of the position of the previous harmonic peak in the FFT spectrum; if the first harmonic peak is being marked, P = 1;
For the 4th and later harmonics, the left end point l and right end point r are calculated as

l = P + bT - 2   formula (10)
r = P + 1.09·T_1   formula (11)

where bT is half the distance between the third harmonic peak and the first harmonic peak on the FFT spectrum, and T_1 is the FFT-point spacing between the two already-marked harmonics of the FFT spectrum closest to the currently marked harmonic;
s204, all maximum points are obtained in the search range, and the maximum point of all the maximum points is obtained as the obtained peak point of the current harmonic; if no maximum point exists in the search range, the left and right boundaries of the search range are expanded when the first fourth harmonic is marked, and only the right boundary is expanded when other harmonics are marked; when expanding the search boundary of the first two harmonics, the left and right endpoints l and r satisfy the following conditions:
when expanding the search boundaries of the 3 rd and 4 th harmonics, the left and right endpoints l and r satisfy the following conditions:
l-P≥0.7T 1 formula (14)
r-P≤1.3T 1 Formula (15)
When the search boundary of the 5 th order and above harmonics is extended, the right end point r satisfies the following condition:
r-P≤1.3T 1 formula (16)
Then the largest of all maximum points within the new search range is taken as the peak point of the current harmonic. All FFT logarithmic magnitudes and index values of the maximum points within an interval of 0.1 harmonic spacings centered on this peak point are recorded and stored in a matrix HWB and an array HWI respectively; each row of HWB stores the logarithmic magnitude of one harmonic peak together with the nearby logarithmic spectrum values. If no maximum point is found, the search range is expanded further and the procedure above is repeated.
S3, estimating a time domain single period fundamental frequency sequence according to the fundamental frequency of the audio frequency, wherein the process comprises the following steps:
FIG. 4 is a flow chart of a time domain single period baseband sequence estimation algorithm in an embodiment of the invention;
s301, detecting an audio endpoint of a musical instrument by adopting a short-time energy method, and intercepting an effective audio part; in this example, the detection of the starting point of the monophonic audio signal is mainly performed, and the short-time energy threshold value adopted is 0.3 times of the maximum value of the short-time energy of all the audio frames;
s302, filtering low-frequency interference by using a high-pass filter. The cut-off frequency is set in this example to the fundamental frequency estimate f of step S1 1 Is 0.3 times as large as the above.
S303, marking the boundary points of single fundamental periods with a cross-correlation operation. Taking a point o near the middle of the audio signal as the origin, the signal is extended by equal distances to both sides; the extension distance equals the single-fundamental-period length obtained in the previous operation, so the signal within the extended range forms a shift vector whose length is twice the single-fundamental-period length obtained in the previous operation, which realizes the adaptive adjustment. For the first cross-correlation operation, the single-period length is taken from the fundamental frequency estimation result of step S2. With the point o as origin, the signal spanning four single-fundamental-period lengths to the left or to the right is taken as the fixed vector, the single-fundamental-period length again being the one obtained in the previous operation. The shift vector is slid over the fixed vector and the cross-correlation is computed, giving the relation between the cross-correlation value and the relative displacement. From this relation, the sequence of extreme points of the cross-correlation and the corresponding sequence of relative displacements are recorded. A reasonable value is selected from the relative-displacement sequence and, with the point o as reference, the next single-fundamental-period boundary point is calculated; the distance between this boundary point and o is the single-fundamental-period length. Repeating this cycle yields the time-domain single-period fundamental-frequency sequence. The cross-correlation r[τ] is calculated as

r[τ] = Σ_n g[n]·v[n+τ]

where τ is the lag, i.e. the amount of shift, and g[n] and v[n] are two different signals: v[n] is shifted left by τ samples, multiplied by g[n], and the products are summed to give the similarity at the current lag.
S4, extracting time-frequency fine features and Mel frequency cepstrum coefficient features, wherein the process is as follows:
s401, extracting frequency domain fine features, including: the logarithmic value is larger than zero harmonic energy centroid, harmonic peak burr average attenuation ratio, logarithmic value is larger than zero maximum harmonic frequency, slope bandwidth ratio between two harmonics, harmonic peak-valley ratio, harmonic energy drop rate, odd harmonic to even harmonic energy ratio of the first 6 harmonics, first harmonic peak-valley ratio, slope bandwidth ratio drop rate between harmonics, frequency domain logarithmic value is larger than ratio of zero harmonic energy to total energy, harmonic peak-valley difference ratio half harmonic interval, first harmonic peak-valley difference ratio half harmonic interval;
from the HWB obtained in step S2, a log-amplitude series P [ n ] of the maximum value point of each subharmonic peak can be obtained h ]And setting the number less than or equal to zero in the logarithmic magnitude array to zero, and calculating the index from 1. The calculation formula for the logarithmic value larger than zero harmonic energy centroid c is as follows:
wherein L is P Is P [ n ] h ]Is also the maximum harmonic order, n h Is the current harmonic frequency, also P [ n ] h ]Index of (2);
the maximum harmonic number with the logarithmic value greater than zero is the logarithmic magnitude array Pn h ]The number less than or equal to zero is set to be zero, and the index is started from 1;
calculation of the average reduction ratio of harmonic peaks and burrs requires that, in the case where the logarithmic spectrum of the FFT spectrum in step S2 is greater than zero, twenty non-harmonic peak maxima A [ n ] closest to the harmonic peak maximum in a harmonic spacing range centered on the harmonic peak maximum are obtained p ],n p Is A [ n ] p ]If the number of maxima in the range is less than twenty, then takingAll of them; if the number of maximum values is less than or equal to 10, directly taking an p ]The mean value of (2) is taken as the average amplitude of burrs around the harmonic wave crest; if the maximum number is greater than 10, then at an p ]Searching a maximum value group with the length of 10 continuous, wherein the FFT index span of the maximum value group comprises an index where the maximum value of the harmonic peak is located or no other maximum value exists between the index and the index where the maximum value of the harmonic peak is located and the sum is maximum; obtaining the average value of the maximum value group as the average amplitude of burrs around the harmonic peaks; let the average amplitude of the burrs around the harmonic peaks be a, the harmonic peak burr attenuation ratio PDR of the single harmonic is as follows:
where p is the maximum amplitude of the current harmonic peak; averaging the harmonic-peak burr attenuation ratios of several harmonics gives the average attenuation ratio of harmonic-peak burrs;
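A Python sketch of the burr-averaging procedure just described follows. The selection of the contiguous group of 10 maxima is simplified to "maximal sum", and the final ratio p / a is an assumption, since the PDR equation itself appears as an image in the original.

import numpy as np

def peak_spur_attenuation_ratio(mag, peak_idx, harmonic_spacing):
    """Attenuation ratio of one harmonic peak relative to the surrounding burrs.

    mag: log-magnitude FFT spectrum (only values > 0 are considered).
    peak_idx: FFT index of the harmonic-peak maximum.
    harmonic_spacing: FFT-bin spacing between adjacent harmonics.
    """
    half = int(harmonic_spacing) // 2
    lo, hi = max(peak_idx - half, 1), min(peak_idx + half, len(mag) - 2)
    # Local maxima with positive log magnitude inside one harmonic spacing,
    # excluding the harmonic peak itself.
    maxima = [(k, mag[k]) for k in range(lo, hi + 1)
              if k != peak_idx and mag[k] > 0
              and mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]]
    # Keep the (up to) twenty maxima closest to the harmonic peak.
    maxima.sort(key=lambda kv: abs(kv[0] - peak_idx))
    maxima = sorted(maxima[:20])                       # back to ascending index
    amps = np.array([v for _, v in maxima])
    if len(amps) == 0:
        return 0.0
    if len(amps) <= 10:
        a = amps.mean()                                # average burr amplitude
    else:
        # Simplified from the text: contiguous group of 10 maxima with maximal sum.
        best = max(range(len(amps) - 9), key=lambda i: amps[i:i + 10].sum())
        a = amps[best:best + 10].mean()
    p = mag[peak_idx]                                  # harmonic-peak maximum
    return float(p / a) if a != 0 else 0.0             # assumed ratio p / a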
Let the logarithmic spectrum of the FFT spectrum of step S2 be mag[k]; the HWI obtained in step S2 is the index sequence of the maxima of the harmonic peaks in mag[k], abbreviated hi[n_h], with all indices of the sequence starting from 1; the inter-harmonic slope-bandwidth ratio SFWR between the 3rd harmonic and the 4th harmonic is calculated as follows:
When calculating the decline rate of the inter-harmonic slope-bandwidth ratio, the slope-bandwidth ratio between the direct-current component and the first harmonic in the logarithmic spectrum is first calculated as r_1; the inter-harmonic slope-bandwidth ratio of each harmonic spacing between the 5th and the 8th harmonic is then calculated and averaged as r_2 (if fewer than 8 harmonics are available, the highest harmonic is used); the decline rate SFWRD of the inter-harmonic slope-bandwidth ratio is calculated as follows:
The harmonic peak-to-valley ratio is calculated on the part of the FFT logarithmic spectrum whose logarithmic value is greater than zero: the elements of mag[k] that are less than or equal to zero are set to zero, giving a new array MAG[k]; a harmonic peak has a trough on each side, the left trough value v_p1 being the mean of MAG[k] over the trough range of the left harmonic spacing and the right trough value v_p2 being the mean of MAG[k] over the trough range of the right harmonic spacing; in the calculation of the M-th harmonic peak-to-valley ratio, v_p1 is calculated as follows:
wherein L is the left boundary of the trough value calculation and R is the right boundary of the trough value calculation;
the calculation formula of v_p2 in the calculation of the M-th harmonic peak-to-valley ratio is as follows:
the log-magnitude sequence of the maximum point of each harmonic peak is known to be P[n_h]; the M-th harmonic peak-to-valley ratio MR is calculated as follows:
where P[M] represents the logarithmic magnitude of the maximum point of the M-th harmonic peak; the mean of the peak-to-valley ratios over several harmonics is taken as the feature parameter of the harmonic peak-to-valley ratio;
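The following sketch illustrates the trough-mean computation described above. The exact trough boundaries L and R and the final ratio are not reproduced in this text (the equations appear as images), so the half-spacing trough windows and the form P[M] / mean(v_p1, v_p2) are assumptions for illustration only.

import numpy as np

def harmonic_peak_to_valley_ratio(MAG, peak_idx, spacing, PM):
    """M-th harmonic peak-to-valley ratio (sketch).

    MAG: FFT log spectrum with values <= 0 already set to zero.
    peak_idx: FFT index of the M-th harmonic-peak maximum.
    spacing: FFT-bin spacing between adjacent harmonics.
    PM: log magnitude of the M-th harmonic-peak maximum.
    """
    MAG = np.asarray(MAG, dtype=float)
    spacing = int(spacing)
    half = spacing // 2
    left = MAG[max(peak_idx - spacing, 0):max(peak_idx - half, 1)]
    right = MAG[peak_idx + half:peak_idx + spacing]
    v_p1 = left.mean() if len(left) else 0.0           # left trough value
    v_p2 = right.mean() if len(right) else 0.0          # right trough value
    denom = 0.5 * (v_p1 + v_p2)
    return float(PM / denom) if denom != 0 else 0.0     # assumed ratio form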
the first-harmonic peak-to-valley ratio is a special case of the M-th harmonic peak-to-valley ratio, and the steps are similar; in the calculation of the first-harmonic peak-to-valley ratio, v_p1 is calculated as follows:
in the calculation of the first-harmonic peak-to-valley ratio, v_p2 is calculated as follows:
the first harmonic peak-to-valley ratio FHPVR is calculated as follows:
The calculation of the harmonic energy decline rate is divided into two cases: the maximum harmonic order num whose logarithmic value is greater than zero is either greater than four or less than or equal to four; in the first case, let head be the second-largest log magnitude among the first three harmonics, with harmonic order s, and let tail be the mean of the harmonics whose log magnitude is greater than zero among the (num-2)-th, (num-1)-th and num-th harmonics; in this case the harmonic energy decline rate HDR is calculated as:
If the maximum harmonic order num whose logarithmic value is greater than zero is less than or equal to four, head is the maximum log magnitude among the first num-1 harmonics, with corresponding harmonic order s, and tail is the log magnitude of the num-th harmonic; in this case the harmonic energy decline rate HDR is calculated as:
In the calculation of the odd-to-even harmonic energy ratio of the first 6 harmonics, let the matrix HWB be denoted HB_ij; the odd-to-even harmonic energy ratio R_oe6 of the first 6 harmonics is calculated as follows:
where the row and column indices of the matrix start from 1, and wid is the row length of HB_ij;
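A short sketch of an odd-to-even energy ratio over the first 6 rows of HWB is shown below; since the R_oe6 equation appears as an image in the original, the summation of whole rows and the odd/even split used here are assumptions.

import numpy as np

def odd_even_energy_ratio_6(HWB):
    """Ratio of odd- to even-harmonic energy over the first 6 harmonics.

    HWB: matrix whose i-th row holds the log magnitudes around the i-th
    harmonic peak (rows are 1-indexed in the text, 0-indexed here).
    Assumed form: sum of rows 1, 3, 5 divided by sum of rows 2, 4, 6.
    """
    HB = np.asarray(HWB, dtype=float)[:6]
    odd = HB[0::2].sum()     # harmonics 1, 3, 5
    even = HB[1::2].sum()    # harmonics 2, 4, 6
    return float(odd / even) if even != 0 else 0.0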
For the ratio of the harmonic energy whose frequency-domain logarithmic value is greater than zero to the total energy, the elements of the matrix HWB that are less than or equal to zero are set to zero, giving a new matrix UPHB_ij; the ratio UHTR of the harmonic energy whose frequency-domain logarithmic value is greater than zero to the total energy is calculated as follows:
where row and col are the numbers of rows and columns of the matrix UPHB_ij, respectively;
The calculation of the ratio of the harmonic peak-valley difference to the frequency spacing requires, for the M-th harmonic, the harmonic trough value v and the maximum logarithmic value P[M] of the harmonic peak; the absolute value of their difference divided by the frequency spacing between the harmonic trough and the harmonic peak is the M-th-harmonic peak-valley difference to frequency spacing ratio, and this feature value is averaged over several harmonics to obtain the final value; the harmonics whose peak log value is greater than zero among the 5th to 8th harmonics of the logarithmic spectrum are used and averaged; a harmonic peak has a trough on each side, with left trough value v_p1 and right trough value v_p2; v_p1 and v_p2 are computed in the same way as in the M-th harmonic peak-to-valley ratio; the M-th-harmonic peak-valley difference to frequency spacing ratio DFR is calculated as follows:
DFR is calculated for several harmonics and averaged to obtain the harmonic peak-valley difference to frequency spacing ratio;
The first-harmonic peak-valley difference to frequency spacing ratio FDFR is a special case of the M-th-harmonic version; with the left and right trough values v_p1 and v_p2 from the first-harmonic peak-to-valley ratio calculation, it is computed as follows:
The 1st-to-2nd harmonic energy ratio is the ratio of the maximum logarithmic magnitude of the first harmonic to the maximum logarithmic magnitude of the second harmonic;
s402, extracting time-domain fine features, including the standard deviation, mean, median, mean-to-median distance, and number of mean crossings per unit time of the single-period fundamental frequency sequence; let the time-domain single-period fundamental frequency sequence obtained in step S3 be F[n_T], its length be Len, and n_T denote the index of the time-domain single period. The mean f_mean of the single-period fundamental frequency sequence is calculated as follows:
The standard deviation f_std of the single-period fundamental frequency sequence is calculated as follows:
F[n_T] is sorted, and the median f_median of the single-period fundamental frequency sequence is calculated as follows:
The distance f_mm between the mean and the median of the single-period fundamental frequency sequence is calculated as follows:
f_mm = |f_mean - f_median| formula (45)
The number N_c of times the single-period fundamental frequency sequence crosses its mean per unit time is calculated as follows:
where T_1 is the duration of the time-domain audio signal, and u(t) is expressed as follows:
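The five time-domain features of s402 can be computed as in the sketch below; the population standard deviation and the exact mean-crossing rule are assumptions, since the corresponding equations appear as images in the original.

import numpy as np

def time_domain_fine_features(F, duration):
    """Time-domain features of the single-period fundamental-frequency
    sequence F[n_T] over an audio signal lasting `duration` seconds."""
    F = np.asarray(F, dtype=float)
    f_mean = F.mean()                                  # sequence mean
    f_std = F.std()                                    # standard deviation
    f_median = np.median(F)                            # median after sorting
    f_mm = abs(f_mean - f_median)                      # formula (45)
    above = F > f_mean                                 # u(F[n_T] - f_mean)
    crossings = np.count_nonzero(above[1:] != above[:-1])
    N_c = crossings / duration                         # mean crossings per second
    return f_mean, f_std, f_median, f_mm, N_c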
s403, extracting the 13-dimensional Mel frequency cepstrum coefficient static features: the input audio data is first pre-processed; the audio is then FFT-transformed, and the energy spectrum of the audio is obtained as the squared FFT magnitudes; the spectrum is then filtered with a Mel filter bank to obtain the energy output of each filter; finally, the logarithm of the filter outputs is taken and the Mel frequency cepstrum coefficient features are obtained by the discrete cosine transform (Discrete Cosine Transform, DCT).
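For s403, a compact sketch using librosa is given below; librosa's MFCC pipeline (FFT, power spectrum, Mel filter bank, logarithm, DCT) matches the steps described above, while the frame parameters and the averaging over frames are illustrative choices rather than the exact settings of the method.

import librosa
import numpy as np

def mfcc_13(path):
    """13-dimensional static MFCCs for one mono audio file."""
    y, sr = librosa.load(path, sr=None, mono=True)       # pre-processing: load mono
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # shape: (13, n_frames)
    return np.mean(mfcc, axis=1)                          # one 13-dim static vector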
S5, based on the extracted time-frequency fine features and the Mel frequency cepstrum coefficient features, evaluating feature correlation and coefficients of variation, and establishing a random forest model to classify musical instruments, as follows:
s501, calculating the Pearson correlation coefficient among the elements of the feature vector and evaluating the feature effectiveness;
s502, calculating variation coefficients of elements of the feature vector and evaluating feature effectiveness;
s503, fusing the time-frequency fine features and the Mel frequency cepstrum coefficient features into new feature vectors; the fused features of 822 audio recordings of 5 musical instruments are fed into a random forest model for training, with five-fold cross-validation used during training; the 13-dimensional Mel frequency cepstrum coefficient static features alone are then fed into a random forest model for training, again with five-fold cross-validation; a random forest can handle input samples with high-dimensional features without dimensionality reduction; its basic unit is the decision tree, and the final result is obtained by voting or averaging over a combination of several weak classifiers, so the overall model has higher accuracy and better generalization; the final instrument classification error rate of the random forest model fed the fused features obtained in this example is 6% lower than that of the random forest model using only the 13-dimensional Mel frequency cepstrum coefficient static features, as shown in Table 1 (a sketch of this training setup follows Table 1).
Table 1. Feature classes and classification accuracy
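A sketch of the fused-feature training and five-fold cross-validation of s503, using scikit-learn, is given below; the feature arrays, the tree count of 30 (taken from the random forest description later in this document) and the random seed are placeholders, not the exact experimental configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X_tf, X_mfcc, y, n_trees=30, seed=0):
    """Five-fold cross-validation accuracy for the fused features and for the
    13-dim MFCC baseline. X_tf: time-frequency fine features, X_mfcc: static
    MFCCs, y: instrument labels; all three arrays are placeholders here."""
    X_fused = np.hstack([X_tf, X_mfcc])               # feature-level fusion
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    acc_fused = cross_val_score(rf, X_fused, y, cv=5).mean()
    acc_mfcc = cross_val_score(rf, X_mfcc, y, cv=5).mean()
    return acc_fused, acc_mfcc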
Example 2
Based on the musical instrument classification method based on time-frequency fine analysis disclosed in Embodiment 1 above, this embodiment provides a concrete implementation of the classification process, which specifically includes the following steps:
S1, referring to the corresponding steps in Embodiment 1, the fundamental frequency f_1 of the input violin single-tone audio is obtained;
S2, all FFT log-magnitude values and index values of the maxima within 0.1 harmonic spacing centered on the maximum point of each harmonic peak are recorded and stored in a matrix HWB and a sequence HWI, respectively, each row of HWB storing the log value of one harmonic peak and the nearby spectral log values;
S3, referring to the corresponding steps in Embodiment 1; the description is omitted here;
S4, extracting the time-frequency fine features and the 13-dimensional Mel frequency cepstrum coefficient static features; each feature is computed automatically with MATLAB code, and the resulting feature values of the audio are shown in the following table.
Table 2. Time-frequency fine features and 13-dimensional Mel frequency cepstrum coefficient static features of the corresponding audio
S5, inputting the above features into the fused-feature random forest model based on Embodiment 1; the model classifies the audio as a violin, and the classification result is correct.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included in the protection scope of the present invention.

Claims (6)

1. A musical instrument classification method based on time-frequency fine analysis, characterized in that the musical instrument classification method comprises the following steps:
s1, inputting single-tone audio of a musical instrument, and accurately estimating fundamental frequency of the audio;
s2, carrying out FFT frequency domain harmonic marking according to the size of the fundamental frequency of the audio;
s3, estimating a time domain single-period fundamental frequency sequence according to the fundamental frequency of the audio;
s4, extracting time-frequency fine features and Mel frequency cepstrum coefficient features;
and S5, based on the extracted time-frequency fine features and the Mel frequency cepstrum coefficient features, evaluating feature correlation and coefficients of variation, and establishing a random forest model to classify musical instruments.
2. The musical instrument classification method based on time-frequency fine analysis according to claim 1, wherein the step S1 process is as follows:
s101, detecting a single-tone audio end point of a musical instrument by adopting a short-time energy method, intercepting effective audio, and calculating the short-time energy by the following formula:
where s[n] is the input musical instrument single-tone audio signal, w[n] is a window function, n is the time-domain index of s[n] and w[n], N_f is the frame length of the signal frame, z is the shift amount of the window function, and E_z is the short-time energy corresponding to the shift z; the E_z values for different shifts form a short-time energy sequence; thresholds are set for the short-time energy at the start and end points of the effective audio of the musical instrument single-tone audio: when the short-time energy of the current frame exceeds the start threshold, the midpoint of the current frame is recorded as the start point of the effective audio, and when the short-time energy of the current frame falls below the end threshold, the midpoint of the current frame is recorded as the end point of the effective audio;
s102, roughly estimating the fundamental frequency f_0 of the effective audio x[n] through time-domain autocorrelation; the discrete-signal autocorrelation formula is as follows:
where τ represents the lag, i.e. the amount of translation; x[n] is shifted left by τ samples, multiplied by the original signal, and the products are summed to obtain the autocorrelation value at the current lag; the autocorrelation value is maximal at zero lag; from the relationship between the autocorrelation value and different lags, the maximum point other than zero lag is found, and its distance from zero lag is an approximate estimate of the fundamental period, from which the fundamental frequency f_0 is estimated;
S103, performing a fast Fourier transform (FFT) on the audio signal and recording the frequency f_max corresponding to the current FFT maximum; judging from the time-domain autocorrelation result whether f_max is an interfering component; if so, removing that component from the FFT spectrum, re-acquiring the frequency f_max corresponding to the FFT maximum, and repeating the above process until, according to the time-domain autocorrelation result, f_max is not an interfering component;
s104, calculating, by narrowband spectral energy frequency estimation, a de-noised frequency value f_c for the frequency corresponding to the maximum FFT peak, thereby eliminating noise interference; with the index of the FFT spectrum array starting at 1, the formula for calculating the de-noised frequency value f_c of the FFT spectrum using the narrowband spectral energy frequency estimate is as follows:
where k_0 is the FFT index corresponding to the maximum FFT peak, k is the index of the FFT spectrum, |X_k| is the FFT magnitude of the k-th point of the FFT spectrum, F_s is the sampling rate of the audio signal, and N is the number of FFT points;
s105, according to the de-noised frequency value f_c and the autocorrelation-estimated fundamental frequency f_0, calculating that the maximum peak lies at approximately the n_m-th harmonic, wherein:
the fundamental frequency estimate f_1 is then obtained as:
S106, resampling the audio time-domain signal according to the fundamental frequency estimate f_1; the new sampling rate f_s is calculated as:
f_s = f_1 × 2^m formula (6)
Wherein the value of m is as follows:
the resampled time-domain signal is processed again from step S102 until the fundamental frequency estimate f_1 changes by less than 0.001 relative to the estimate of the previous iteration, or the number of iterations reaches the limit of 50, and the last fundamental frequency estimate f_1 is output.
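As an illustration of claim 2, the sketch below chains the coarse autocorrelation estimate with a narrowband refinement; the energy-weighted form of f_c, the band width around k_0 and the relation f_1 = f_c / n_m are assumptions, since the corresponding equations appear as images in the original.

import numpy as np

def coarse_f0_autocorrelation(x, fs):
    """Coarse fundamental frequency: the lag of the largest non-zero-lag
    autocorrelation maximum approximates one fundamental period."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # r[tau], tau >= 0
    peaks = [t for t in range(1, len(r) - 1) if r[t] > r[t - 1] and r[t] > r[t + 1]]
    if not peaks:
        raise ValueError("no autocorrelation peak found")
    tau0 = max(peaks, key=lambda t: r[t])              # dominant non-zero lag
    return fs / tau0

def refined_f1(x, fs, n_fft=None, band=3):
    """Refined fundamental frequency estimate f_1 (sketch of claim 2)."""
    f0 = coarse_f0_autocorrelation(x, fs)
    n_fft = n_fft or len(x)
    X = np.abs(np.fft.rfft(x, n_fft))
    k0 = int(np.argmax(X[1:]) + 1)                     # index of the largest FFT peak
    k = np.arange(max(k0 - band, 1), min(k0 + band + 1, len(X)))
    w = X[k] ** 2                                      # narrowband spectral energy
    f_c = float((w * (k * fs / n_fft)).sum() / w.sum())  # assumed weighted frequency
    n_m = max(int(round(f_c / f0)), 1)                 # harmonic order of the peak
    return f_c / n_m                                   # fundamental estimate f_1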
3. The method according to claim 1, wherein in the step S2, the FFT frequency domain harmonic labeling process according to the size of the fundamental frequency of the audio is as follows:
s201, detecting an audio endpoint of the musical instrument by adopting a short-time energy method, and intercepting an effective audio part;
s202, resampling the audio time-domain signal according to the fundamental frequency estimate f_1 of step S1 so that the sampling rate of the audio is an integer power of two times f_1, and converting the FFT spectrum into a logarithmic spectrum;
s203, determining the left endpoint l and right endpoint r of the search range for the next harmonic peak on the FFT spectrum; if the search is for one of the first 3 harmonics, the calculation formula is as follows:
where P is the index of the position of the previously marked harmonic peak in the FFT spectrum; if the first harmonic peak is being marked, P = 1;
if the 4th or a later harmonic is being marked, the left endpoint l and the right endpoint r are calculated as follows:
l = P + bT - 2 formula (10)
r = P + 1.09T_1 formula (11)
where bT is half the distance between the 3rd harmonic peak and the 1st harmonic peak on the FFT spectrum, and T_1 is the FFT-point spacing between the two marked harmonics closest to the currently marked harmonic on the FFT spectrum;
s204, all local maxima are obtained within the search range, and the largest of them is taken as the peak point of the current harmonic; if no maximum exists in the search range, both the left and right boundaries of the search range are expanded when one of the first 4 harmonics is being marked, and only the right boundary is expanded for the remaining harmonics; when expanding the search boundaries of the first two harmonics, the left and right endpoints l and r satisfy the following conditions:
when expanding the search boundaries of the 3rd and 4th harmonics, the left and right endpoints l and r satisfy the following conditions:
l - P ≥ 0.7T_1 formula (14)
r - P ≤ 1.3T_1 formula (15)
when the search boundary of the 5th and higher harmonics is expanded, the right endpoint r satisfies the following condition:
r - P ≤ 1.3T_1 formula (16)
Then the largest of all local maxima within the new search range is taken as the peak point of the current harmonic; all FFT log-magnitude values and index values of the maxima within 0.1 harmonic spacing centered on that maximum point are recorded and stored in a matrix HWB and a sequence HWI, respectively, each row of HWB storing the log value of one harmonic peak and the nearby spectral log values; if no maximum point is found, the search range continues to be expanded and the loop repeats until a maximum point is found.
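A sketch of the harmonic-peak search of claim 3 for the 4th and higher harmonics is given below; the endpoints follow formulas (10)-(11) and the right-boundary expansion bound of formula (16), while the single-step expansion and the handling of an empty search range are simplifications.

import numpy as np

def next_harmonic_peak(mag, P, T1, bT):
    """Search the FFT log spectrum `mag` for the next harmonic peak after the
    previously marked peak at index P (4th and higher harmonics)."""
    l, r = int(P + bT - 2), int(P + 1.09 * T1)          # formulas (10) and (11)
    r_max = int(min(P + 1.3 * T1, len(mag) - 2))        # expansion bound, formula (16)
    while True:
        window = range(max(l, 1), min(r, len(mag) - 2) + 1)
        maxima = [k for k in window
                  if mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]]
        if maxima:
            return max(maxima, key=lambda k: mag[k])    # largest local maximum
        if r >= r_max:
            return None                                 # no peak found in this sketch
        r = r_max                                       # expand the right boundary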
4. The musical instrument classification method based on time-frequency fine analysis according to claim 1, wherein in the step S3, the process of estimating the time-domain single-period fundamental frequency sequence according to the fundamental frequency size of the audio is as follows:
s301, detecting an audio endpoint of a musical instrument by adopting a short-time energy method, and intercepting an effective audio part;
s302, filtering low-frequency interference by using a high-pass filter;
S303, marking the demarcation points of single fundamental periods by a cross-correlation operation: taking a point o near the middle of the audio signal as the origin, extend by an equal distance to both sides, the extension distance being the single-fundamental-period length obtained in the previous operation, so that the signal within the extended range forms a shift vector whose length is twice that period length, which gives the procedure its adaptive character; for the first cross-correlation operation, the single-period length is derived from the fundamental frequency estimation result of step S2; taking the point o as the origin, the signal spanning four single-fundamental-period lengths to the left or right is taken as the fixed vector, the period length again being the one obtained in the previous operation; the shift vector is slid over the fixed vector as a sliding window to perform the cross-correlation operation, yielding the relationship between the cross-correlation value and the relative displacement; the sequence of extreme points of the cross-correlation value and the corresponding sequence of relative displacements are recorded from this relationship; a reasonable value is selected from the relative displacement sequence and, taking the point o as reference, the next single-fundamental-period demarcation point is computed, the distance between this demarcation point and o being the single-fundamental-period length; repeating this process yields the time-domain single-period fundamental frequency sequence, and the cross-correlation r[τ] is calculated as:
r[τ] = Σ_n g[n]·v[n+τ]
where τ denotes the lag, i.e. the amount of translation; g[n] and v[n] are two different signals; v[n] is shifted left by τ samples and multiplied point-wise by g[n], and the products are summed, giving the similarity at the current lag.
5. The musical instrument classification method based on time-frequency fine analysis according to claim 4, wherein in the step S4, the process of extracting the time-frequency fine feature and mel-frequency cepstrum coefficient feature is as follows:
s401, extracting frequency-domain fine features, including: the harmonic energy centroid over harmonics whose logarithmic value is greater than zero, the average attenuation ratio of harmonic-peak burrs, the maximum harmonic order whose logarithmic value is greater than zero, the inter-harmonic slope-bandwidth ratio, the harmonic peak-to-valley ratio, the harmonic energy decline rate, the odd-to-even harmonic energy ratio of the first 6 harmonics, the first-harmonic peak-to-valley ratio, the decline rate of the inter-harmonic slope-bandwidth ratio, the ratio of the harmonic energy whose frequency-domain logarithmic value is greater than zero to the total energy, the ratio of the harmonic peak-valley difference to half the harmonic spacing, and the ratio of the first-harmonic peak-valley difference to half the harmonic spacing;
From the HWB obtained in step S2, the log-magnitude sequence P[n_h] of the maximum point of each harmonic peak can be obtained; the entries of the log-magnitude sequence that are less than or equal to zero are set to zero, the index starts from 1, and the harmonic energy centroid c over the entries whose logarithmic value is greater than zero is calculated by the following formula:
where L_P is the length of P[n_h], which is also the maximum harmonic order, and n_h is the current harmonic order, i.e. the index of P[n_h];
the maximum harmonic order whose logarithmic value is greater than zero is the largest index n_h for which the log-magnitude sequence P[n_h] (with entries less than or equal to zero set to zero, index starting from 1) is still greater than zero;
The calculation of the average attenuation ratio of harmonic-peak burrs requires obtaining, over the part of the FFT logarithmic spectrum of step S2 that is greater than zero, the 20 non-harmonic-peak maxima A[n_p] closest to the harmonic-peak maximum within one harmonic spacing centered on that maximum, where n_p is the index of A[n_p]; if the number of maxima in the range is less than 20, all of them are taken; if the number of maxima is less than or equal to 10, the mean of A[n_p] is taken directly as the average burr amplitude around the harmonic peak; if the number of maxima is greater than 10, a group of 10 consecutive maxima is sought in A[n_p] whose FFT index span contains the index of the harmonic-peak maximum, or between which and the index of the harmonic-peak maximum no other maximum lies, and whose sum is maximal; the mean of this group is taken as the average burr amplitude around the harmonic peak; let the average burr amplitude around the harmonic peak be a, then the harmonic-peak burr attenuation ratio PDR of a single harmonic is as follows:
where p is the maximum amplitude of the current harmonic peak; averaging the harmonic-peak burr attenuation ratios of several harmonics gives the average attenuation ratio of harmonic-peak burrs;
Let the logarithmic spectrum of the FFT spectrum obtained in step S2 be mag[k]; the HWI obtained in step S2 is the index sequence of the maxima of the harmonic peaks in mag[k], abbreviated hi[n_h], with all indices of the sequence starting from 1; the inter-harmonic slope-bandwidth ratio SFWR between the 3rd and 4th harmonics is then calculated as follows:
When calculating the decline rate of the inter-harmonic slope-bandwidth ratio, the slope-bandwidth ratio between the direct-current component and the first harmonic in the logarithmic spectrum is first calculated as r_1; the inter-harmonic slope-bandwidth ratio of each harmonic spacing between the 5th and the 8th harmonic is then calculated and averaged as r_2 (if fewer than 8 harmonics are available, the highest harmonic is used); the decline rate SFWRD of the inter-harmonic slope-bandwidth ratio is calculated as follows:
The harmonic peak-to-valley ratio is calculated on the part of the FFT logarithmic spectrum whose logarithmic value is greater than zero: the elements of mag[k] that are less than or equal to zero are set to zero, giving a new array MAG[k]; a harmonic peak has a trough on each side, the left trough value v_p1 being the mean of MAG[k] over the trough range of the left harmonic spacing and the right trough value v_p2 being the mean of MAG[k] over the trough range of the right harmonic spacing; in the calculation of the M-th harmonic peak-to-valley ratio, v_p1 is calculated as follows:
wherein L is the left boundary of the trough value calculation and R is the right boundary of the trough value calculation;
the calculation formula of v_p2 in the calculation of the M-th harmonic peak-to-valley ratio is as follows:
the log-magnitude sequence of the maximum point of each harmonic peak is known to be P[n_h]; the M-th harmonic peak-to-valley ratio MR is calculated as follows:
where P[M] represents the logarithmic magnitude of the maximum point of the M-th harmonic peak; the mean of the peak-to-valley ratios over several harmonics is taken as the feature parameter of the harmonic peak-to-valley ratio;
the first-harmonic peak-to-valley ratio is a special case of the M-th harmonic peak-to-valley ratio, and the steps are similar; in the calculation of the first-harmonic peak-to-valley ratio, v_p1 is calculated as follows:
in the calculation of the first-harmonic peak-to-valley ratio, v_p2 is calculated as follows:
the first harmonic peak-to-valley ratio FHPVR is calculated as follows:
The calculation of the harmonic energy decline rate is divided into two cases: the maximum harmonic order num whose logarithmic value is greater than zero is either greater than four or less than or equal to four; in the first case, let head be the second-largest log magnitude among the first three harmonics, with harmonic order s, and let tail be the mean of the harmonics whose log magnitude is greater than zero among the (num-2)-th, (num-1)-th and num-th harmonics; in this case the harmonic energy decline rate HDR is calculated as:
If the maximum harmonic order num whose logarithmic value is greater than zero is less than or equal to four, head is the maximum log magnitude among the first num-1 harmonics, with corresponding harmonic order s, and tail is the log magnitude of the num-th harmonic; in this case the harmonic energy decline rate HDR is calculated as:
In the calculation of the odd-to-even harmonic energy ratio of the first 6 harmonics, let the matrix HWB be denoted HB_ij; the odd-to-even harmonic energy ratio R_oe6 of the first 6 harmonics is calculated as follows:
where the row and column indices of the matrix start from 1, and wid is the row length of HB_ij;
For the ratio of the harmonic energy whose frequency-domain logarithmic value is greater than zero to the total energy, the elements of the matrix HWB that are less than or equal to zero are set to zero, giving a new matrix UPHB_ij; the ratio UHTR of the harmonic energy whose frequency-domain logarithmic value is greater than zero to the total energy is calculated as follows:
where row and col are the numbers of rows and columns of the matrix UPHB_ij, respectively;
The calculation of the ratio of the harmonic peak-valley difference to the frequency spacing requires, for the M-th harmonic, the harmonic trough value v and the maximum logarithmic value P[M] of the harmonic peak; the absolute value of their difference divided by the frequency spacing between the harmonic trough and the harmonic peak is the M-th-harmonic peak-valley difference to frequency spacing ratio, and this feature value is averaged over several harmonics to obtain the final value; the harmonics whose peak log value is greater than zero among the fifth to eighth harmonics of the logarithmic spectrum are used and averaged; a harmonic peak has a trough on each side, with left trough value v_p1 and right trough value v_p2; v_p1 and v_p2 are computed in the same way as in the M-th harmonic peak-to-valley ratio; the M-th-harmonic peak-valley difference to frequency spacing ratio DFR is calculated as follows:
DFR is calculated for several harmonics and averaged to obtain the harmonic peak-valley difference to frequency spacing ratio;
The first-harmonic peak-valley difference to frequency spacing ratio FDFR is a special case of the M-th-harmonic version; with the left and right trough values v_p1 and v_p2 from the first-harmonic peak-to-valley ratio calculation, it is computed as follows:
The 1st-to-2nd harmonic energy ratio is the ratio of the maximum logarithmic magnitude of the first harmonic to the maximum logarithmic magnitude of the second harmonic;
s402, extracting time-domain fine features, including the standard deviation, mean, median, mean-to-median distance, and number of mean crossings per unit time of the single-period fundamental frequency sequence; let the time-domain single-period fundamental frequency sequence obtained in step S3 be F[n_T], its length be Len, and n_T denote the index of the time-domain single period; the mean f_mean of the single-period fundamental frequency sequence is calculated as follows:
The standard deviation f_std of the single-period fundamental frequency sequence is calculated as follows:
F[n_T] is sorted, and the median f_median of the single-period fundamental frequency sequence is calculated as follows:
The distance f_mm between the mean and the median of the single-period fundamental frequency sequence is calculated as follows:
f_mm = |f_mean - f_median| formula (45)
The number N_c of times the single-period fundamental frequency sequence crosses its mean per unit time is calculated as follows:
where T_1 is the duration of the time-domain audio signal, and u(t) is expressed as follows:
s403, extracting the 13-dimensional Mel frequency cepstrum coefficient static features: the input audio data is first pre-processed; the audio is then FFT-transformed, and the energy spectrum of the audio is obtained as the squared FFT magnitudes; the spectrum is then filtered with a Mel filter bank to obtain the energy output of each filter; finally, the logarithm of the filter outputs is taken and the Mel frequency cepstrum coefficient features are obtained by the discrete cosine transform DCT.
6. The method for classifying musical instruments based on time-frequency fine analysis according to claim 1, wherein in step S5, the characteristic correlation and the coefficient of variation are evaluated, and the process of classifying musical instruments by establishing a random forest model is as follows:
s501, calculating the Pearson correlation coefficient among the elements of the feature vector and evaluating the feature effectiveness;
s502, calculating variation coefficients of elements of the feature vector and evaluating feature effectiveness;
s503, fusing the time-frequency fine features and the Mel frequency cepstrum coefficient features into new feature vectors and feeding the fused features into a random forest model for training, with five-fold cross-validation used during training;
The processing of the random forest model is as follows: the training set is randomly sampled several times using the bootstrap method, and each sampled training set is denoted θ;
for each training sample θ, a corresponding decision tree is built; in each constructed decision tree, q features are randomly selected from the Q features of the training sample θ for splitting, where q is less than or equal to Q and is obtained by a rounding-down operation; the Gini impurity of the selected q features is calculated, and the feature with the minimum Gini impurity is taken as the optimal splitting feature; according to the selected optimal splitting feature, the node containing it is split into two parts, and a sub-optimal splitting feature is selected from the remaining features; these steps are repeated until the number of remaining features of θ is 0 or the classification effect of the current decision tree is optimal;
combining all 30 decision trees to obtain a random forest model, and obtaining a final classification result through a voting method.
CN202310583602.3A 2023-05-23 2023-05-23 Musical instrument classification method based on time-frequency fine analysis Pending CN116682456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310583602.3A CN116682456A (en) 2023-05-23 2023-05-23 Musical instrument classification method based on time-frequency fine analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310583602.3A CN116682456A (en) 2023-05-23 2023-05-23 Musical instrument classification method based on time-frequency fine analysis

Publications (1)

Publication Number Publication Date
CN116682456A true CN116682456A (en) 2023-09-01

Family

ID=87780198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310583602.3A Pending CN116682456A (en) 2023-05-23 2023-05-23 Musical instrument classification method based on time-frequency fine analysis

Country Status (1)

Country Link
CN (1) CN116682456A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117727313A (en) * 2024-02-18 2024-03-19 百鸟数据科技(北京)有限责任公司 Intelligent noise reduction method for wild bird sound data
CN117727313B (en) * 2024-02-18 2024-04-23 百鸟数据科技(北京)有限责任公司 Intelligent noise reduction method for wild bird sound data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination