US8779271B2 - Tonal component detection method, tonal component detection apparatus, and program - Google Patents
- Publication number
- US8779271B2
- Authority
- US
- United States
- Prior art keywords
- time
- fitting
- frequency
- peak
- tonal component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H7/00—Instruments in which the tones are synthesised from a data store, e.g. computer organs
- G10H7/02—Instruments in which the tones are synthesised from a data store, e.g. computer organs in which amplitudes at successive sample points of a tone waveform are stored in one or more memories
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/02—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
- G10H1/06—Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
Definitions
- the present technology relates to a tonal component detection method, a tonal component detection apparatus, and a program.
- Components constituting a one-dimensional time signal such as voice or music are broadly classified into three types of representations: (1) a tonal component, (2) a stationary noise component, and (3) a transient noise component.
- the tonal component corresponds to a component caused by the stationary and periodic vibration of a sound source.
- the stationary noise component corresponds to a component caused by a stationary but non-periodic phenomenon such as friction or turbulence.
- the transient noise component corresponds to a component caused by a non-stationary phenomenon such as a blow or a sudden change in a sound condition.
- the tonal component is a component that faithfully represents the intrinsic properties of a sound source itself, and thus it is particularly important when analyzing the sound.
- the tonal component obtainable from an actual sound may often be a plurality of sinusoidal components which are gradually changed over time.
- the tonal component may be represented, for example, as a horizontal stripe-shaped pattern on a spectrogram representing amplitudes of the short-time Fourier transform with a time series, as shown in FIG. 8 .
- FIG. 9 illustrates a spectrum in which frames in the vicinity of 0.2 seconds on the time axis in FIG. 8 are extracted.
- true tonal components to be detected for reference are indicated by directional arrows.
- the high-accuracy detection of the time and frequency in which the tonal components are present from such a spectrum becomes a fundamental process for many application techniques such as sound analysis, coding, noise reduction, and high-quality sound reproduction.
- a typical technique of detecting tonal components includes a method of obtaining an amplitude spectrum at each of the short time frames, detecting local peaks of the amplitude spectrum, and regarding all of the detected peaks as tonal components.
- One disadvantage of this method is that a large number of erroneous detections are made, because not every local peak is necessarily a tonal component.
- FIG. 10 shows results obtained by detecting local peaks in the amplitude spectrum of each frame on the spectrogram of FIG. 8; the results are indicated by black dots. The black horizontal stripes, i.e., the tonal components shown in FIG. 8, are detected as horizontal line shapes in FIG. 10 as well. On the other hand, however, a large number of peaks are also detected from portions such as noise components.
- FIG. 11 shows results obtained by similarly detecting local peaks based on the spectrum of FIG. 9 , and the results are indicated by black dots. It will be found that there are a large number of erroneously detected peaks in FIG. 11 as compared to accurately detected tonal components in FIG. 9 .
- an approach for improving the detection accuracy may include, for example, (A) a method of setting a threshold on the height of each local peak and discarding local peaks smaller than the threshold, and (B) a method of connecting local peaks across multiple frames in the time direction according to a local-neighbor rule and then excluding components that are not connected more than a certain number of times.
- the method of (A) assumes that the magnitude of tonal components is greater than that of noise components at all times. However, this assumption is unreasonable and does not hold in many cases, so the performance improvement is limited. In fact, the magnitude of the peak erroneously detected in the vicinity of 2 kHz on the frequency axis of FIG. 11 is almost the same as that of the tonal component in the vicinity of 3.9 kHz, so the assumption does not hold there.
- the method of (B) is disclosed in, for example, R. J. McAulay and T. F. Quatieri: “Speech Analysis/Synthesis Based on a Sinusoidal Representation,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 34, No. 4, 744/754 (August 1986), and J. O. Smith III and X. Serra, “PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation”, Proceedings of the International Computer Music Conference (1987).
- This method exploits the property that tonal components have temporal continuity (e.g., in music, a tonal component often lasts longer than 100 ms). However, because peaks of components other than tonal components may also persist across frames, and short tonal segments go undetected, sufficient accuracy is not necessarily achieved in many applications.
- a tonal component detection method including performing a time-frequency transformation on an input time signal to obtain a time-frequency distribution, detecting a peak in a frequency direction at a time frame of the time-frequency distribution, fitting a tone model in a neighboring region of the detected peak, and obtaining a score indicating tonal component likeness of the detected peak based on a result obtained by the fitting.
- the time-frequency distribution (spectrogram) can be obtained by performing the time-frequency transformation on the input time signal.
- the time-frequency transformation of the input time signal may be performed using a short-time Fourier transform.
- the time-frequency transformation of the input time signal may be performed using other transformation techniques such as a wavelet transform.
- the peak of the frequency direction is detected at each of the time frames in the time-frequency distribution.
- the tone model is fitted in a neighboring region of each of the detected peaks.
- a quadratic polynomial function in which a time and a frequency are set to variables may be used as the tone model.
- a cubic or higher-order polynomial function may be used.
- the fitting may be performed, for example, based on a least square error criterion of the tone model and a time-frequency distribution in the vicinity of each of the detected peaks.
- the fitting may be performed based on a minimum fourth-power error criterion, a minimum entropy criterion, and so on.
- a score indicating tonal component likeness of the detected peak may be obtained based on a result obtained by the fitting.
- the score indicating the tonal component likeness of the detected peak may be obtained using at least a fitting error extracted based on the result obtained by the fitting.
- the score indicating the tonal component likeness of the detected peak may be obtained using at least a peak curvature in a frequency direction extracted based on the result obtained by the fitting.
- the score indicating the tonal component likeness of the detected peak may be obtained by extracting a predetermined number of features and by combining the predetermined number of extracted features, based on the result obtained by the fitting.
- a non-linear function may be applied to the predetermined number of extracted features to obtain a weighted sum.
- the predetermined number of features may be at least one of a fitting error, a peak curvature in a frequency direction, a frequency of a peak, an amplitude value in a peak position, a rate of a change in a frequency, or a rate of a change in amplitude that are obtained by the tone model on which the fitting is performed.
- the tone model can be fitted in a neighboring region of each peak in the frequency direction detected from the time-frequency distribution (spectrogram), and the score indicating the tonal component likeness of each of the detected peaks can be obtained based on results obtained by the fitting. Therefore, it is possible to accurately detect tonal components.
- FIG. 1 is a block diagram illustrating an exemplary configuration of a tonal component detection apparatus according to an embodiment of the present technology
- FIG. 2 is a schematic diagram for explaining the property that a quadratic polynomial function is well fitted in the vicinity of a tonal spectral peak but it is not well fitted in the vicinity of a noise spectral peak;
- FIG. 3 is a schematic diagram illustrating the change of tonal peaks in the time direction and the fitting performed in a small region Γ on the spectrogram;
- FIG. 4 is a block diagram illustrating an exemplary configuration of computer equipment which performs a tonal component detection process in software
- FIG. 5 is a flowchart illustrating an exemplary procedure of the tonal component detection process performed by a CPU of the computer equipment
- FIG. 6 is a diagram illustrating an example of a tonal component detection result to explain advantageous effects obtainable from an embodiment of the present technology
- FIG. 7 is a diagram illustrating an example of a tonal component detection result to explain advantageous effects obtainable from an embodiment of the present technology
- FIG. 8 is a diagram illustrating an example of a voice spectrogram
- FIG. 9 is a diagram illustrating a spectrum in which predetermined time frames of the spectrogram are extracted.
- FIG. 10 is a diagram illustrating results obtained by detecting local peaks in amplitude spectrum of each frame on the spectrogram and representing the results by black dots;
- FIG. 11 is a diagram illustrating results obtained by detecting local peaks on the spectrum in which a predetermined time frame of the spectrogram is extracted.
- FIG. 1 illustrates an exemplary configuration of a tonal component detection apparatus 100 .
- the tonal component detection apparatus 100 includes a time-frequency transformation unit 101 , a peak detection unit 102 , a fitting unit 103 , a feature extraction unit 104 , and a scoring unit 105 .
- the time-frequency transformation unit 101 transforms an input time signal f(t) such as voice or music into a time-frequency representation to obtain a time-frequency signal F(n,k).
- t is the discrete time
- n is the time frame number
- k is the discrete frequency.
- the time-frequency transformation unit 101 obtains the time-frequency signal F(n,k) by transforming the input time signal f(t) into a time-frequency representation, for example, using a short-time Fourier transform, as given in the following Equation (1).
- W(t) is the window function
- M is the size of the window function
- R is the frame time interval (hop size).
- the time-frequency signal F(n,k) indicates a logarithmic amplitude value of the frequency component in the time frame n and frequency k, i.e., it is a spectrogram (time-frequency distribution).
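The log-amplitude spectrogram F(n,k) of Equation (1) can be sketched in a few lines of NumPy. This is an illustrative sketch: the function name is ours, and the Hann window, window size M = 1024, and hop size R = 256 are example choices, not values specified in the text.

```python
import numpy as np

def log_spectrogram(f, window_size=1024, hop=256):
    """F(n, k): logarithmic amplitude of the short-time Fourier transform
    of the input time signal f(t), as in Equation (1).

    W(t) is a Hann window of size M = window_size; R = hop is the frame
    time interval (hop size). Rows are time frames n, columns frequency k.
    """
    w = np.hanning(window_size)
    n_frames = 1 + (len(f) - window_size) // hop
    frames = np.stack([f[n * hop:n * hop + window_size] * w
                       for n in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)      # frequency bins k
    return np.log(np.abs(spectrum) + 1e-12)     # logarithmic amplitude
```

For a real input of length 4096 this yields 13 frames of 513 frequency bins; a pure sinusoid shows up as a single bright bin in every frame, i.e. a horizontal stripe on the spectrogram as in FIG. 8.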
- the peak detection unit 102 detects peaks in the frequency direction at each time frame of the spectrogram obtained by the time-frequency transformation unit 101 . Specifically, the peak detection unit 102 detects whether peaks (maximum values) are found in the frequency direction for all of the frames and all of the frequencies on the spectrogram.
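The local-maximum test of Equation (2), F(n,k−1) < F(n,k) > F(n,k+1), applied along the frequency axis of every frame, can be vectorized as follows (a sketch; the function name is ours):

```python
import numpy as np

def detect_peaks(F):
    """Boolean mask of the same shape as the spectrogram F:
    True where F(n, k-1) < F(n, k) and F(n, k) > F(n, k+1), i.e. a local
    peak in the frequency direction (Equation (2)). Edge bins are never peaks.
    """
    peaks = np.zeros(F.shape, dtype=bool)
    peaks[:, 1:-1] = (F[:, 1:-1] > F[:, :-2]) & (F[:, 1:-1] > F[:, 2:])
    return peaks
```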
- the fitting unit 103 fits a tone model in a neighboring region of each of the peaks detected by the peak detection unit 102 , as described below.
- the fitting unit 103 first performs a coordinate transformation so that the target peak is at the origin, and then sets up a neighboring time-frequency region as given in the following Equation (3).
- In Equation (3), Δ_N is the half-width of the neighboring region in the time direction (e.g., three points), and Δ_K is the half-width in the frequency direction (e.g., two points).
- Γ = [−Δ_N ≤ n ≤ Δ_N] × [−Δ_K ≤ k ≤ Δ_K] (3)
- the fitting unit 103 fits, for example, a tone model of the quadratic polynomial function as given in the following Equation (4), with respect to the time-frequency signal within the neighboring region.
- the fitting unit 103 performs the fitting, for example, based on the least square error criterion of the tone model and the time-frequency distribution in the vicinity of the peak.
- Y(k,n) = ak^2 + bk + ckn + dn^2 + en + g (4)
- the fitting unit 103 performs the fitting by obtaining the coefficients that minimize the square error between the time-frequency signal and the polynomial function over the neighboring region, as given in the following Equation (5).
- the coefficients are determined as given in the following Equation (6).
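The least-squares fit of the quadratic tone model of Equation (4) over the neighborhood Γ can be sketched as an ordinary design-matrix solve. This stands in for Equations (5)-(6), whose closed forms are not reproduced in this excerpt; the function name and the normalized-error definition are our assumptions.

```python
import numpy as np

def fit_tone_model(patch, dN=3, dK=2):
    """Least-squares fit of Y(k, n) = a k^2 + b k + c k n + d n^2 + e n + g
    over Γ = [-dN..dN] x [-dK..dK] centered on a detected peak.

    `patch` is the (2*dN+1, 2*dK+1) spectrogram excerpt (rows = time n,
    columns = frequency k). Returns (coeffs, normalized squared error).
    """
    n, k = np.meshgrid(np.arange(-dN, dN + 1), np.arange(-dK, dK + 1),
                       indexing="ij")
    n, k = n.ravel().astype(float), k.ravel().astype(float)
    X = np.column_stack([k**2, k, k * n, n**2, n, np.ones_like(k)])
    y = patch.ravel()
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coeffs
    error = residual @ residual / max((y - y.mean()) @ (y - y.mean()), 1e-12)
    return coeffs, error
```

When the patch really is a slowly drifting quadratic ridge (a tonal peak), the residual error is near zero; around noise peaks it stays large, which is exactly the property exploited below.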
- This quadratic polynomial function has the property that it is well fitted in the vicinity of the tonal spectral peak (smaller margin of error) but it is not well fitted in the vicinity of the noise spectral peak (larger margin of error).
- This property of the function is schematically shown in FIG. 2(a) and FIG. 2(b).
- FIG. 2(a) schematically shows the spectrum in the vicinity of a tonal peak of the n-th frame obtained from the above Equation (1).
- FIG. 2(b) shows how the quadratic function f_0(k) given in the following Equation (7) is fitted to the spectrum shown in FIG. 2(a).
- In Equation (7), a is the peak curvature, k_0 is the true peak frequency, and g_0 is the logarithmic amplitude value at the true peak position.
- the quadratic function is well fitted to the tonal component spectral peak, but the margin of error tends to be large in the noise peaks.
- f_0(k) = a(k − k_0)^2 + g_0 (7)
- FIG. 3( a ) schematically shows the change of the tonal peaks in the time direction.
- the tonal peak has the amplitude and frequency that are being changed while maintaining its overall shape in the previous and subsequent time frames.
- the obtained spectrum is actually formed by discrete points, but the spectrum is drawn with a curved line in the figure for descriptive purposes. Specifically, the dashed line represents the previous frames, the solid line represents the current frames, and the dotted line represents the subsequent frames.
- the tonal components have a certain extent of time continuity and involve some changes in frequency and time, but the tonal components can be represented by the shift of substantially the same form of quadratic function.
- This change Y(k,n) is given by the following Equation (8).
- the spectrum is represented in logarithmic amplitude, so a change in amplitude shifts the entire spectrum up and down. This is why the additive term f_1(n), which represents the change in amplitude, is necessary.
- β is the rate of change in frequency
- f_1(n) is the time function indicating the change in amplitude at the peak position.
- If f_1(n) is approximated by a quadratic function in the time direction, the change Y(k,n) is given by the following Equation (9).
- In Equation (9), a, k_0, β, d_1, e_1, and g_0 are constants, and thus Equation (9) is equivalent to Equation (4) under an appropriate conversion of variables.
- FIG. 3(b) schematically shows the fitting performed in the small region Γ on the spectrogram.
- Equation (4) tends to fit well for tonal components, because tonal peaks of similar shape change only gradually over time. In the vicinity of noise peaks, however, the shape and frequency of the peaks vary, so Equation (4) does not fit well; even when the fitting is performed optimally, the error remains large.
- Equation (6) shows the calculation in which the fitting is performed for all of the coefficients a, b, c, d, e, and g.
- some of the coefficients may be fixed to constant values in advance, and the fitting performed on the remaining coefficients.
- the fitting may also be performed using a cubic or higher-order polynomial function.
- the feature extraction unit 104 extracts the features (x_0, x_1, x_2, x_3, x_4, x_5) as given in the following Equation (10), based on the results (see Equation (6)) obtained by fitting each of the peaks in the fitting unit 103.
- Each feature indicates a property of the frequency component at the corresponding peak, and can be used in analyzing voice, music, or the like without any modification.
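Equation (10) itself is not reproduced in this excerpt, so the mapping below from the fitted coefficients (a, b, c, d, e, g) of Equation (4) and the fitting error to the six features x_0..x_5 is an assumption, inferred by comparing Equation (4) with Equations (7)-(9): completing the square gives a refined peak frequency offset −b/(2a) from the detected bin, and matching the cross term of f_0(k − βn) gives the frequency change rate β = −c/(2a).

```python
def extract_features(coeffs, fit_error, k_peak):
    """Assumed mapping from the fit of Equation (4) to the features named
    in the text (peak curvature, peak frequency, amplitude at the peak,
    rates of change in frequency and amplitude, fitting error)."""
    a, b, c, d, e, g = coeffs
    x0 = a                       # peak curvature in the frequency direction
    x1 = k_peak - b / (2.0 * a)  # refined peak frequency (assumed formula)
    x2 = g                       # log-amplitude at the peak position
    x3 = -c / (2.0 * a)          # rate of change in frequency, beta (assumed)
    x4 = e                       # rate of change in amplitude (assumed)
    x5 = fit_error               # fitting normalization error
    return (x0, x1, x2, x3, x4, x5)
```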
- the scoring unit 105 obtains scores indicating the tonal component likeness of each peak using the features extracted by the feature extraction unit 104 for each peak in order to quantify the tonal component likeness of each peak.
- the scoring unit 105 obtains the score S(n,k) as given in the following Equation (11), using one or more of the features (x_0, x_1, x_2, x_3, x_4, x_5). In this case, at least the fitting normalization error x_5 or the peak curvature x_0 in the frequency direction is used.
- In Equation (11), Sigm(x) is the sigmoid function, w_i is a predetermined weighting factor, and H_i(x_i) is a predetermined non-linear function applied to the i-th feature x_i.
- the function given in the following Equation (12) can be used as the non-linear function H_i(x_i).
- u_i and v_i are predetermined weighting factors.
- w_i, u_i, and v_i may be set to suitable constants in advance, or they may be determined automatically by a steepest-descent learning procedure or the like using a large amount of data.
- H_i(x_i) = Sigm(u_i x_i + v_i) (12)
- the scoring unit 105 obtains S(n,k), which indicates the tonal component likeness of each peak, using Equation (11). In addition, the scoring unit 105 sets the score S(n,k) at positions (n,k) having no peak to zero. The scoring unit 105 thus obtains a score S(n,k) indicating the tonal component likeness at each time and frequency of the time-frequency signal F(n,k). The score S(n,k) takes a value between 0 and 1. The scoring unit 105 then outputs the obtained score S(n,k) as the tonal component detection result.
- When a binary tonal/non-tonal determination is required, it can be made using an appropriate threshold S_Thsd as given in the following Equation (13).
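The scoring step can be sketched as follows. The exact form of Equation (11) is not shown in this excerpt, so the weighted sum of H_i(x_i) passed through an outer sigmoid is an assumption, as are the weight values and the threshold used for the Equation (13)-style decision.

```python
import numpy as np

def sigm(x):
    """Sigmoid function Sigm(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def tonal_score(features, w, u, v):
    """Assumed form of Equation (11): S = Sigm(sum_i w_i * H_i(x_i)),
    with H_i(x) = Sigm(u_i * x + v_i) as in Equation (12). The result
    lies between 0 and 1."""
    x = np.asarray(features, dtype=float)
    return float(sigm(np.sum(w * sigm(u * x + v))))

def is_tonal(score, threshold=0.5):
    """Binary decision in the spirit of Equation (13); the threshold
    value S_Thsd is an assumption."""
    return score > threshold
```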
- An input time signal f(t) such as voice or music is supplied to the time-frequency transformation unit 101 .
- the time-frequency transformation unit 101 transforms the input time signal f(t) into a time-frequency representation to obtain a time-frequency signal F(n,k).
- the time-frequency signal F(n,k) indicates a logarithmic amplitude value of frequency component in the time frame n and frequency k, i.e., it is a spectrogram (time-frequency distribution). This spectrogram is supplied to the peak detection unit 102 .
- the peak detection unit 102 detects whether peaks are found in the frequency direction at all of the frames and all of the frequencies on the spectrogram.
- the peak detection results are supplied to the fitting unit 103 .
- the fitting unit 103 fits a tone model in a neighboring region of the peak for each of the peaks. This fitting allows the coefficients of the quadratic polynomial function constituting the tone model (see Equation (4)) to be obtained so that the square error may be minimized.
- the results obtained by the fitting are supplied to the feature extraction unit 104 .
- the feature extraction unit 104 extracts various types of features based on the results (see Equation (6)) obtained by fitting each of the peaks in the fitting unit 103 (see Equation (10)). For example, features such as the peak curvature, the peak frequency, the logarithmic amplitude value of the peak, the rate of change in amplitude, and the fitting normalization error are extracted. The extracted features are supplied to the scoring unit 105.
- the scoring unit 105 obtains the score S(n,k) indicating the tonal component likeness of each of the peaks using the features (see Equation (11)).
- the score S(n,k) takes a value between 0 and 1.
- the scoring unit 105 outputs the obtained score S(n,k) as tonal component detection results.
- the scoring unit 105 sets the score S(n,k) in the position (n,k) at which there is no peak to zero.
- the tonal component detection apparatus 100 shown in FIG. 1 may be implemented in software as well as hardware.
- computer equipment 200 shown in FIG. 4 can perform a tonal component detection process similar to that described above when it is made to perform the functions of the respective units of the tonal component detection apparatus 100 shown in FIG. 1.
- the computer equipment 200 includes a CPU (Central Processing Unit) 181, a ROM (Read Only Memory) 182, a RAM (Random Access Memory) 183, a data input/output unit (data I/O) 184, and an HDD (Hard Disk Drive) 185.
- the ROM 182 stores the processing programs to be performed by the CPU 181 .
- the RAM 181 serves as a work area for the CPU 181 .
- the CPU 181 reads out the processing programs stored in the ROM 182 as necessary, and sends the readout processing programs to the RAM 183 , so that the processing program is loaded in the RAM 183 . Thereafter, the CPU 181 reads out the loaded programs to execute the tonal component detection process.
- the computer equipment 200 receives an input time signal f(t) through the data I/O 184 and accumulates it to the HDD 185 .
- the CPU 181 performs the tonal component detection process on the input time signal f(t) accumulated in the HDD 185 .
- the tonal component detection result S(n,k) is output to the outside through the data I/O 184.
- the flowchart of FIG. 5 shows an exemplary procedure of the tonal component detection process performed by the CPU 181 .
- the CPU 181 starts the process in step ST1, and then the process proceeds to step ST2.
- In step ST2, the CPU 181 transforms the input time signal f(t) into a time-frequency representation to obtain a time-frequency signal F(n,k), i.e., a spectrogram (time-frequency distribution).
- In step ST3, the CPU 181 sets the number n of the frame (time frame) to zero. Then, in step ST4, the CPU 181 determines whether n ≥ N; frames of the spectrogram (time-frequency distribution) are assumed to run from 0 to N − 1. If it is determined that n is greater than or equal to N (n ≥ N), the CPU 181 determines that the processes for all of the frames are completed and terminates the process in step ST5.
- If it is determined that n is less than N (n < N), the CPU 181, in step ST6, sets the discrete frequency k to zero. In step ST7, the CPU 181 determines whether k ≥ K; the discrete frequency k of the spectrogram (time-frequency distribution) is assumed to run from 0 to K − 1. If it is determined that k is greater than or equal to K (k ≥ K), the CPU 181 determines that the processes for all of the discrete frequencies are completed and, in step ST8, increments n by 1. The flow then returns to step ST4, and the next frame is processed.
- If it is determined that k is less than K (k < K), the CPU 181, in step ST9, determines whether F(n,k) is a peak. If F(n,k) is not a peak, the CPU 181, in step ST10, sets the score S(n,k) to zero and, in step ST11, increments k by 1. The flow then returns to step ST7, and the next discrete frequency is processed.
- If it is determined in step ST9 that F(n,k) is a peak, the CPU 181 performs the process of step ST12.
- In step ST12, the CPU 181 fits a tone model in a neighboring region of the peak.
- In step ST13, the CPU 181 extracts various types of features (x_0, x_1, x_2, x_3, x_4, x_5) based on the results obtained by the fitting.
- In step ST14, the CPU 181 obtains the score S(n,k) indicating the tonal component likeness of the peak using the features extracted in step ST13.
- The score S(n,k) takes a value between 0 and 1.
- After step ST14, the CPU 181 increments k by 1 in step ST11; the flow then returns to step ST7, and the next discrete frequency is processed.
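The loop structure of the flowchart (steps ST3 through ST14) can be condensed into a single function. Because the feature weights of Equations (10)-(12) are not given in this excerpt, the score below is a stand-in, a sigmoid of one minus the normalized fitting error, chosen only so that it lies in (0, 1) and is zero at non-peak positions as the text requires; the function name and neighborhood sizes are likewise illustrative.

```python
import numpy as np

def tonal_scores(F, dN=3, dK=2):
    """Scan the spectrogram F(n, k): keep local peaks in frequency
    (Equation (2)), least-squares fit the quadratic tone model of
    Equation (4) in a (2*dN+1) x (2*dK+1) neighborhood, and assign a
    stand-in score Sigm(1 - normalized fit error); S = 0 off-peak."""
    N, K = F.shape
    S = np.zeros_like(F)
    n, k = np.meshgrid(np.arange(-dN, dN + 1), np.arange(-dK, dK + 1),
                       indexing="ij")
    n, k = n.ravel().astype(float), k.ravel().astype(float)
    X = np.column_stack([k**2, k, k * n, n**2, n, np.ones_like(k)])
    for nf in range(dN, N - dN):
        for kf in range(max(1, dK), min(K - 1, K - dK)):
            if not (F[nf, kf] > F[nf, kf - 1] and F[nf, kf] > F[nf, kf + 1]):
                continue  # Equation (2): not a local peak in frequency
            y = F[nf - dN:nf + dN + 1, kf - dK:kf + dK + 1].ravel()
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            r = y - X @ coef
            err = r @ r / max((y - y.mean()) @ (y - y.mean()), 1e-12)
            S[nf, kf] = 1.0 / (1.0 + np.exp(err - 1.0))  # Sigm(1 - err)
    return S
```

On a synthetic spectrogram whose log-amplitude is exactly quadratic around a steady tone, the fit error vanishes and the peak bins score near Sigm(1) ≈ 0.73, while all other positions stay at zero.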
- the tonal component detection apparatus 100 shown in FIG. 1 fits a tone model in a neighboring region of each peak in the frequency direction detected from the time-frequency distribution (spectrogram) F(n,k), and obtains a score S(n,k) indicating the tonal component likeness of each peak based on the results obtained by the fitting. Therefore, the tonal components can be detected accurately, and useful information for many application techniques such as voice analysis, coding, noise reduction, and high-quality sound reproduction can be obtained.
- FIG. 6 illustrates an example of the score S(n,k) indicating the tonal component likeness, detected using the method according to the embodiment of the present technology from the input time signal f(t) from which the spectrogram shown in FIG. 8 is obtained.
- the darker the color, the larger the score S(n,k); it can be seen that noise peaks are generally not detected, while the peaks of the tonal components (the components drawn as thick black horizontal lines in FIG. 8) are generally detected.
- FIG. 7 illustrates results obtained by detecting the tonal components for the spectrum of FIG. 9. Many non-tonal peaks are erroneously detected by the methods of FIGS. 10 and 11; by the method according to the embodiment of the present technology, however, the tonal peaks can be detected accurately.
- the tonal component detection apparatus 100 shown in FIG. 1 can also detect properties such as the peak curvature, the accurate frequency, the accurate amplitude value of the peak, the rate of change in frequency, and the rate of change in amplitude at each time of each tonal component (see Equation (10)). These properties are useful for application techniques such as voice analysis, coding, noise reduction, and high-quality sound reproduction.
- the input time signal may also be transformed into a time-frequency representation using other transformation techniques such as the wavelet transform.
- although the above embodiments describe fitting based on the least-square-error criterion between the tone model and the time-frequency distribution in the vicinity of each of the detected peaks, the fitting may also be performed using a minimum fourth-power error criterion, a minimum entropy criterion, and so on.
- the present technology may also be configured as below.
- a time-frequency transformation unit configured to perform a time-frequency transformation on an input time signal to obtain a time-frequency distribution
- a peak detection unit configured to detect a peak in a frequency direction at a time frame of the time-frequency distribution
- a fitting unit configured to perform fitting on a tone model in a neighboring region of the detected peak
- a scoring unit configured to obtain a score indicating tonal component likeness of the detected peak based on a result obtained by the fitting.
Abstract
Description
F(n,k−1) < F(n,k) and F(n,k) > F(n,k+1) (2)
Γ = [−Δ_N ≤ n ≤ Δ_N] × [−Δ_K ≤ k ≤ Δ_K] (3)
Y(k,n) = ak^2 + bk + ckn + dn^2 + en + g (4)
f_0(k) = a(k − k_0)^2 + g_0 (7)
Y(k,n) = f_0(k − βn) + f_1(n) (8)
H_i(x_i) = Sigm(u_i x_i + v_i) (12)
- (1) A tonal component detection method including:
- (2) The tonal component detection method according to (1), wherein, in the step of performing the time-frequency transformation, the time-frequency transformation is performed on the input time signal using a short-time Fourier transform.
- (3) The tonal component detection method according to (1) or (2), wherein, in the step of fitting the tone model, a quadratic polynomial function in which a time and a frequency are set as variables is used as the tone model.
- (4) The tonal component detection method according to any one of (1) to (3), wherein, in the step of fitting the tone model, the fitting is performed based on a time-frequency distribution in a vicinity of the detected peak and a least square error criterion of the tone model.
- (5) The tonal component detection method according to any one of (1) to (4), wherein, in the step of obtaining the score, the score indicating the tonal component likeness of the detected peak is obtained using at least a fitting error extracted based on the result obtained by the fitting.
- (6) The tonal component detection method according to any one of (1) to (4), wherein, in the step of obtaining the score, the score indicating the tonal component likeness of the detected peak is obtained using at least a peak curvature in a frequency direction extracted based on the result obtained by the fitting.
- (7) The tonal component detection method according to any one of (1) to (4), wherein, in the step of obtaining the score, the score indicating the tonal component likeness of the detected peak is obtained by extracting a predetermined number of features and by combining the predetermined number of extracted features, based on the result obtained by the fitting.
- (8) The tonal component detection method according to (7), wherein, in the step of obtaining the score, when the predetermined number of extracted features are combined, a non-linear function is applied to the predetermined number of extracted features to obtain a weighted sum.
- (9) The tonal component detection method according to (7) or (8), wherein the predetermined number of features includes at least one of a fitting error, a peak curvature in a frequency direction, a frequency of the peak, an amplitude value at the peak position, a rate of change in frequency, or a rate of change in amplitude, each obtained from the tone model on which the fitting has been performed.
- (10) A tonal component detection apparatus, including:
- (11) A program for causing a computer to function as:
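Embodiments (7) and (8) describe combining the extracted features into a single tonal-component-likeness score, with Eq. (12), H_i(x_i) = Sigm(u_i x_i + v_i), as the non-linear function applied to each feature before the weighted sum. A hedged sketch, where the per-feature parameters `u`, `v` and weights `w` are assumptions (e.g. trained offline; the text does not fix their values here):

```python
import math

def sigm(x):
    """Logistic sigmoid used as the non-linearity of Eq. (12)."""
    return 1.0 / (1.0 + math.exp(-x))

def tonal_score(features, u, v, w):
    """Apply H_i(x_i) = Sigm(u_i * x_i + v_i) to each feature x_i,
    then combine the results as a weighted sum (embodiment (8))."""
    return sum(wi * sigm(ui * xi + vi)
               for xi, ui, vi, wi in zip(features, u, v, w))
```

Squashing each feature through its own sigmoid bounds its contribution before weighting, so no single feature (e.g. an extreme fitting error) can dominate the score.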
Claims (11)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-078320 | 2012-03-29 | ||
JP2012078320A JP2013205830A (en) | 2012-03-29 | 2012-03-29 | Tonal component detection method, tonal component detection apparatus, and program |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130255473A1 US20130255473A1 (en) | 2013-10-03 |
US8779271B2 true US8779271B2 (en) | 2014-07-15 |
Family
ID=49233121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/780,179 Expired - Fee Related US8779271B2 (en) | 2012-03-29 | 2013-02-28 | Tonal component detection method, tonal component detection apparatus, and program |
Country Status (2)
Country | Link |
---|---|
US (1) | US8779271B2 (en) |
JP (1) | JP2013205830A (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013205830A (en) * | 2012-03-29 | 2013-10-07 | Sony Corp | Tonal component detection method, tonal component detection apparatus, and program |
US9484044B1 (en) | 2013-07-17 | 2016-11-01 | Knuedge Incorporated | Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms |
US9530434B1 (en) * | 2013-07-18 | 2016-12-27 | Knuedge Incorporated | Reducing octave errors during pitch determination for noisy audio signals |
US9208794B1 (en) | 2013-08-07 | 2015-12-08 | The Intellisis Corporation | Providing sound models of an input signal using continuous and/or linear fitting |
CN106991852B (en) * | 2017-05-18 | 2020-11-24 | 北京音悦荚科技有限责任公司 | Online teaching method and device |
US11501102B2 (en) * | 2019-11-21 | 2022-11-15 | Adobe Inc. | Automated sound matching within an audio recording |
US11461649B2 (en) * | 2020-03-19 | 2022-10-04 | Adobe Inc. | Searching for music |
- 2012-03-29: JP application JP2012078320A, published as JP2013205830A (active, Pending)
- 2013-02-28: US application US13/780,179, granted as US8779271B2 (not active, Expired - Fee Related)
Patent Citations (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5229716A (en) * | 1989-03-22 | 1993-07-20 | Institut National De La Sante Et De La Recherche Medicale | Process and device for real-time spectral analysis of complex unsteady signals |
US20080148924A1 (en) * | 2000-03-13 | 2008-06-26 | Perception Digital Technology (Bvi) Limited | Melody retrieval system |
US6542869B1 (en) * | 2000-05-11 | 2003-04-01 | Fuji Xerox Co., Ltd. | Method for automatic analysis of audio including music and speech |
US20020181711A1 (en) * | 2000-11-02 | 2002-12-05 | Compaq Information Technologies Group, L.P. | Music similarity function based on signal analysis |
US20020143530A1 (en) * | 2000-11-03 | 2002-10-03 | International Business Machines Corporation | Feature-based audio content identification |
US6604072B2 (en) * | 2000-11-03 | 2003-08-05 | International Business Machines Corporation | Feature-based audio content identification |
US20020138795A1 (en) * | 2001-01-24 | 2002-09-26 | Nokia Corporation | System and method for error concealment in digital audio transmission |
US8255214B2 (en) * | 2001-10-22 | 2012-08-28 | Sony Corporation | Signal processing method and processor |
US20110235823A1 (en) * | 2002-02-01 | 2011-09-29 | Cedar Audio Limited | Method and apparatus for audio signal processing |
US7978862B2 (en) * | 2002-02-01 | 2011-07-12 | Cedar Audio Limited | Method and apparatus for audio signal processing |
US20050177372A1 (en) * | 2002-04-25 | 2005-08-11 | Wang Avery L. | Robust and invariant audio pattern matching |
US7627477B2 (en) * | 2002-04-25 | 2009-12-01 | Landmark Digital Services, Llc | Robust and invariant audio pattern matching |
US20090265174A9 (en) * | 2002-04-25 | 2009-10-22 | Wang Avery L | Robust and invariant audio pattern matching |
US20040165736A1 (en) * | 2003-02-21 | 2004-08-26 | Phil Hetherington | Method and apparatus for suppressing wind noise |
US20110123044A1 (en) * | 2003-02-21 | 2011-05-26 | Qnx Software Systems Co. | Method and Apparatus for Suppressing Wind Noise |
US20040211260A1 (en) * | 2003-04-28 | 2004-10-28 | Doron Girmonsky | Methods and devices for determining the resonance frequency of passive mechanical resonators |
US20060229878A1 (en) * | 2003-05-27 | 2006-10-12 | Eric Scheirer | Waveform recognition method and apparatus |
US20040260540A1 (en) * | 2003-06-20 | 2004-12-23 | Tong Zhang | System and method for spectrogram analysis of an audio signal |
US7276656B2 (en) * | 2004-03-31 | 2007-10-02 | Ulead Systems, Inc. | Method for music analysis |
US20060095254A1 (en) * | 2004-10-29 | 2006-05-04 | Walker John Q Ii | Methods, systems and computer program products for detecting musical notes in an audio signal |
US20100000395A1 (en) * | 2004-10-29 | 2010-01-07 | Walker Ii John Q | Methods, Systems and Computer Program Products for Detecting Musical Notes in an Audio Signal |
US7598447B2 (en) * | 2004-10-29 | 2009-10-06 | Zenph Studios, Inc. | Methods, systems and computer program products for detecting musical notes in an audio signal |
US20090282966A1 (en) * | 2004-10-29 | 2009-11-19 | Walker Ii John Q | Methods, systems and computer program products for regenerating audio performances |
US20070010999A1 (en) * | 2005-05-27 | 2007-01-11 | David Klein | Systems and methods for audio signal analysis and modification |
US8315857B2 (en) * | 2005-05-27 | 2012-11-20 | Audience, Inc. | Systems and methods for audio signal analysis and modification |
US20080133223A1 (en) * | 2006-12-04 | 2008-06-05 | Samsung Electronics Co., Ltd. | Method and apparatus to extract important frequency component of audio signal and method and apparatus to encode and/or decode audio signal using the same |
US20110015931A1 (en) * | 2007-07-18 | 2011-01-20 | Hideki Kawahara | Periodic signal processing method,periodic signal conversion method,periodic signal processing device, and periodic signal analysis method |
US8588427B2 (en) * | 2007-09-26 | 2013-11-19 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for extracting an ambient signal in an apparatus and method for obtaining weighting coefficients for extracting an ambient signal and computer program |
US20090125298A1 (en) * | 2007-11-02 | 2009-05-14 | Melodis Inc. | Vibrato detection modules in a system for automatic transcription of sung or hummed melodies |
US20120046771A1 (en) * | 2009-02-17 | 2012-02-23 | Kyoto University | Music audio signal generating system |
US20120067196A1 (en) * | 2009-06-02 | 2012-03-22 | Indian Institute of Technology Autonomous Research and Educational Institution | System and method for scoring a singing voice |
US20110071824A1 (en) * | 2009-09-23 | 2011-03-24 | Carol Espy-Wilson | Systems and Methods for Multiple Pitch Tracking |
US8116463B2 (en) * | 2009-10-15 | 2012-02-14 | Huawei Technologies Co., Ltd. | Method and apparatus for detecting audio signals |
US20110194702A1 (en) * | 2009-10-15 | 2011-08-11 | Huawei Technologies Co., Ltd. | Method and Apparatus for Detecting Audio Signals |
US20110243349A1 (en) * | 2010-03-30 | 2011-10-06 | Cambridge Silicon Radio Limited | Noise Estimation |
US20120103166A1 (en) * | 2010-10-29 | 2012-05-03 | Takashi Shibuya | Signal Processing Device, Signal Processing Method, and Program |
US20120157857A1 (en) * | 2010-12-15 | 2012-06-21 | Sony Corporation | Respiratory signal processing apparatus, respiratory signal processing method, and program |
US20120197420A1 (en) * | 2011-01-28 | 2012-08-02 | Toshiyuki Kumakura | Signal processing device, signal processing method, and program |
US20120243705A1 (en) * | 2011-03-25 | 2012-09-27 | The Intellisis Corporation | Systems And Methods For Reconstructing An Audio Signal From Transformed Audio Information |
US20120266742A1 (en) * | 2011-04-19 | 2012-10-25 | Keisuke Touyama | Music section detecting apparatus and method, program, recording medium, and music signal detecting apparatus |
US20120266743A1 (en) * | 2011-04-19 | 2012-10-25 | Takashi Shibuya | Music search apparatus and method, program, and recording medium |
US20130255473A1 (en) * | 2012-03-29 | 2013-10-03 | Sony Corporation | Tonal component detection method, tonal component detection apparatus, and program |
US20130282373A1 (en) * | 2012-04-23 | 2013-10-24 | Qualcomm Incorporated | Systems and methods for audio signal processing |
Non-Patent Citations (2)
Title |
---|
McAulay, R. J., et al. "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 4, Aug. 1986, pp. 744-754. |
Smith, Julius O., et al. "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation," Center for Computer Research in Music and Acoustics, Department of Music, Stanford University, Stanford, CA, 1987, pp. 1-23. |
Also Published As
Publication number | Publication date |
---|---|
US20130255473A1 (en) | 2013-10-03 |
JP2013205830A (en) | 2013-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8779271B2 (en) | Tonal component detection method, tonal component detection apparatus, and program | |
US7822600B2 (en) | Method and apparatus for extracting pitch information from audio signal using morphology | |
US10014005B2 (en) | Harmonicity estimation, audio classification, pitch determination and noise estimation | |
JP5732994B2 (en) | Music searching apparatus and method, program, and recording medium | |
US8831942B1 (en) | System and method for pitch based gender identification with suspicious speaker detection | |
US20060253285A1 (en) | Method and apparatus using spectral addition for speaker recognition | |
US20140177853A1 (en) | Sound processing device, sound processing method, and program | |
US20230402048A1 (en) | Method and Apparatus for Detecting Correctness of Pitch Period | |
EP2927906B1 (en) | Method and apparatus for detecting voice signal | |
US7860708B2 (en) | Apparatus and method for extracting pitch information from speech signal | |
US8532986B2 (en) | Speech signal evaluation apparatus, storage medium storing speech signal evaluation program, and speech signal evaluation method | |
JP6439682B2 (en) | Signal processing apparatus, signal processing method, and signal processing program | |
CN103165127A (en) | Sound segmentation equipment, sound segmentation method and sound detecting system | |
Khadem-hosseini et al. | Error correction in pitch detection using a deep learning based classification | |
Nongpiur et al. | Impulse-noise suppression in speech using the stationary wavelet transform | |
Yarra et al. | A mode-shape classification technique for robust speech rate estimation and syllable nuclei detection | |
CN104036785A (en) | Speech signal processing method, speech signal processing device and speech signal analyzing system | |
JP6157926B2 (en) | Audio processing apparatus, method and program | |
US8103512B2 (en) | Method and system for aligning windows to extract peak feature from a voice signal | |
US9398387B2 (en) | Sound processing device, sound processing method, and program | |
JP7152112B2 (en) | Signal processing device, signal processing method and signal processing program | |
US10381023B2 (en) | Speech evaluation apparatus and speech evaluation method | |
JP2009086476A (en) | Speech processing device, speech processing method and program | |
CN120030424A (en) | Tunnel lining void detection method and related equipment based on improved autoencoder combined with single-classification support vector machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SONY CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ABE, MOTOTSUGU;NISHIGUCHI, MASAYUKI;SIGNING DATES FROM 20130225 TO 20130226;REEL/FRAME:029896/0468 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551) Year of fee payment: 4 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20220715 |