CN115472143A - Tonal music note starting point detection and note decoding method and device - Google Patents
- Publication number
- CN115472143A (application CN202211110245.0A)
- Authority
- CN
- China
- Prior art keywords
- note
- frequency
- music
- decoding
- tonal music
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0016—Means for indicating which keys, frets or strings are to be actuated, e.g. using lights or leds
- G10H1/02—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
- G10H1/36—Accompaniment arrangements
- G10H1/40—Rhythm
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/051—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
Abstract
The invention discloses a method and a device for note onset detection and note decoding in tonal music, wherein the method comprises the following steps: drawing η(m) as an onset detection function curve, and detecting the extreme points of the curve as the starting time positions of the tonal music notes; at the intermediate position between the start times of each pair of adjacent notes, searching the corresponding FFT spectrum for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf; comparing the pitch frequency with the reference frequencies of the individual notes of the 12-equal-temperament note-pitch table, and finding the note with the smallest frequency difference as the note decoding result of the current time period. The device comprises: a processor and a memory.
Description
Technical Field
The invention relates to the field of music information retrieval and to the technical field of signal analysis and processing, and in particular to a method and device for tonal music note onset detection and note decoding.
Background
Music is ubiquitous in daily life and has been a vivid presence throughout human history. In recent years, with the rapid development of Internet technology, music has spread ever more widely. Audio compression technology, represented by MP3, has been applied on a large scale, causing traditional music media such as vinyl records and magnetic tapes almost to disappear; digital music is instead transmitted, downloaded and listened to over the Internet. Faced with massive amounts of digital music, how to effectively extract, retrieve and organize music information has drawn wide attention from academia and industry, giving rise to the topic of Music Information Retrieval (MIR) [1]. Compared with non-tonal music, tonal music is centered on a tonic, around which chords are formed and the melody progresses [2]. Music in this mode consists of a series of beats and has a strong sense of direction. In the analysis of such music, one of the most basic tasks is note onset detection [3][4].
Clearly, onset detection is a precondition for note decoding and a foundational problem underlying many other MIR tasks. For example, beat points generally coincide with note onsets, so note onset information supports further research on rhythm analysis and beat tracking. For melody retrieval, note onset information allows the frame overlap rate to be reduced, improving retrieval speed. Fundamental frequency estimation and pitch identification likewise build on correctly detected note onsets. In addition, onset detection helps improve Automatic Music Transcription (AMT). In general, the core problem to be solved by note onset detection methods is the design of the Onset Detection Function (ODF): the audio signal is converted into a detection function that should remain fairly low (close to zero) most of the time, but exhibit a distinct peak at each onset time.
With the prevalence and development of deep learning, more and more onset detection research starts from a learning network to optimize and improve related algorithms. A typical learning-based onset detection process comprises: framing, feature extraction, network-based ODF generation, and peak selection [5-7]. Jan Schlüter et al. [8] learn the ODF from input feature vectors using a CNN (convolutional neural network) to detect onsets. Erik Marchi et al. [9] apply wavelet transforms and short-time Fourier transforms to the audio and feed the results into RNN (recurrent neural network) and LSTM (long short-term memory) networks for note onset detection. Peter Steiner et al. [10] introduce Echo State Networks (ESNs) [11] to learn the ODF and propose a new stacked ESN algorithm based thereon.
Studies based on deep learning networks typically ignore the characteristics of the music signal itself. Since such networks are composed of a large number of layers whose individual roles are difficult to determine, generalization to onset detection in other types of music is hindered. For example, the audio morphological feature recognition and retrieval methods widely adopted in today's computational musicology, which are based on Western harmony and structural systems, are not suitable for Chinese ethnic-minority music with highly distinctive morphological characteristics.
Meanwhile, in deep learning pipelines, both feature extraction and ODF generation are complicated. In particular, feature extraction involves short-time Fourier transforms, filter banks, spectral flux extraction, feature vector construction, and so on. More complicated still, obtaining the optimal ODF requires a series of learning-network operations, such as sample labelling and training, weight updates based on error back-propagation, and model testing, to explore the relationship between the feature vectors and the onset time points. Although these operations can achieve high detection accuracy, they are inefficient and consume large amounts of computational resources.
References
[1] Fingerhut M. Music Information Retrieval, or how to search for (and maybe find) music and do away with incipits [J]. International Association of Music Libraries / International Association of Sound & Audiovisual Archives Congress, 2004.
[2] Construction of the concept of tonal music [J]. Chinese Music, 2006(1): 2. (in Chinese)
[3] Schreiber H, Weiß C, Müller M. Local Key Estimation in Classical Music Recordings: A Cross-Version Study on Schubert's Winterreise [C]// ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
[4] Dixon S. Evaluation of the Audio Beat Tracking System BeatRoot [J]. Journal of New Music Research, 2007, 36(1): 39-50.
[5] Grosche P, Müller M. Extracting Predominant Local Pulse Information From Music Recordings [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011.
[6] Percival G, Tzanetakis G. Streamlined Tempo Estimation Based on Autocorrelation and Cross-correlation With Pulses [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 1765-1776.
[7] Steiner P, Jalalvand A, Stone S, et al. Feature Engineering and Stacked Echo State Networks for Musical Onset Detection [C]// ICPR 2020. 2021.
[8] Schlüter J, Böck S. Improved musical onset detection with Convolutional Neural Networks [C]// IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2014: 6979-6983.
[9] Marchi E, Ferroni G, Eyben F, et al. Audio Onset Detection: A Wavelet Packet Based Approach with Recurrent Neural Networks [C]// 2014 International Joint Conference on Neural Networks (IJCNN). IEEE, 2014.
[10] Steiner P, Jalalvand A, Stone S, et al. Feature Engineering and Stacked Echo State Networks for Musical Onset Detection [C]// ICPR 2020. 2021.
[11] Steiner P, Stone S, Birkholz P. Note Onset Detection using Echo State Networks [C]// Elektronische Sprachsignalverarbeitung (ESSV) 2020. 2020.
[12] Herremans D, Chew E. MorpheuS: generating structured music with constrained patterns and tension [J]. IEEE Transactions on Affective Computing, 2017: 1-1.
Disclosure of Invention
The invention provides a method and a device for tonal music note onset detection and note decoding, which realize detection of the note start times and decoding of the notes. Starting from the characteristics of the audio signal, the invention explores the intrinsic relation between the onset time points and the tonal rhythm to obtain an ODF curve, so that precision and accuracy are improved while the model becomes more universal. The invention is described in detail as follows:
a tonal music note onset detection and note decoding method, the method comprising:
drawing η(m) as an onset detection function curve, and detecting the extreme points of the curve as the starting time positions of the tonal music notes;
at the intermediate position between the start times of each pair of adjacent notes, searching the corresponding FFT spectrum for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf;
comparing the pitch frequency with the reference frequencies of the notes in the 12-equal-temperament note-pitch table, and finding the note with the smallest frequency difference as the note decoding result of the current time period.
The method models the tonal music as a piecewise harmonic model and extends the model to actual tonal music, represented as follows:
wherein {A_k, f_k, θ_k}, k = 1, ..., K_1 and the corresponding parameter set of the following beat represent the amplitude, frequency and initial phase of the previous beat and of the next beat, respectively;
wherein the beat period is at least twice the FFT size, N_T > 2N.
Further, the drawing of η(m) as an onset detection function curve specifically comprises:
the frequency range corresponds to the index range of the FFT as follows:
k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf
when the observation window moves to the position mΔt, it corresponds to the windowed FFT spectrum X(m,k) extracted from the STFT time-frequency distribution, whose peak is expressed as X_max(m) = max |X(m,k)| over k ∈ [k_min, k_max];
the index is defined as η(m) = card{ k ∈ [k_min, k_max] : |X(m,k)| > α·X_max(m) } / (k_max - k_min + 1);
wherein "card(·)" represents the statistical counting operator, and α is a designated threshold ratio; using this metric, an ODF curve can be drawn.
The comparing of the pitch frequency with the reference frequencies of the notes in the 12-equal-temperament note-pitch table, and finding the note with the minimum frequency difference as the note decoding result of the current time period, specifically comprises:
suppose the start time t_m of the m-th note and the start time t_{m+1} of the (m+1)-th note have been detected; the intermediate time (t_m + t_{m+1})/2 is regarded as a reliable point for realizing music decoding, corresponding to the STFT index [(t_m + t_{m+1})/(2Δt)];
wherein "[·]" represents the integer rounding operator; the corresponding windowed FFT spectrum is taken, and its peak spectral index k_p is found within k ∈ [k_min, k_max];
the melody frequency of this dominant spectral component is estimated as k_p·Δf;
and finally, the closest frequency is found by searching the piano pitch table, so as to determine which piano key was pressed.
An apparatus for tonal music note onset detection and note decoding, the apparatus comprising: a processor and a memory, the processor and the memory having stored therein program instructions, the processor invoking the program instructions stored in the memory to cause the apparatus to perform any of the method steps described.
The technical scheme provided by the invention has the beneficial effects that:
1. The invention realizes note decoding and note start-time detection, and points out a direction for research breakthroughs in music information retrieval: instead of relying on deep learning algorithms alone, attention is turned to the specific physical characteristics of music such as the time domain, frequency domain and phase, and combining the two yields further breakthroughs;
2. The method takes note onset detection as one of the basic research directions of music information retrieval, and lays a foundation for advanced tasks such as rhythm analysis, beat tracking, fundamental frequency estimation and pitch identification.
Drawings
FIG. 1 is a flow chart of note onset detection and note decoding for tonal music;
FIG. 2 is a waveform diagram of the simple piecewise harmonic model example;
FIG. 3 shows the spectral leakage of different segments of the piecewise harmonic model;
FIG. 4 is a graph of the onset-time detection results;
wherein (a) is the STFT time-frequency energy diagram, and (b) is the ODF curve based on the spectral-leakage feature.
FIG. 5 is the chord and melody score of the tune "Cradle Song";
FIG. 6 is a diagram illustrating the note decoding results;
FIG. 7 is a diagram of a hardware implementation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Complicated techniques affect the accuracy of onset detection, so the nature of the onset itself needs to be understood. In this respect, the embodiment of the present invention attempts to explore the intrinsic relationship between the onset time points and the tonal rhythm, and to design explicit processing operations according to the theoretical knowledge of signal processing. Such a signal-processing approach involves no complex operations arising from network training and optimization, and therefore achieves higher efficiency while consuming fewer computational resources. It should be noted that, in ODF design, signal processing that emphasizes spectral flux is critical [3,4,12]. However, if some spectral parameters are not set properly, or the observation perspective is unsuitable, it is difficult to discover the variation law of the spectral flux.
The embodiment of the invention provides an onset-time detector based on spectral-leakage feature extraction, on the basis of which notes can be decoded. Its main novelty lies in three aspects:
(1) The embodiment of the invention proposes a piecewise harmonic model of tonal music; (2) the embodiment defines a good metric for evaluating the degree of spectral leakage at different positions in a piece of music, from which an ODF curve can be drawn whose peaks capture the beat start times; (3) the embodiment provides a note decoding method.
The above contributions are based on a deep understanding of the spectrum of tonal music. Both theoretical analysis and experimental results verify the high accuracy and efficiency of the scheme.
Example 1
Based on the proposed piecewise harmonic model of tonal music, the embodiment of the invention finds that the spectral-leakage effect becomes very severe at the boundary between adjacent notes, and thereby successfully detects note onsets. To evaluate this effect well, the embodiment also derives a series of parameters for tonal music (including the down-sampling factor, FFT size, step size, etc.). With these conditions satisfied, a good measure reflecting the degree of spectral leakage is defined, and an intuitive note-onset detection function curve can be drawn, so that onset positions are located through peak selection. Combining the note onset detection results with the standard piano pitch table, the final notes can be decoded one by one through simple nearest-frequency matching. Experiments prove that the scheme achieves high precision in both the detection task and the decoding task.
1. General flow for embodiments of the invention
The specific flow of the tonal music note onset detection and note decoding, based on spectral-leakage feature extraction, provided by the embodiment of the present invention is shown in FIG. 1.
the method comprises the following specific steps:
input: tonal music sample, lower limit of detection frequency f min =210Hz,f max =840Hz, line threshold ratio α =0.1, 12 equal law note-pitch table.
Step 1: down-sampling the input audio to reduce its sampling rate to F s =6300samples/s;
Step 2: sliding FFT analysis is carried out on the down-sampled audio sequence by using a window with the length of 1024 Hanning (corresponding to the frequency resolution delta f)=f s N =6.15 Hz), the sliding step is 1 sample point, and a short-time Fourier transform (STFT) time-frequency spectrogram | X (m, k) | (m is a time index and k is a frequency index) is generated as the window covers the entire audio sequence;
step 3: for each time instant m, atSearching in maximum value X of | X (m, k) | max And at k ∈ [ k ] min ,k max ]Internally counting that the FFT amplitude spectrum is larger than alpha X max Of the spectral lineAnd calculating the number of the spectral lines and the percentage eta (m) of the total number of the spectral lines;
step 4: drawing eta (m) as an endpoint Detection Function curve (ODF) by using the time m, and detecting an extreme point of the curve as the initial time position of each note;
step 5: successively at intermediate positions of the start times of adjacent notesSearching out the corresponding FFT spectrumPeak spectrum number k of p And calculating the corresponding pitch frequency k p Δf;
Step 6: will k p Δ f is compared with the reference frequencies of the respective notes (e.g., "Do" notes) of the 12-equal temperament note-pitch table, and the note with the smallest frequency difference is found as the note decoding result of the current time period.
In the above steps, all note starting point detection is completed by stopping at Step 4, and all note decoding is completed by stopping at Step 6. It is emphasized that the above operations are all based on the spectrum leakage degree of the tonal music, and the technical principle thereof is detailed as follows.
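The onset-detection core of Steps 2-4 can be sketched in Python on a synthetic two-note signal; the 440/660 Hz test tune, the 0.4 s note length, and the hop of 64 samples (the Δd of section 2.2, rather than the 1-sample step above) are illustrative assumptions, not part of the invention:

```python
import numpy as np

# Sketch of Steps 2-4: sliding-FFT spectrogram, leakage index eta(m), peak picking.
fs, N, hop, alpha = 6300, 1024, 64, 0.1
df = fs / N                                  # ~6.1523 Hz resolution
kmin, kmax = int(210 / df), int(840 / df)    # detection band 210-840 Hz

# assumed two-note test signal: 0.4 s at 440 Hz, then 0.4 s at 660 Hz
t = np.arange(int(0.4 * fs)) / fs
x = np.concatenate([np.cos(2 * np.pi * 440 * t),
                    np.cos(2 * np.pi * 660 * t)])

w = np.hanning(N)
frames = (len(x) - N) // hop + 1
eta = np.empty(frames)
for m in range(frames):
    X = np.abs(np.fft.rfft(w * x[m * hop:m * hop + N]))
    Xmax = X[kmin:kmax + 1].max()
    # fraction of in-band spectral lines exceeding alpha * Xmax
    eta[m] = np.count_nonzero(X[kmin:kmax + 1] > alpha * Xmax) / (kmax - kmin + 1)

onset = int(eta.argmax()) * hop + N // 2     # window centre at the eta peak
```

Inside each note, η(m) stays near (mainlobe width)/(band width); it rises sharply where the window straddles the note boundary at sample 2520, so the η peak localizes the onset.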
2. Technical principle of the invention
2.1 Piecewise harmonic model of tonal music
The embodiment of the invention models tonal music as a piecewise harmonic model. Specifically, a piece of music consists of a series of beats, each of which differs in duration and harmonic composition.
To elaborate the piecewise harmonic model, we take a signal x(t) of duration 3.0938 seconds as an example, with the formula:
In equation (2), during the first half of the time x(t) is a tone of frequency f_1 = 7.1 Hz, and during the second half a tone of frequency f_2 = 16.3 Hz. The waveform of x(t) is shown in FIG. 2.
As shown in FIG. 2, the embodiment of the invention uses a sampling rate f_s = 64 samples/s and extracts three segments of equal length (1 second) from x(t), at t ∈ [0.0469s, 1.0469s], t ∈ [1.0313s, 2.0313s], and t ∈ [2.0313s, 3.0313s]. These three segments are fast-Fourier-transformed using a 64-point Hanning window; their magnitude spectra |X_1(k)|, |X_2(k)|, |X_3(k)| are shown in FIG. 3.
From FIG. 3 it can be seen that the three spectra |X_1(k)|, |X_2(k)|, |X_3(k)| exhibit different degrees of spectral leakage. For the first and third segments, each containing only a single pure tone, the leakage is small and the spectral energy is concentrated in a narrow region centered at the peak (k = 6 and k = 16, respectively). For the second segment, however, the leakage is very severe: side lobes spread over a wide area around the two peaks, because this segment is not pure but mixes two tones. Moreover, the peak of |X_2(k)| is lower than the peaks of |X_1(k)| and |X_3(k)|, reflecting that spectral leakage attenuates the peak height.
The above example is a simplified description of tonal music. The embodiment of the present invention considers that the signal model in equation (2) can be extended to actual tonal music and, for a region covering two adjacent beats, expressed as:
wherein {A_k, f_k, θ_k}, k = 1, ..., K_1 and the corresponding parameter set of the following beat represent the amplitude, frequency and initial phase of the previous beat and of the next beat, respectively.
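The equation image here (equation (3) in the sequence) is likewise not reproduced; consistent with the parameter sets just described (the tilde notation for the following beat and the boundary time t_b are assumed conventions), a plausible form is:

```latex
x(t) =
\begin{cases}
\sum_{k=1}^{K_1} A_k \cos(2\pi f_k t + \theta_k), & t < t_b,\\
\sum_{k=1}^{K_2} \tilde{A}_k \cos(2\pi \tilde{f}_k t + \tilde{\theta}_k), & t \ge t_b.
\end{cases} \tag{3}
```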
As will be described in detail later, a high degree of spectral leakage thus signals the transition from one note to the next.
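The leakage contrast described above can be reproduced numerically; a sketch assuming round segment positions (the exact boundaries in the source text are garbled):

```python
import numpy as np

# Numerical sketch of the FIG. 2 / FIG. 3 experiment. Tone frequencies, sampling
# rate and window length are from the text; segment positions are assumed.
fs, f1, f2, T = 64, 7.1, 16.3, 3.0938
t = np.arange(int(fs * T)) / fs
x = np.where(t < T / 2, np.cos(2 * np.pi * f1 * t), np.cos(2 * np.pi * f2 * t))

w = np.hanning(64)

def leaked_lines(seg, alpha=0.1):
    """Count spectral lines above alpha times the segment's spectral peak."""
    X = np.abs(np.fft.rfft(w * seg))
    return int(np.count_nonzero(X > alpha * X.max()))

mid = len(x) // 2                        # sample index of the tone change
n1 = leaked_lines(x[:64])                # fully inside the first tone
n2 = leaked_lines(x[mid - 32:mid + 32])  # straddles the tone boundary
n3 = leaked_lines(x[-64:])               # fully inside the second tone
```

The straddling segment counts markedly more above-threshold lines than either pure-tone segment, mirroring the heavy side-lobe spread of |X_2(k)| in FIG. 3.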
2.2 STFT parameter setting conditions for onset detection
The parameters related to the STFT include the FFT size N, the window type and the step size Δd. Furthermore, as described above, the input music recording needs to be down-sampled in order to reduce computational complexity. For a piece of tonal music, the individual beats usually fluctuate around a period T, which can be roughly known in advance. At the same time, in order to implement note decoding, these STFT parameters need to be associated with the pitch table.
Assume an original sampling rate F_s (usually F_s = 44.1 ksamples/s) and a down-sampling factor D. The down-sampled rate is thus:
f_s = F_s/D (4)
The number of samples in one beat period is therefore:
N_T = T·f_s = T·F_s/D (5)
Hence the frequency resolution Δf of the FFT equals:
Δf = f_s/N = F_s/(D·N) (6)
To ensure that the FFT window can fit within a complete beat period without covering any region of the previous or the next beat duration, the beat period should be at least twice the FFT size, i.e. the following inequality should hold:
N_T > 2N (7)
furthermore, in order to be able to spectrally distinguish the notes of a piano, the minimum spacing of the pitch-frequency table (i.e. the O4, O5 regions listed in the table) should exceed twice the frequency resolution. As shown in Table 1, the minimum interval is equal to the sum of notes A and A b B (i.e., 233.082Hz-220hz = 13.082hz), the following inequality should also hold:
Δf<13.082/2=6.541 (8)
TABLE 1 Piano Pitch and corresponding frequency Table
In consideration of the parameter conditions (4)-(8), the embodiment of the present invention sets the STFT parameters as follows:
down-sampling factor D = 7 and FFT size N = 1024. By equation (4), the sampling rate is reduced to f_s = F_s/D = 44100/7 = 6300 samples/s. Substituting a typical beat period T = 0.41 s into equation (5) yields the number of samples within one beat period, N_T = T·f_s ≈ 2583. Then, considering that the FFT size should be an integer power of 2, the inequality in equation (7) admits the feasible FFT size N = 1024, from which the frequency resolution is calculated as:
Δf = f_s/N = 6.1523 Hz
The STFT step size is specified as Δd = N/16 = 64, corresponding to a time step Δt = Δd/f_s = 0.0102 s, which indicates sufficiently fine tracking accuracy. In addition, a Hanning window is selected as the commonly used window type.
Note that Δf also satisfies inequality (8), which confirms that all the above STFT-related parameters are suitable for note onset detection.
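The arithmetic behind conditions (4)-(8) can be verified directly; a minimal sketch using exactly the values stated in this section:

```python
# Check the STFT parameter conditions (4)-(8) with the values from section 2.2.
Fs = 44100        # original sampling rate F_s (samples/s)
D = 7             # down-sampling factor
N = 1024          # FFT size
T = 0.41          # conventional beat period (s)

fs = Fs / D       # (4): down-sampled rate, 6300 samples/s
NT = T * fs       # (5): samples per beat period, ~2583
df = fs / N       # (6): frequency resolution, ~6.1523 Hz

assert fs == 6300.0
assert NT > 2 * N              # (7): one beat spans more than two windows
assert df < 13.082 / 2         # (8): resolves the closest piano semitone
```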
2.3 ODF construction based on spectral leakage feature extraction
Essentially, constructing the onset detection function amounts to defining an index for evaluating the degree of spectral leakage. Specifically, as the observation window moves, this index should be small when the window lies completely within one beat duration without overlapping any portion of the adjacent beat durations. As the window continues to move forward, the overlap necessarily increases, which also causes the index to rise. Then, as the window passes gradually into the next beat, the index tends to decrease again. This variation is, moreover, periodic.
In practice, to compute the index quantitatively, a frequency range f ∈ [f_min, f_max] should be given, corresponding to the following FFT index range:
k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf (9)
furthermore, when the observation window is moved to the position of m Δ t, it corresponds to the windowed FFT spectrum X (m, k) extracted from the STFT time-frequency distribution, where the peaks are represented as:
the index is defined as:
wherein "card (-) means forThe operator for statistical counting, alpha is the specified threshold ratio (suggested value around 0.1)。
By using this metric, an ODF curve can be drawn in which the peak positions can be selected as the note onset times, as shown in the later experimental results.
2.4 Note decoding
After the onset time point of a note has been detected, the note itself can be decoded by analyzing the spectral distribution within a given frequency range.
Take piano music as an example. A piano piece is usually played with both hands: the left hand strikes the lower keys to produce chords (spectral components distributed in a relatively low frequency region), while the right hand strikes the upper keys to produce the melody (spectral components distributed in a relatively high frequency region). In essence, the chords are designed around the melody. Thus, the embodiment of the invention actually aims to achieve note decoding of the melody within the frequency range f ∈ [f_min, f_max].
Specifically, as shown in FIG. 1, assume that the start time t of the mth note has been detected m And the start time t of the (m + 1) th note m+1 . Since they are adjacent notes, the intermediate time (t) can be set m +t m+1 ) The/2 is considered as a reliable point for achieving music decoding, where the observation window is likely to lie completely within the mth beat duration. This time corresponds to the STFT index:
where [·] denotes the integer rounding operator. Thus, the peak of the windowed FFT spectrum X(m′, k) can be searched, and the peak spectral index found as:
k_p = arg max_{k ∈ [k_min, k_max]} |X(m′, k)| (13)
From this, the melody frequency of this dominant spectral component can be estimated as:
f_p = k_p·Δf (14)
finally, the closest frequency can be found by looking up the piano pitch table, thereby judging which piano key was pressed. Further, given a reference frequency (e.g., a frequency corresponding to the note "Do"), its note can be easily represented.
Example 2
Verification experiment
3.1 ODF graph
A piano tune, "Cradle Song", was recorded, and the parameter settings mentioned in section 2.2 were applied. Following the onset-time detection process described above, the embodiment of the present invention obtained the time-frequency distribution shown in FIG. 4.
3.2 Note decoding
After detecting the onsets of the note durations, embodiments of the present invention further performed music decoding experiments (with the reference frequency taken as 369.994 Hz rather than the frequency of "Do"). The chord and melody score of this tune, "Cradle Song", is shown in FIG. 5.
Following the note decoding process described above, all notes are retrieved. For ease of demonstration, the note decoding results are displayed on the STFT time-frequency spectrogram, as shown in FIG. 6.
The design principles and results of tonal music onset-time detection and note decoding based on spectral-leakage feature extraction, together with the verification experiments performed on the model, have been given in detail above. It can be seen that embodiments of the present invention have the following beneficial effects:
(1) For the note onset detection results, it can be seen from FIG. 4(a) that:
a. There are clearly many horizontal stripes (one per note) on this STFT time-frequency plot, showing a good energy concentration effect, which in fact results from the parameter settings described in section 2.2.
b. There are also clear vertical stripes at the boundaries of adjacent notes; these are a side effect of the spectral leakage that occurs as the observation window traverses the transition region from one note to the next.
c. Within any single note duration there is a series of horizontal bars at different frequency positions, which verifies the correctness of the segmented harmonic model for tonal music presented in section 2.1.
As can be seen from FIG. 4(b):
a. The ODF curve shows the fluctuation described in section 2.3: it becomes smaller when the observation window is fully contained within a note and larger when the window moves into the transition region between adjacent note durations.
b. The peak locations (marked with asterisks) in FIG. 4(b) occur exactly where the vertical stripes (representing the boundaries of adjacent beat durations) are located in FIG. 4(a). This result demonstrates that the proposed onset-time detection method has high localization accuracy.
(2) For the note decoding results, in conjunction with fig. 5 and 6, it can be seen that:
a. As shown in FIG. 6, the note decoding results are highly consistent with the true double-staff score.
b. In addition, the embodiment of the invention produces only a few decoding errors. For example, at 8 s the expected decoding result is the treble note "5", but it is decoded as the bass note "5". This is because the recorded sound is a mixture of left-hand chords and right-hand melody, and for the treble melody note "5" the corresponding chord happens to contain exactly the bass "5". The decoding result therefore depends entirely on the relative striking strength of the two hands, so this result is reasonable.
Example 3
The hardware implementation diagram is shown in FIG. 7. The collected audio signal x(t) is sampled by an A/D (analog-to-digital) converter to obtain the sample sequence x(n), which enters a DSP chip as digital input; after the internal algorithm processing of the DSP chip, the note-decoded signal is output.
The DSP (Digital Signal Processor) in FIG. 7 is the core device. Its internal program flow, shown in FIG. 1, comprises two parts: note onset detection and note decoding.
Note onset detection: first, the input audio is down-sampled by the factor D. The signal then undergoes an STFT, configured according to the note onset detection STFT parameter settings described in section 2.2. Next, by evaluating the degree of spectral leakage, the ODF curve is drawn, and note onsets are found by a peak-extraction algorithm.
Note decoding: with the aid of the retrieved note onsets, the frequency with the largest energy is found from the corresponding peak bin, and this frequency is then decoded into a note according to the pitch table. All notes can be retrieved by repeating this operation over all time intervals.
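The front end of this flow (down-sampling followed by STFT) can be sketched as below; D, n_fft, and hop are assumed example values rather than the patent's actual section 2.2 settings.

```python
import numpy as np
from scipy.signal import decimate, stft

def dsp_front_end(x, fs, D=4, n_fft=2048, hop=512):
    """Down-sample the input audio by factor D (with anti-alias
    filtering), then compute the STFT magnitude distribution on
    which onset detection and note decoding operate."""
    y = decimate(x, D)                  # anti-aliased down-sampling
    fs_d = fs / D                       # reduced sampling rate
    f, t, X = stft(y, fs=fs_d, nperseg=n_fft, noverlap=n_fft - hop)
    return f, t, np.abs(X)              # frequencies, frame times, |STFT|
```

Down-sampling first keeps the analysis band [f_min, f_max] well resolved while reducing the per-frame FFT cost, which matters on a DSP chip.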
In this way, the core algorithm of the tonal music note onset detection and note decoding device based on spectral-leakage feature extraction is implanted into a DSP device, which completes high-precision, low-complexity, and high-efficiency music signal analysis.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.
Claims (6)
1. A method for tonal music note onset detection and note decoding, the method comprising:
drawing η(m) as an onset detection function curve, and detecting the extreme points of the curve as the onset time positions of the tonal music notes;
at the intermediate position of the onset times of each pair of adjacent notes, successively searching the corresponding FFT spectrum for the peak spectral index k_p, and calculating the corresponding pitch frequency k_p·Δf;
comparing the pitch frequency with the reference frequency of each note in the 12-equal-temperament note-pitch table, and taking the note with the smallest frequency difference as the note decoding result for the current time period.
2. The method for tonal music note onset detection and note decoding according to claim 1, wherein the method models tonal music as a segmented harmonic model, extended to actual tonal music and expressed as:
3. The method for tonal music note onset detection and note decoding according to claim 2, wherein the time period is at least twice the FFT size, i.e., N_T > 2N.
4. The method for tonal music note onset detection and note decoding according to claim 1, wherein the step of drawing η(m) as an onset detection function curve comprises:
giving a frequency range f ∈ [f_min, f_max], which corresponds to the following FFT index range:
k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf
when the observation window moves to the position mΔt, it corresponds to the windowed FFT spectrum X(m, k) extracted from the STFT time-frequency distribution, whose peak magnitude is:
|X(m)|_max = max_{k ∈ [k_min, k_max]} |X(m, k)|
the index is defined as:
η(m) = card{ k ∈ [k_min, k_max] : |X(m, k)| ≥ α·|X(m)|_max }
where card(·) denotes the counting operator and α is a specified threshold ratio.
5. The method for tonal music note onset detection and note decoding according to claim 1, wherein the step of comparing the pitch frequency with the reference frequency of each note in the 12-equal-temperament note-pitch table and finding the note with the smallest frequency difference as the note decoding result for the current time period comprises:
suppose that the onset time t_m of the m-th note and the onset time t_{m+1} of the (m+1)-th note have been detected; the intermediate time (t_m + t_{m+1})/2 is regarded as the reliable point for music decoding and corresponds to the STFT index:
m′ = [(t_m + t_{m+1})/(2Δt)]
where [·] denotes the integer rounding operator; the peak of the windowed FFT spectrum X(m′, k) is searched, and the peak spectral index is found as:
k_p = arg max_{k ∈ [k_min, k_max]} |X(m′, k)|
the melody frequency of this dominant spectrum is estimated as:
f_p = k_p·Δf
finally, the closest frequency is found by searching the piano pitch table, thereby determining which piano key was pressed.
6. An apparatus for tonal music note onset detection and note decoding, the apparatus comprising: a processor and a memory, the memory having program instructions stored therein, the processor invoking the program instructions stored in the memory to cause the apparatus to perform the method steps of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211110245.0A CN115472143A (en) | 2022-09-13 | 2022-09-13 | Tonal music note starting point detection and note decoding method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115472143A true CN115472143A (en) | 2022-12-13 |
Family
ID=84333318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211110245.0A Pending CN115472143A (en) | 2022-09-13 | 2022-09-13 | Tonal music note starting point detection and note decoding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115472143A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002084641A1 (en) * | 2001-04-10 | 2002-10-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. | Method for converting a music signal into a note-based description and for referencing a music signal in a data bank |
US20060075884A1 (en) * | 2004-10-11 | 2006-04-13 | Frank Streitenberger | Method and device for extracting a melody underlying an audio signal |
CN102129858A (en) * | 2011-03-16 | 2011-07-20 | 天津大学 | Musical note segmenting method based on Teager energy entropy |
CN112259063A (en) * | 2020-09-08 | 2021-01-22 | 华南理工大学 | Multi-tone overestimation method based on note transient dictionary and steady dictionary |
CN112420071A (en) * | 2020-11-09 | 2021-02-26 | 上海交通大学 | Constant Q transformation based polyphonic electronic organ music note identification method |
CN112509601A (en) * | 2020-11-18 | 2021-03-16 | 中电海康集团有限公司 | Note starting point detection method and system |
Non-Patent Citations (2)
Title |
---|
TIAN CHENG ET AL.: "Improving piano note tracking by HMM smoothing", 2015 23rd European Signal Processing Conference (EUSIPCO), 28 December 2015 *
MA XINJIAN: "Note onset detection based on sparse decomposition", China Master's Theses Full-text Database, Information Science and Technology, no. 05, 15 May 2015 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | | Inventor after: Gan Lin; Huang Xiangdong; Wei Yuyan. Inventor before: Huang Xiangdong; Wei Yuyan; Gan Lin |