CN115472143A - Tonal music note starting point detection and note decoding method and device - Google Patents

Tonal music note starting point detection and note decoding method and device

Info

Publication number
CN115472143A
CN115472143A CN202211110245.0A
Authority
CN
China
Prior art keywords: note, frequency, music, decoding, tonal music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211110245.0A
Other languages
Chinese (zh)
Inventor
黄翔东 (Huang Xiangdong)
魏雨言 (Wei Yuyan)
甘霖 (Gan Lin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202211110245.0A priority Critical patent/CN115472143A/en
Publication of CN115472143A publication Critical patent/CN115472143A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/0016 Means for indicating which keys, frets or strings are to be actuated, e.g. using lights or leds
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/02 Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/051 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a method and a device for note onset detection and note decoding of tonal music, wherein the method comprises the following steps: drawing η(m) as an onset detection function curve and detecting the extreme points of the curve as the onset time positions of the tonal music notes; successively, at the intermediate position m̄ between the onset times of adjacent notes, searching the corresponding FFT spectrum X(m̄, k) for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf; and comparing the pitch frequency with the reference frequencies of the individual notes of the 12-equal-temperament note-pitch table, taking the note with the smallest frequency difference as the note decoding result for the current time period. The device comprises: a processor and a memory.

Description

Tonal music note starting point detection and note decoding method and device
Technical Field
The invention relates to the field of music information retrieval, relates to the technical field of signal analysis and processing, and particularly relates to a tonal music note starting point detection and note decoding method and device.
Background
Music is ubiquitous in everyday life and is one of the most vivid threads running through the long course of human history. In recent years, with the rapid development of Internet technology, music has spread far more widely: audio compression technology represented by MP3 has been applied on a large scale, traditional music media such as vinyl records and magnetic tapes have almost disappeared, and digital music is instead transmitted, downloaded and listened to over the Internet. Faced with massive amounts of digital music, how to effectively extract, retrieve and organize music information has attracted wide attention in academic and industrial circles, giving rise to the topic of Music Information Retrieval (MIR) [1]. Compared with atonal music, tonal music has a tonal center around which chords are formed and the melody progresses [2]. Music in this mode is composed of a series of beats and has a strong sense of direction. In the analysis of such music, one of the most basic tasks is note onset detection [3][4].
Clearly, onset detection is a precondition for note decoding and a fundamental problem underlying many other MIR tasks. For example, beat points generally coincide with note onsets, so note onset information enables further research on rhythm analysis and beat tracking; for melody retrieval, knowing the note onsets allows the frame overlap rate to be reduced and the retrieval speed to be improved; fundamental frequency estimation and pitch identification likewise build on correctly detected note onsets; and onset detection also helps to improve Automatic Music Transcription (AMT). In general, the core problem to be solved by a note onset detection method is the design of the Onset Detection Function (ODF). The audio signal is converted into an onset detection function that should take fairly low values (close to zero) most of the time but exhibit distinct peaks at the onset times.
With the prevalence and development of deep learning, more and more onset detection research starts from a learning network to optimize and improve the related algorithms. A typical deep-learning-based onset detection pipeline comprises framing, feature extraction, ODF generation by the network, and peak selection [5-7]. Jan Schlüter et al. [8] learn the ODF from input feature vectors using a CNN (convolutional neural network) to detect onsets. Erik Marchi et al. [9] apply wavelet transforms and short-time Fourier transforms to the audio and feed the results into RNN (recurrent neural network) and LSTM (long short-term memory) networks for note onset detection. Peter Steiner et al. [10] introduce Echo State Networks (ESNs) [11] to learn the ODF and propose a new stacked ESN algorithm on that basis.
Deep learning approaches, however, typically ignore the characteristics of the music signal itself. Because the networks consist of a large number of layers whose individual roles are difficult to determine, they generalize poorly to onset detection for other types of music. For example, the audio analysis and retrieval methods based on the Western system of harmony and musical structure that dominate current computational music research are poorly suited to Chinese ethnic-minority music, whose morphological characteristics are highly distinctive.
Meanwhile, in a deep-learning pipeline both feature extraction and ODF generation are complicated. In particular, feature extraction involves short-time Fourier transforms, filter banks, spectral flux extraction, feature vector construction, and so on. Even more demanding, obtaining the optimal ODF requires a series of learning operations, such as sample labeling and training, weight updates based on error back-propagation, and model testing, to explore the relationship between the feature vectors and the onset time points. Although these operations may yield high detection accuracy, they are inefficient and consume large amounts of computational resources.
References
[1] Fingerhut M. Music Information Retrieval, or how to search for (and maybe find) music and do away with incipits[J]. International Association of Music Libraries / International Association of Sound & Audiovisual Archives Congress, 2004.
[2] Construction of the concept of Munberg music by old soldiers [ J ] Chinese music 2006 (1): 2.
[3] Schreiber H, Weiss C, Müller M. Local Key Estimation in Classical Music Recordings: A Cross-Version Study on Schubert's Winterreise[C]//ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
[4] Dixon S. Evaluation of the Audio Beat Tracking System BeatRoot[J]. Journal of New Music Research, 2007, 36(1): 39-50.
[5] Grosche P, Müller M. Extracting Predominant Local Pulse Information From Music Recordings[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011.
[6] Percival G, Tzanetakis G. Streamlined Tempo Estimation Based on Autocorrelation and Cross-correlation With Pulses[J]. IEEE/ACM Transactions on Audio, Speech & Language Processing, 2014, 22(12): 1765-1776.
[7] Steiner P, Jalalvand A, Stone S, et al. Feature Engineering and Stacked Echo State Networks for Musical Onset Detection[C]//ICPR 2020. 2021.
[8] Schlüter J, Böck S. Improved musical onset detection with Convolutional Neural Networks[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2014: 6979-6983.
[9] Marchi E, Ferroni G, Eyben F, et al. Audio Onset Detection: A Wavelet Packet Based Approach with Recurrent Neural Networks[C]//2014 International Joint Conference on Neural Networks (IJCNN). IEEE, 2014.
[10] Steiner P, Jalalvand A, Stone S, et al. Feature Engineering and Stacked Echo State Networks for Musical Onset Detection[C]//ICPR 2020. 2021.
[11] Steiner P, Stone S, Birkholz P. Note Onset Detection using Echo State Networks[C]//Elektronische Sprachsignalverarbeitung (ESSV) 2020. 2020.
[12] Herremans D, Chew E. MorpheuS: generating structured music with constrained patterns and tension[J]. IEEE Transactions on Affective Computing, 2017: 1-1.
Disclosure of Invention
The invention provides a method and a device for tonal music note onset detection and note decoding, which realize both the detection of note onset times and the decoding of the notes. Starting from the characteristics of the audio signal itself, the invention explores the intrinsic relationship between the onset time points and the tonal beats to obtain an ODF curve, so that precision and accuracy are improved while the model remains more general. The invention is described in detail as follows:
a tonal music note onset detection and note decoding method, the method comprising:
drawing η(m) as an onset detection function curve and detecting the extreme points of the curve as the onset time positions of the tonal music notes;
successively, at the intermediate position m̄ between the onset times of adjacent notes, searching the corresponding FFT spectrum X(m̄, k) for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf;
Comparing the pitch frequency with the reference frequency of each note in the 12 equal temperament note-pitch table, and finding out the note with the smallest frequency difference as the note decoding result of the current time period.
The method models tonal music as a segmented harmonic model and extends the model to actual tonal music, which, for a region covering two adjacent beats, is expressed as:

x(t) = Σ_{k=1}^{K_1} A_k cos(2πf_k t + φ_k) during the previous beat, and x(t) = Σ_{k=1}^{K_2} A'_k cos(2πf'_k t + φ'_k) during the next beat,

wherein {A_k, f_k, φ_k}, k = 1, ..., K_1, and {A'_k, f'_k, φ'_k}, k = 1, ..., K_2, represent the amplitudes, frequencies and initial phases of the previous beat and the next beat, respectively.
Wherein the beat period is at least twice the FFT size, i.e. N_T > 2N.
Further, drawing η(m) as an onset detection function curve specifically comprises:

the frequency range corresponds to the following FFT index range:

k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf

when the observation window moves to the position mΔt, it corresponds to the windowed FFT spectrum X(m, k) extracted from the STFT time-frequency distribution, whose peak is:

X_max(m) = max_{k ∈ [k_min, k_max]} |X(m, k)|

the index is defined as:

η(m) = card{k ∈ [k_min, k_max] : |X(m, k)| > α·X_max(m)} / (k_max - k_min + 1)

wherein card(·) denotes the operator that counts the number of elements in a set, and α is the specified threshold ratio; using this metric, an ODF curve can be drawn.
Comparing the pitch frequency with the reference frequencies of the notes in the 12-equal-temperament note-pitch table and finding the note with the smallest frequency difference as the note decoding result for the current time period specifically comprises:

supposing that the onset time t_m of the m-th note and the onset time t_{m+1} of the (m+1)-th note have been detected, the intermediate time (t_m + t_{m+1})/2 is regarded as the reliable point for music decoding, corresponding to the STFT index:

m̄ = [(t_m + t_{m+1})/(2Δt)]

wherein [·] denotes the rounding-to-integer operator; the windowed FFT spectrum X(m̄, k) is searched for its peak, and the peak spectral index is found as:

k_p = argmax_{k ∈ [k_min, k_max]} |X(m̄, k)|

the melody frequency of this dominant spectral component is estimated as:

f_p = k_p·Δf

and finally, finding the closest frequency by looking up the piano pitch table so as to determine which piano key was pressed.
An apparatus for tonal music note onset detection and note decoding, the apparatus comprising: a processor and a memory, wherein the memory stores program instructions and the processor calls the program instructions stored in the memory to cause the apparatus to perform the method steps described above.
The technical scheme provided by the invention has the beneficial effects that:
1. The invention realizes both note decoding and note onset time detection and indicates a breakthrough direction for music information retrieval research: instead of relying solely on deep learning algorithms, attention is turned to the particular physical characteristics of music in the time domain, frequency domain and phase, and combining the two promises further progress;
2. The method treats note onset detection as one of the fundamental research directions of music information retrieval and lays a foundation for higher-level tasks such as rhythm analysis, beat tracking, fundamental frequency estimation and pitch identification.
Drawings
FIG. 1 is a flow chart of a note onset detection and note decoding apparatus for tonal music, i.e., a schematic diagram of the onset detection and note decoding of notes;
FIG. 2 is a schematic diagram of a simple waveform of the segmented harmonic model;
FIG. 3 shows the spectral leakage of the different segments of the segmented harmonic model;
FIG. 4 shows the onset time detection results;
wherein (a) is the STFT time-frequency energy diagram and (b) is the ODF curve based on the spectral leakage characteristics.
FIG. 5 is the chord and melody score of the piece "Cradle Song";
FIG. 6 is a diagram illustrating the note decoding result;
FIG. 7 is a diagram of a hardware implementation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Complicated techniques alone do not determine the accuracy of onset detection; the nature of the onset itself needs to be understood. In this respect, the embodiment of the present invention attempts to explore, at a fundamental level, the intrinsic relationship between the onset time points and the tonal beats, and to design well-defined processing operations according to signal-processing theory. This signal-processing approach does not involve the complex operations arising from network training and optimization, so it is more efficient and consumes fewer computational resources. It should be noted that, in ODF design, signal processing that emphasizes the spectral flux is critical [3,4,12]. However, if some spectral parameters are not set properly or the observation perspective is unsuitable, it is difficult to discover the law governing the variation of the spectral flux.
The embodiment of the invention provides an onset time detector based on spectral leakage feature extraction, on the basis of which notes can be decoded. Its main novelty lies in three aspects:
(1) The embodiment of the invention proposes a segmented harmonic model of tonal music; (2) the embodiment of the invention defines a good metric for evaluating the degree of spectral leakage at different positions of a music piece, from which an ODF curve can be drawn whose peaks capture the beat onset times; (3) the embodiment of the invention proposes a note decoding method.
These contributions are based on a thorough understanding of the spectrum of tonal music. Both the theoretical analysis and the experimental results verify the high accuracy and efficiency of the scheme.
Example 1
Based on the proposed segmented harmonic model of tonal music, the embodiment of the invention finds that the spectral leakage effect at the boundary of adjacent notes becomes very severe, which allows note onsets to be detected successfully. To evaluate this effect well, the embodiment of the invention also derives a set of suitable parameters for tonal music (including the down-sampling factor, FFT size, step size, etc.). Under these conditions, a good measure reflecting the degree of spectral leakage is defined, and an intuitive note onset detection function curve can be drawn, so that the onset positions are located by peak selection. Combining the note onset detection results with a standard piano pitch table, the final notes can be decoded one by one through simple nearest-frequency matching. Experiments show that the scheme achieves high accuracy in both the detection task and the decoding task.
1. General flow for embodiments of the invention
The specific flow of the tonal music note onset detection and note decoding device based on spectral leakage feature extraction provided by the embodiment of the invention is shown in FIG. 1.

The method comprises the following specific steps:

Input: a tonal music sample; detection frequency lower limit f_min = 210 Hz and upper limit f_max = 840 Hz; spectral-line threshold ratio α = 0.1; the 12-equal-temperament note-pitch table.

Step 1: down-sample the input audio to reduce its sampling rate to f_s = 6300 samples/s;

Step 2: perform sliding FFT analysis on the down-sampled audio sequence using a 1024-point Hanning window (corresponding to a frequency resolution Δf = f_s/N = 6.15 Hz), with a sliding step of 1 sample point; as the window sweeps over the entire audio sequence, a short-time Fourier transform (STFT) time-frequency spectrogram |X(m, k)| is generated (m is the time index and k the frequency index);

Step 3: for each time index m, search for the maximum value X_max of |X(m, k)| within k ∈ [k_min, k_max], count the spectral lines in k ∈ [k_min, k_max] whose FFT amplitude exceeds α·X_max, and calculate their number as a percentage η(m) of the total number of spectral lines:

η(m) = card{k ∈ [k_min, k_max] : |X(m, k)| > α·X_max} / (k_max - k_min + 1)

Step 4: plot η(m) against time m as the onset detection function (ODF) curve, and detect the extreme points of the curve as the onset time positions of the notes;

Step 5: successively, at the intermediate position m̄ between the onset times of adjacent notes, search the corresponding FFT spectrum X(m̄, k) for the peak spectral index k_p and calculate the corresponding pitch frequency k_p·Δf;

Step 6: compare k_p·Δf with the reference frequencies of the individual notes of the 12-equal-temperament note-pitch table (e.g., the note "Do"), and take the note with the smallest frequency difference as the note decoding result for the current time period.

In the above procedure, stopping after Step 4 completes the detection of all note onsets, while continuing through Step 6 completes the decoding of all notes. It should be emphasized that all of these operations are based on the degree of spectral leakage of the tonal music; the technical principle is detailed below.
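For concreteness, the following Python sketch strings Steps 1-6 together end to end. It is a minimal illustration under stated assumptions, not the patent's implementation: the function name onset_detect_and_decode, the pitch_table dictionary mapping note names to 12-equal-temperament frequencies, and the use of scipy.signal.find_peaks for the extreme-point selection of Step 4 are illustrative choices, and a hop of 64 samples (the Δd = N/16 setting of Section 2.2) is used instead of a 1-sample step purely for speed.

import numpy as np
from scipy.signal import decimate, get_window, find_peaks

def onset_detect_and_decode(x, Fs=44100, D=7, N=1024, hop=64,
                            f_min=210.0, f_max=840.0, alpha=0.1,
                            pitch_table=None):
    # Step 1: down-sample from Fs to fs = Fs / D (6300 samples/s by default)
    y = decimate(x, D)
    fs = Fs / D
    df = fs / N                                   # frequency resolution
    k_min, k_max = int(f_min / df), int(f_max / df)
    win = get_window("hann", N)

    # Steps 2-3: sliding windowed FFT and spectral-leakage index eta(m)
    eta, spectra = [], []
    for s in range(0, len(y) - N, hop):
        X = np.abs(np.fft.rfft(win * y[s:s + N]))
        band = X[k_min:k_max + 1]
        eta.append(np.sum(band > alpha * band.max()) / band.size)
        spectra.append(X)
    eta = np.array(eta)

    # Step 4: extreme points of the ODF curve taken as note onset positions
    onsets, _ = find_peaks(eta)

    # Steps 5-6: decode the note between each pair of adjacent onsets
    notes = []
    for m0, m1 in zip(onsets[:-1], onsets[1:]):
        m_bar = (m0 + m1) // 2                    # intermediate STFT index
        band = spectra[m_bar][k_min:k_max + 1]
        k_p = k_min + int(np.argmax(band))        # peak spectral index
        f_p = k_p * df                            # estimated pitch frequency
        name = None
        if pitch_table:                           # nearest 12-TET reference note
            name = min(pitch_table, key=lambda n: abs(pitch_table[n] - f_p))
        notes.append((m_bar * hop / fs, f_p, name))
    return eta, onsets, notes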
2. Technical principle of the invention
2.1 segmental harmonic model of tonal music
The embodiment of the invention models the tonal music into a segmented harmonic model. Specifically, a piece of music consists of a series of beats, each of which differs in duration and harmonic component.
To illustrate the piecewise harmonic model, take a signal x(t) of duration 3.0938 seconds as an example:

x(t) = cos(2πf_1 t) for 0 ≤ t < 1.5469 s, and x(t) = cos(2πf_2 t) for 1.5469 s ≤ t ≤ 3.0938 s    (2)

In equation (2), the first half of x(t) is a tone of frequency f_1 = 7.1 Hz and the second half is a tone of frequency f_2 = 16.3 Hz. The waveform of x(t) is shown in FIG. 2.
As shown in FIG. 2, the embodiment of the invention uses a sampling rate f_s = 64 samples/s and extracts three segments of equal length (1 second) from x(t): t ∈ [0.0469 s, 1.0469 s], t ∈ [1.0313 s, 2.0313 s] and t ∈ [2.0313 s, 3.0313 s]. These three segments are then fast-Fourier-transformed using a 64-point Hanning window, and their magnitude spectra |X_1(k)|, |X_2(k)|, |X_3(k)| are shown in FIG. 3.
From FIG. 3 it can be seen that the three spectra |X_1(k)|, |X_2(k)|, |X_3(k)| exhibit different degrees of spectral leakage. Specifically, the first and third segments, each containing only a single pure tone, show little spectral leakage: their spectral energy is concentrated in a narrow region centered at the peak (k = 6 and k = 16, respectively). For the second segment, however, the spectral leakage is very severe: side lobes spread over a wide area around the two peaks, because this segment is not pure but mixes two tones. In addition, the peak height of |X_2(k)| is lower than the peak heights of |X_1(k)| and |X_3(k)|, which reflects that spectral leakage attenuates the peaks.
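A minimal NumPy sketch of this toy experiment is given below; the unit amplitudes, zero initial phases and exact segment start samples are assumptions (the patent shows the waveform and spectra only as figures), but it reproduces the qualitative effect: the middle segment, which straddles the tone change near 1.55 s, leaks far more heavily than the two pure segments.

import numpy as np

fs, dur = 64, 3.0938
t = np.arange(int(dur * fs)) / fs
# two-tone piecewise signal of equation (2): 7.1 Hz first, then 16.3 Hz
x = np.where(t < dur / 2,
             np.cos(2 * np.pi * 7.1 * t),
             np.cos(2 * np.pi * 16.3 * t))

win = np.hanning(64)
for label, t0 in [("segment 1", 0.0469), ("segment 2", 1.0313), ("segment 3", 2.0313)]:
    seg = x[int(t0 * fs): int(t0 * fs) + 64]      # one-second (64-sample) slice
    X = np.abs(np.fft.rfft(win * seg))
    leak = np.mean(X > 0.1 * X.max())             # crude leakage measure
    print(f"{label}: peak bin k = {int(np.argmax(X))}, leakage fraction = {leak:.2f}")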
The above example is a simplified description of tonal music. The embodiment of the present invention considers that the signal model in equation (2) can be extended to actual tonal music and, for a region covering two adjacent beats, expressed as:

x(t) = Σ_{k=1}^{K_1} A_k cos(2πf_k t + φ_k) during the previous beat, and x(t) = Σ_{k=1}^{K_2} A'_k cos(2πf'_k t + φ'_k) during the next beat    (3)

wherein {A_k, f_k, φ_k}, k = 1, ..., K_1, and {A'_k, f'_k, φ'_k}, k = 1, ..., K_2, represent the amplitudes, frequencies and initial phases of the previous beat and the next beat, respectively.
As will be described in detail later, a high degree of spectral leakage also implies a transition from one musical note to another.
2.2 STFT parameter setting conditions for Start Point detection
The parameters related to the STFT include the FFT size N, the window type and the step size Δd. Furthermore, as described above, the input music recording needs to be down-sampled in order to reduce the computational complexity. For a piece of tonal music, the individual beats usually fluctuate around a period T that can be roughly known in advance. At the same time, in order to implement note decoding, these STFT parameters must be related to the pitch table.
Assume that the original sampling rate is F_s (usually F_s = 44.1 ksamples/s) and that the down-sampling factor is D. The sampling rate after down-sampling is then:

f_s = F_s/D    (4)

The number of samples in one beat period is therefore:

N_T = T·f_s = T·F_s/D    (5)

and the frequency resolution Δf of the FFT equals:

Δf = f_s/N = F_s/(DN)    (6)
to ensure that the FFT window can fit within a complete beat time period without covering any region of the previous beat duration and the next beat duration, the time period should be at least twice the FFT size, i.e. the following inequality should hold:
N T >2N (7)
furthermore, in order to be able to spectrally distinguish the notes of a piano, the minimum spacing of the pitch-frequency table (i.e. the O4, O5 regions listed in the table) should exceed twice the frequency resolution. As shown in Table 1, the minimum interval is equal to the sum of notes A and A b B (i.e., 233.082Hz-220hz = 13.082hz), the following inequality should also hold:
Δf<13.082/2=6.541 (8)
TABLE 1 Piano pitch and corresponding frequency table
Taking the parameter conditions (4) to (8) into account, the embodiment of the present invention sets the STFT parameters as follows: down-sampling factor D = 7 and FFT size N = 1024. By equation (4), the sampling rate is thus reduced to f_s = F_s/D = 44100/7 = 6300 samples/s. Substituting a typical beat period value T = 0.41 s into equation (5) gives the number of samples within one beat period, N_T = T·f_s ≈ 2583. Then, considering that the FFT size should be an integer power of 2, a feasible FFT size satisfying inequality (7) is N = 1024, from which the frequency resolution can be calculated as:

Δf = f_s/N = 6.1523 Hz

The step size of the STFT is set to Δd = N/16 = 64, corresponding to a time step Δt = Δd/f_s = 0.0102 s, which indicates sufficiently high tracking accuracy. In addition, a Hanning window is selected as the window type, as is common practice.

Note that Δf also satisfies inequality (8), which shows that all of the above STFT-related parameters are suitable for note onset detection.
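The parameter arithmetic above can be verified with a few lines of Python; this is a plain numerical check of conditions (4) to (8), not part of the patented method.

Fs, D, N, T = 44100, 7, 1024, 0.41
fs = Fs / D                  # (4): 6300 samples/s
N_T = T * fs                 # (5): about 2583 samples per beat period
df = fs / N                  # (6): about 6.1523 Hz resolution
dt = (N // 16) / fs          # STFT hop of 64 samples -> about 0.0102 s
print(fs, round(N_T), round(df, 4), round(dt, 4))
print("N_T > 2N:", N_T > 2 * N)          # (7): 2583 > 2048 holds
print("df < 6.541:", df < 13.082 / 2)    # (8): 6.1523 < 6.541 holds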
2.3 ODF construction based on spectral leakage feature extraction
Essentially, constructing the onset detection function amounts to defining an indicator that evaluates the degree of spectral leakage. Specifically, as the observation window moves, this indicator should be small when the window lies entirely within a beat duration without overlapping any part of an adjacent beat duration. As the window continues to move forward, the overlap necessarily increases, which also increases the metric. Then, as the window gradually moves into the next beat, the indicator decreases again. The variation is therefore periodic.
In practice, to compute the metric quantitatively, a frequency range f ∈ [f_min, f_max] should be given; this frequency range corresponds to the following FFT index range:

k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf    (9)
furthermore, when the observation window is moved to the position of m Δ t, it corresponds to the windowed FFT spectrum X (m, k) extracted from the STFT time-frequency distribution, where the peaks are represented as:
Figure BDA0003843783150000092
the index is defined as:
Figure BDA0003843783150000101
wherein "card (-) means for
Figure BDA0003843783150000102
The operator for statistical counting, alpha is the specified threshold ratio (suggested value around 0.1)。
Using this metric, an ODF curve can be drawn, and its peak positions can be selected as the note onset times, as shown in the experimental results below.
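A compact, vectorized sketch of equations (9) to (11) is given below; the function name spectral_leakage_odf and the assumption of a precomputed STFT magnitude matrix S[m, k] are illustrative conveniences rather than notation used by the patent.

import numpy as np

def spectral_leakage_odf(S, df, f_min=210.0, f_max=840.0, alpha=0.1):
    # S[m, k]: STFT magnitudes with time index m and frequency index k
    k_min, k_max = int(f_min / df), int(f_max / df)            # equation (9)
    band = np.abs(S[:, k_min:k_max + 1])
    X_max = band.max(axis=1, keepdims=True)                    # equation (10)
    eta = (band > alpha * X_max).sum(axis=1) / band.shape[1]   # equation (11)
    return eta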
2.4 Note decoding
After detecting the onset time point of a note, the note can also be decoded by analyzing the spectral distribution within a given frequency range.
Take piano music as an example. A piano piece is usually played with two hands: the left hand strikes the lower keys to produce chords (spectral components distributed in a relatively low frequency region), while the right hand strikes the upper keys to produce the melody (spectral components distributed in a relatively high frequency region). In essence, the chords are designed according to the melody. The embodiment of the present invention therefore aims at decoding the melody notes within the frequency range f ∈ [f_min, f_max].
Specifically, as shown in FIG. 1, suppose the onset time t_m of the m-th note and the onset time t_{m+1} of the (m+1)-th note have been detected. Since they are adjacent notes, the intermediate time (t_m + t_{m+1})/2 can be regarded as a reliable point for music decoding, because there the observation window is likely to lie completely within the m-th beat duration. This time corresponds to the STFT index:

m̄ = [(t_m + t_{m+1})/(2Δt)]    (12)

where [·] denotes the rounding-to-integer operator. The windowed FFT spectrum X(m̄, k) can therefore be searched for its peak, and the peak spectral index is found as:

k_p = argmax_{k ∈ [k_min, k_max]} |X(m̄, k)|    (13)

From this, the melody frequency of the dominant spectral component can be estimated as:

f_p = k_p·Δf    (14)

Finally, the closest frequency can be found by looking up the piano pitch table, thereby determining which piano key was pressed. Furthermore, given a reference frequency (e.g., the frequency corresponding to the note "Do"), the note can easily be written down.
Example 2
Verification experiment
3.1.ODF graph
A piano piece, "Cradle Song", was recorded, with the parameter settings described in Section 2.2. Following the onset time detection procedure described above, the embodiment of the invention obtains the time-frequency distribution shown in FIG. 4.
3.2 Note decoding
After detecting the note onsets, the embodiment of the present invention further performed a music decoding experiment (with 369.994 Hz taken as the reference frequency for the note "Do"). For this piece, "Cradle Song", the chord and melody score is shown in FIG. 5.
Following the note decoding procedure described above, all notes are retrieved. For ease of presentation, the note decoding results are displayed on the STFT time-frequency spectrogram, as shown in FIG. 6.
The design principle and results of tonal music onset time detection and note decoding based on spectral leakage feature extraction, together with the verification experiments performed on the model, have been presented in detail above. It can be seen that the embodiment of the present invention has the following beneficial effects:
(1) For the note onset detection results, it can be seen from FIG. 4(a) that:
a. Many horizontal stripes (corresponding to the individual notes) are distributed over the STFT time-frequency plot, showing a good energy-concentration effect, which is in fact due to the parameter settings described in Section 2.2.
b. There are also clear vertical stripes at the boundaries of adjacent notes; these are a side effect of spectral leakage as the observation window traverses the transition region from one note to the next.
c. Note that during any note duration there is a series of horizontal bars at different frequency locations, which verifies the correctness of the segmented harmonic model of tonal music presented in Section 2.1.
As can be seen from FIG. 4(b):
a. The ODF curve shows the fluctuating behaviour described in Section 2.3: it becomes smaller when the observation window is fully contained within a note and larger when the window moves into the transition region between adjacent note durations.
b. The peak locations (marked with asterisks) in FIG. 4(b) occur exactly where the vertical stripes (representing the boundaries of adjacent beat durations) are located in FIG. 4(a). This result shows that the proposed onset time detection method achieves high positioning accuracy.
(2) For the note decoding results, combining FIG. 5 and FIG. 6, it can be seen that:
a. As shown in FIG. 6, the note decoding results are highly consistent with the true double-staff score.
b. In addition, the embodiment of the invention makes only a few decoding errors. For example, at time 8 s the expected decoding result is the treble note "5", but it is decoded as the bass note "5". This is because the recorded sound is a mixture of the left-hand chords and the right-hand melody, and for the treble melody note "5" the corresponding chord is exactly the bass "5". The decoding result therefore depends entirely on the difference in key-pressing strength between the two hands, and the observed result is reasonable.
Example 3
The hardware implementation is shown in FIG. 7: the collected audio signal x(t) is sampled by an A/D (analog-to-digital) converter to obtain the sample sequence x(n), which enters the DSP chip as digital input; after being processed by the algorithm inside the DSP chip, the note-decoded signal is output.
The DSP (Digital Signal Processor) in FIG. 7 is the core device; its internal program flow, shown in FIG. 1, comprises two parts: note onset detection and note decoding.
Note onset detection: first, the input audio is down-sampled by the factor D. Then the signal is subjected to the STFT, configured according to the STFT parameter setting conditions for note onset detection described in Section 2.2. Next, by evaluating the degree of spectral leakage, the ODF curve is plotted, and the note onsets are found by a peak extraction algorithm.
Note decoding: with the aid of the note onsets retrieved above, the frequency with the largest energy is found from the corresponding peak bin. This frequency is then decoded into a note according to the pitch table. Repeating this operation over all times retrieves all the notes.
The core algorithm of the tonal music note onset detection and note decoding device based on spectral leakage feature extraction is embedded in a DSP device, on which high-precision, low-complexity and highly efficient music signal analysis is performed.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (6)

1. A method for tonal music note onset detection and note decoding, the method comprising:
drawing η(m) as an onset detection function curve and detecting the extreme points of the curve as the onset time positions of the tonal music notes;
successively, at the intermediate position m̄ between the onset times of adjacent notes, searching the corresponding FFT spectrum X(m̄, k) for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf;
Comparing the pitch frequency with the reference frequency of each note in the 12 equal temperament note-pitch table, and finding out the note with the smallest frequency difference as the note decoding result of the current time period.
2. The tonal music note onset detection and note decoding method according to claim 1, wherein the method models tonal music as a segmented harmonic model and extends the model to actual tonal music, which, for a region covering two adjacent beats, is expressed as:

x(t) = Σ_{k=1}^{K_1} A_k cos(2πf_k t + φ_k) during the previous beat, and x(t) = Σ_{k=1}^{K_2} A'_k cos(2πf'_k t + φ'_k) during the next beat,

wherein {A_k, f_k, φ_k}, k = 1, ..., K_1, and {A'_k, f'_k, φ'_k}, k = 1, ..., K_2, represent the amplitudes, frequencies and initial phases of the previous beat and the next beat, respectively.
3. The tonal music note onset detection and note decoding method according to claim 2, wherein the beat period is at least twice the FFT size, i.e. N_T > 2N.
4. The tonal music note onset detection and note decoding method according to claim 1, wherein drawing η(m) as an onset detection function curve specifically comprises:

the frequency range corresponds to the following FFT index range:

k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf

when the observation window moves to the position mΔt, it corresponds to the windowed FFT spectrum X(m, k) extracted from the STFT time-frequency distribution, whose peak is:

X_max(m) = max_{k ∈ [k_min, k_max]} |X(m, k)|

the index is defined as:

η(m) = card{k ∈ [k_min, k_max] : |X(m, k)| > α·X_max(m)} / (k_max - k_min + 1)

wherein card(·) denotes the operator that counts the number of elements in a set, and α is the specified threshold ratio; using this metric, an ODF curve can be drawn.
5. The tonal music note onset detection and note decoding method according to claim 1, wherein comparing the pitch frequency with the reference frequencies of the notes in the 12-equal-temperament note-pitch table and finding the note with the smallest frequency difference as the note decoding result for the current time period specifically comprises:

supposing that the onset time t_m of the m-th note and the onset time t_{m+1} of the (m+1)-th note have been detected, the intermediate time (t_m + t_{m+1})/2 is regarded as the reliable point for music decoding, corresponding to the STFT index:

m̄ = [(t_m + t_{m+1})/(2Δt)]

wherein [·] denotes the rounding-to-integer operator; the windowed FFT spectrum X(m̄, k) is searched for its peak, and the peak spectral index is found as:

k_p = argmax_{k ∈ [k_min, k_max]} |X(m̄, k)|

the melody frequency of this dominant spectral component is estimated as:

f_p = k_p·Δf

and finally, finding the closest frequency by looking up the piano pitch table so as to determine which piano key was pressed.
6. An apparatus for tonal music note onset detection and note decoding, the apparatus comprising: a processor and a memory, wherein the memory stores program instructions and the processor calls the program instructions stored in the memory to cause the apparatus to perform any of the method steps described above.
CN202211110245.0A 2022-09-13 2022-09-13 Tonal music note starting point detection and note decoding method and device Pending CN115472143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211110245.0A CN115472143A (en) 2022-09-13 2022-09-13 Tonal music note starting point detection and note decoding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211110245.0A CN115472143A (en) 2022-09-13 2022-09-13 Tonal music note starting point detection and note decoding method and device

Publications (1)

Publication Number Publication Date
CN115472143A true CN115472143A (en) 2022-12-13

Family

ID=84333318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211110245.0A Pending CN115472143A (en) 2022-09-13 2022-09-13 Tonal music note starting point detection and note decoding method and device

Country Status (1)

Country Link
CN (1) CN115472143A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002084641A1 (en) * 2001-04-10 2002-10-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Method for converting a music signal into a note-based description and for referencing a music signal in a data bank
US20060075884A1 (en) * 2004-10-11 2006-04-13 Frank Streitenberger Method and device for extracting a melody underlying an audio signal
CN102129858A (en) * 2011-03-16 2011-07-20 天津大学 Musical note segmenting method based on Teager energy entropy
CN112259063A (en) * 2020-09-08 2021-01-22 华南理工大学 Multi-tone overestimation method based on note transient dictionary and steady dictionary
CN112420071A (en) * 2020-11-09 2021-02-26 上海交通大学 Constant Q transformation based polyphonic electronic organ music note identification method
CN112509601A (en) * 2020-11-18 2021-03-16 中电海康集团有限公司 Note starting point detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIAN CHENG ET AL.: "Improving piano note tracking by HMM smoothing", 2015 23rd European Signal Processing Conference (EUSIPCO), 28 December 2015 (2015-12-28) *
MA XINJIAN: "Note onset detection based on sparse decomposition", China Master's Theses Full-text Database, Information Science and Technology, No. 05, 15 May 2015 (2015-05-15) *

Similar Documents

Publication Publication Date Title
Paulus et al. Measuring the similarity of Rhythmic Patterns.
Ryynänen et al. Automatic transcription of melody, bass line, and chords in polyphonic music
Gillet et al. Transcription and separation of drum signals from polyphonic music
Klapuri et al. Analysis of the meter of acoustic musical signals
EP1895506B1 (en) Sound analysis apparatus and program
Lee et al. Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio
Goto A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings
Benetos et al. Polyphonic music transcription using note onset and offset detection
Dressler Pitch estimation by the pair-wise evaluation of spectral peaks
JP2003330460A (en) Method of comparing at least two audio works, program for realizing the method on computer, and method of determining beat spectrum of audio work
CN110599987A (en) Piano note recognition algorithm based on convolutional neural network
Benetos et al. Joint multi-pitch detection using harmonic envelope estimation for polyphonic music transcription
McLeod Fast, accurate pitch detection tools for music analysis
Kumar et al. Musical onset detection on carnatic percussion instruments
Teixeira et al. Ulises: a agent-based system for timbre classification
Pratama et al. Human vocal type classification using MFCC and convolutional neural network
Nagavi et al. An extensive analysis of query by singing/humming system through query proportion
Barbancho et al. Transcription of piano recordings
Gurunath Reddy et al. Predominant melody extraction from vocal polyphonic music signal by time-domain adaptive filtering-based method
Emiya et al. Multipitch estimation of quasi-harmonic sounds in colored noise
CN115472143A (en) Tonal music note starting point detection and note decoding method and device
Tang et al. Melody Extraction from Polyphonic Audio of Western Opera: A Method based on Detection of the Singer's Formant.
Müller et al. Tempo and Beat Tracking
Yu et al. Research on piano performance strength evaluation system based on gesture recognition
Tian A cross-cultural analysis of music structure

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Gan Lin; Huang Xiangdong; Wei Yuyan
Inventor before: Huang Xiangdong; Wei Yuyan; Gan Lin