CN115472143A - Tonal music note starting point detection and note decoding method and device - Google Patents
- Publication number
- CN115472143A (application CN202211110245.0A)
- Authority
- CN
- China
- Prior art keywords
- note
- frequency
- music
- decoding
- tonal music
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0016—Means for indicating which keys, frets or strings are to be actuated, e.g. using lights or leds
- G10H1/02—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
- G10H1/36—Accompaniment arrangements
- G10H1/40—Rhythm
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/051—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
Abstract
The invention discloses a method and a device for note onset detection and note decoding in tonal music, wherein the method comprises the following steps: drawing η(m) as an onset detection function curve, and detecting the extreme points of the curve as the starting time positions of the tonal music notes; at the intermediate position between the start times of each pair of adjacent notes, searching the corresponding FFT spectrum for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf; comparing the pitch frequency with the reference frequencies of the individual notes of the 12-equal-temperament note-pitch table, and finding the note with the smallest frequency difference as the note decoding result of the current time period. The device comprises: a processor and a memory.
Description
Technical Field
The invention relates to the field of music information retrieval and to the technical field of signal analysis and processing, and in particular to a method and device for tonal music note onset detection and note decoding.
Background
Music is ubiquitous in daily life and has been a vivid presence throughout human history. In recent years, with the rapid development of Internet technology, music has spread ever more widely. Audio compression technology, represented by MP3, has been applied on a large scale, causing traditional music media such as vinyl records and magnetic tapes almost to disappear; digital music is instead transmitted, downloaded and listened to over the Internet. Faced with massive amounts of digital music, how to effectively extract, retrieve and organize music information has drawn wide attention from academia and industry, giving rise to the topic of Music Information Retrieval (MIR) [1]. Compared with non-tonal music, tonal music is centered on a tonic, around which chords are formed and the melody progresses [2]. Music in this mode consists of a series of beats and has a strong sense of direction. In the analysis of such music, one of the most basic tasks is note onset detection [3][4].
Clearly, onset detection is a precondition for note decoding and a foundational problem underlying many other MIR tasks. For example, beat points generally coincide with note onsets, so note onset information supports further research on rhythm analysis and beat tracking. For melody retrieval, note onset information allows the frame overlap rate to be reduced, improving retrieval speed. Fundamental frequency estimation and pitch identification likewise build on correctly detected note onsets. In addition, onset detection helps improve Automatic Music Transcription (AMT). In general, the core problem to be solved by note onset detection methods is the design of the Onset Detection Function (ODF): the audio signal is converted into a detection function that should remain fairly low (close to zero) most of the time, but exhibit a distinct peak at each onset time.
With the prevalence and development of deep learning, more and more onset detection research starts from a learning network to optimize and improve related algorithms. A typical learning-based onset detection process comprises: framing, feature extraction, network-based ODF generation, and peak selection [5-7]. Jan Schlüter et al. [8] learn the ODF from input feature vectors using a CNN (convolutional neural network) to detect onsets. Erik Marchi et al. [9] apply wavelet transforms and short-time Fourier transforms to the audio and feed the results into RNN (recurrent neural network) and LSTM (long short-term memory) networks for note onset detection. Peter Steiner et al. [10] introduce Echo State Networks (ESNs) [11] to learn the ODF and propose a new stacked ESN algorithm based thereon.
Studies based on deep learning networks typically ignore the characteristics of the music signal itself. Since such networks are composed of a large number of layers whose individual roles are difficult to determine, generalization to onset detection in other types of music is hindered. For example, the audio morphological feature recognition and retrieval methods widely adopted in today's computational musicology, which are based on Western harmony and structural systems, are not suitable for Chinese ethnic-minority music with highly distinctive morphological characteristics.
Meanwhile, in deep learning pipelines, both feature extraction and ODF generation are complicated. In particular, feature extraction involves short-time Fourier transforms, filter banks, spectral flux extraction, feature vector construction, and so on. More complicated still, obtaining the optimal ODF requires a series of learning-network operations, such as sample labelling and training, weight updates based on error back-propagation, and model testing, to explore the relationship between the feature vectors and the onset time points. Although these operations can achieve high detection accuracy, they are inefficient and consume large amounts of computational resources.
References
[1] Fingerhut M. Music Information Retrieval, or how to search for (and maybe find) music and do away with incipits [J]. International Association of Music Libraries / International Association of Sound & Audiovisual Archives Congress, 2004.
[2] Construction of the concept of tonal music [J]. Chinese Music, 2006(1): 2. (in Chinese)
[3] Schreiber H, Weiß C, Müller M. Local Key Estimation in Classical Music Recordings: A Cross-Version Study on Schubert's Winterreise [C]// ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
[4] Dixon S. Evaluation of the Audio Beat Tracking System BeatRoot [J]. Journal of New Music Research, 2007, 36(1): 39-50.
[5] Grosche P, Müller M. Extracting Predominant Local Pulse Information From Music Recordings [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011.
[6] Percival G, Tzanetakis G. Streamlined Tempo Estimation Based on Autocorrelation and Cross-correlation With Pulses [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 1765-1776.
[7] Steiner P, Jalalvand A, Stone S, et al. Feature Engineering and Stacked Echo State Networks for Musical Onset Detection [C]// ICPR 2020. 2021.
[8] Schlüter J, Böck S. Improved musical onset detection with Convolutional Neural Networks [C]// IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2014: 6979-6983.
[9] Marchi E, Ferroni G, Eyben F, et al. Audio Onset Detection: A Wavelet Packet Based Approach with Recurrent Neural Networks [C]// 2014 International Joint Conference on Neural Networks (IJCNN). IEEE, 2014.
[10] Steiner P, Jalalvand A, Stone S, et al. Feature Engineering and Stacked Echo State Networks for Musical Onset Detection [C]// ICPR 2020. 2021.
[11] Steiner P, Stone S, Birkholz P. Note Onset Detection using Echo State Networks [C]// Elektronische Sprachsignalverarbeitung (ESSV) 2020. 2020.
[12] Herremans D, Chew E. MorpheuS: generating structured music with constrained patterns and tension [J]. IEEE Transactions on Affective Computing, 2017: 1-1.
Disclosure of Invention
The invention provides a method and a device for tonal music note onset detection and note decoding, which realize detection of the note start times and decoding of the notes. Starting from the characteristics of the audio signal, the invention explores the intrinsic relation between the onset time points and the tonal rhythm to obtain an ODF curve, so that precision and accuracy are improved while the model becomes more universal. The invention is described in detail as follows:
a tonal music note onset detection and note decoding method, the method comprising:
drawing η(m) as an onset detection function curve, and detecting the extreme points of the curve as the starting time positions of the tonal music notes;
at the intermediate position between the start times of each pair of adjacent notes, searching the corresponding FFT spectrum for the peak spectral index k_p and calculating the corresponding pitch frequency k_p·Δf;
comparing the pitch frequency with the reference frequencies of the notes in the 12-equal-temperament note-pitch table, and finding the note with the smallest frequency difference as the note decoding result of the current time period.
The method models the tonal music as a piecewise harmonic model and extends the model to actual tonal music, represented as follows:
wherein {A_k, f_k, θ_k}, k = 1, ..., K_1 and the corresponding parameter set of the following beat represent the amplitude, frequency and initial phase of the previous beat and of the next beat, respectively;
wherein the beat period is at least twice the FFT size, N_T > 2N.
Further, the drawing of η(m) as an onset detection function curve specifically comprises:
the frequency range corresponds to the index range of the FFT as follows:
k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf
when the observation window moves to the position mΔt, it corresponds to the windowed FFT spectrum X(m,k) extracted from the STFT time-frequency distribution, whose peak is expressed as X_max(m) = max |X(m,k)| over k ∈ [k_min, k_max];
the index is defined as η(m) = card{ k ∈ [k_min, k_max] : |X(m,k)| > α·X_max(m) } / (k_max - k_min + 1);
wherein "card(·)" represents the statistical counting operator, and α is a designated threshold ratio; using this metric, an ODF curve can be drawn.
The comparing of the pitch frequency with the reference frequencies of the notes in the 12-equal-temperament note-pitch table, and finding the note with the minimum frequency difference as the note decoding result of the current time period, specifically comprises:
suppose the start time t_m of the m-th note and the start time t_{m+1} of the (m+1)-th note have been detected; the intermediate time (t_m + t_{m+1})/2 is regarded as a reliable point for realizing music decoding, corresponding to the STFT index [(t_m + t_{m+1})/(2Δt)];
wherein "[·]" represents the integer rounding operator; the corresponding windowed FFT spectrum is taken, and its peak spectral index k_p is found within k ∈ [k_min, k_max];
the melody frequency of this dominant spectral component is estimated as k_p·Δf;
and finally, the closest frequency is found by searching the piano pitch table, so as to determine which piano key was pressed.
An apparatus for tonal music note onset detection and note decoding, the apparatus comprising: a processor and a memory, the processor and the memory having stored therein program instructions, the processor invoking the program instructions stored in the memory to cause the apparatus to perform any of the method steps described.
The technical scheme provided by the invention has the beneficial effects that:
1. The invention realizes note decoding and note start-time detection, and points out a direction for research breakthroughs in music information retrieval: instead of relying on deep learning algorithms alone, attention is turned to the specific physical characteristics of music such as the time domain, frequency domain and phase, and combining the two yields further breakthroughs;
2. The method takes note onset detection as one of the basic research directions of music information retrieval, and lays a foundation for advanced tasks such as rhythm analysis, beat tracking, fundamental frequency estimation and pitch identification.
Drawings
FIG. 1 is a flow chart of note onset detection and note decoding for tonal music;
FIG. 2 is a waveform diagram of the simple piecewise harmonic model example;
FIG. 3 shows the spectral leakage of different segments of the piecewise harmonic model;
FIG. 4 is a graph of the onset-time detection results;
wherein (a) is the STFT time-frequency energy diagram, and (b) is the ODF curve based on the spectral-leakage feature.
FIG. 5 is the chord and melody score of the tune "Cradle Song";
FIG. 6 is a diagram illustrating the note decoding results;
FIG. 7 is a diagram of a hardware implementation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Complicated techniques affect the accuracy of onset detection, so the nature of the onset itself needs to be understood. In this respect, the embodiment of the present invention attempts to explore the intrinsic relationship between the onset time points and the tonal rhythm, and to design explicit processing operations according to the theoretical knowledge of signal processing. Such a signal-processing approach involves no complex operations arising from network training and optimization, and therefore achieves higher efficiency while consuming fewer computational resources. It should be noted that, in ODF design, signal processing that emphasizes spectral flux is critical [3,4,12]. However, if some spectral parameters are not set properly, or the observation perspective is unsuitable, it is difficult to discover the variation law of the spectral flux.
The embodiment of the invention provides an onset-time detector based on spectral-leakage feature extraction, on the basis of which notes can be decoded. Its main novelty lies in three aspects:
(1) The embodiment of the invention proposes a piecewise harmonic model of tonal music; (2) the embodiment defines a good metric for evaluating the degree of spectral leakage at different positions in a piece of music, from which an ODF curve can be drawn whose peaks capture the beat start times; (3) the embodiment provides a note decoding method.
The above contributions are based on a deep understanding of the spectrum of tonal music. Both theoretical analysis and experimental results verify the high accuracy and efficiency of the scheme.
Example 1
Based on the proposed piecewise harmonic model of tonal music, the embodiment of the invention finds that the spectral-leakage effect becomes very severe at the boundary between adjacent notes, and thereby successfully detects note onsets. To evaluate this effect well, the embodiment also derives a series of parameters for tonal music (including the down-sampling factor, FFT size, step size, etc.). With these conditions satisfied, a good measure reflecting the degree of spectral leakage is defined, and an intuitive note-onset detection function curve can be drawn, so that onset positions are located through peak selection. Combining the note onset detection results with the standard piano pitch table, the final notes can be decoded one by one through simple nearest-frequency matching. Experiments prove that the scheme achieves high precision in both the detection task and the decoding task.
1. General flow for embodiments of the invention
The specific flow of the tonal music note onset detection and note decoding, based on spectral-leakage feature extraction, provided by the embodiment of the present invention is shown in FIG. 1.
the method comprises the following specific steps:
input: tonal music sample, lower limit of detection frequency f min =210Hz,f max =840Hz, line threshold ratio α =0.1, 12 equal law note-pitch table.
Step 1: down-sampling the input audio to reduce its sampling rate to F s =6300samples/s;
Step 2: sliding FFT analysis is carried out on the down-sampled audio sequence by using a window with the length of 1024 Hanning (corresponding to the frequency resolution delta f)=f s N =6.15 Hz), the sliding step is 1 sample point, and a short-time Fourier transform (STFT) time-frequency spectrogram | X (m, k) | (m is a time index and k is a frequency index) is generated as the window covers the entire audio sequence;
step 3: for each time instant m, atSearching in maximum value X of | X (m, k) | max And at k ∈ [ k ] min ,k max ]Internally counting that the FFT amplitude spectrum is larger than alpha X max Of the spectral lineAnd calculating the number of the spectral lines and the percentage eta (m) of the total number of the spectral lines;
step 4: drawing eta (m) as an endpoint Detection Function curve (ODF) by using the time m, and detecting an extreme point of the curve as the initial time position of each note;
step 5: successively at intermediate positions of the start times of adjacent notesSearching out the corresponding FFT spectrumPeak spectrum number k of p And calculating the corresponding pitch frequency k p Δf;
Step 6: will k p Δ f is compared with the reference frequencies of the respective notes (e.g., "Do" notes) of the 12-equal temperament note-pitch table, and the note with the smallest frequency difference is found as the note decoding result of the current time period.
In the above steps, all note starting point detection is completed by stopping at Step 4, and all note decoding is completed by stopping at Step 6. It is emphasized that the above operations are all based on the spectrum leakage degree of the tonal music, and the technical principle thereof is detailed as follows.
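The onset-detection core of Steps 2-4 can be sketched in Python on a synthetic two-note signal; the 440/660 Hz test tune, the 0.4 s note length, and the hop of 64 samples (the Δd of section 2.2, rather than the 1-sample step above) are illustrative assumptions, not part of the invention:

```python
import numpy as np

# Sketch of Steps 2-4: sliding-FFT spectrogram, leakage index eta(m), peak picking.
fs, N, hop, alpha = 6300, 1024, 64, 0.1
df = fs / N                                  # ~6.1523 Hz resolution
kmin, kmax = int(210 / df), int(840 / df)    # detection band 210-840 Hz

# assumed two-note test signal: 0.4 s at 440 Hz, then 0.4 s at 660 Hz
t = np.arange(int(0.4 * fs)) / fs
x = np.concatenate([np.cos(2 * np.pi * 440 * t),
                    np.cos(2 * np.pi * 660 * t)])

w = np.hanning(N)
frames = (len(x) - N) // hop + 1
eta = np.empty(frames)
for m in range(frames):
    X = np.abs(np.fft.rfft(w * x[m * hop:m * hop + N]))
    Xmax = X[kmin:kmax + 1].max()
    # fraction of in-band spectral lines exceeding alpha * Xmax
    eta[m] = np.count_nonzero(X[kmin:kmax + 1] > alpha * Xmax) / (kmax - kmin + 1)

onset = int(eta.argmax()) * hop + N // 2     # window centre at the eta peak
```

Inside each note, η(m) stays near (mainlobe width)/(band width); it rises sharply where the window straddles the note boundary at sample 2520, so the η peak localizes the onset.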
2. Technical principle of the invention
2.1 Piecewise harmonic model of tonal music
The embodiment of the invention models tonal music as a piecewise harmonic model. Specifically, a piece of music consists of a series of beats, each of which differs in duration and harmonic composition.
To elaborate the piecewise harmonic model, we take a signal x(t) of duration 3.0938 seconds as an example, with the formula:
In equation (2), during the first half of the time x(t) is a tone of frequency f_1 = 7.1 Hz, and during the second half a tone of frequency f_2 = 16.3 Hz. The waveform of x(t) is shown in FIG. 2.
As shown in FIG. 2, the embodiment of the invention uses a sampling rate f_s = 64 samples/s and extracts three segments of equal length (1 second) from x(t), at t ∈ [0.0469s, 1.0469s], t ∈ [1.0313s, 2.0313s], and t ∈ [2.0313s, 3.0313s]. These three segments are fast-Fourier-transformed using a 64-point Hanning window; their magnitude spectra |X_1(k)|, |X_2(k)|, |X_3(k)| are shown in FIG. 3.
From FIG. 3 it can be seen that the three spectra |X_1(k)|, |X_2(k)|, |X_3(k)| exhibit different degrees of spectral leakage. For the first and third segments, each containing only a single pure tone, the leakage is small and the spectral energy is concentrated in a narrow region centered at the peak (k = 6 and k = 16, respectively). For the second segment, however, the leakage is very severe: side lobes spread over a wide area around the two peaks, because this segment is not pure but mixes two tones. Moreover, the peak of |X_2(k)| is lower than the peaks of |X_1(k)| and |X_3(k)|, reflecting that spectral leakage attenuates the peak height.
The above example is a simplified description of tonal music. The embodiment of the present invention considers that the signal model in equation (2) can be extended to actual tonal music and, for a region covering two adjacent beats, expressed as:
wherein {A_k, f_k, θ_k}, k = 1, ..., K_1 and the corresponding parameter set of the following beat represent the amplitude, frequency and initial phase of the previous beat and of the next beat, respectively.
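The equation image here (equation (3) in the sequence) is likewise not reproduced; consistent with the parameter sets just described (the tilde notation for the following beat and the boundary time t_b are assumed conventions), a plausible form is:

```latex
x(t) =
\begin{cases}
\sum_{k=1}^{K_1} A_k \cos(2\pi f_k t + \theta_k), & t < t_b,\\
\sum_{k=1}^{K_2} \tilde{A}_k \cos(2\pi \tilde{f}_k t + \tilde{\theta}_k), & t \ge t_b.
\end{cases} \tag{3}
```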
As will be described in detail later, a high degree of spectral leakage thus signals the transition from one note to the next.
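The leakage contrast described above can be reproduced numerically; a sketch assuming round segment positions (the exact boundaries in the source text are garbled):

```python
import numpy as np

# Numerical sketch of the FIG. 2 / FIG. 3 experiment. Tone frequencies, sampling
# rate and window length are from the text; segment positions are assumed.
fs, f1, f2, T = 64, 7.1, 16.3, 3.0938
t = np.arange(int(fs * T)) / fs
x = np.where(t < T / 2, np.cos(2 * np.pi * f1 * t), np.cos(2 * np.pi * f2 * t))

w = np.hanning(64)

def leaked_lines(seg, alpha=0.1):
    """Count spectral lines above alpha times the segment's spectral peak."""
    X = np.abs(np.fft.rfft(w * seg))
    return int(np.count_nonzero(X > alpha * X.max()))

mid = len(x) // 2                        # sample index of the tone change
n1 = leaked_lines(x[:64])                # fully inside the first tone
n2 = leaked_lines(x[mid - 32:mid + 32])  # straddles the tone boundary
n3 = leaked_lines(x[-64:])               # fully inside the second tone
```

The straddling segment counts markedly more above-threshold lines than either pure-tone segment, mirroring the heavy side-lobe spread of |X_2(k)| in FIG. 3.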
2.2 STFT parameter setting conditions for onset detection
The parameters related to the STFT include the FFT size N, the window type and the step size Δd. Furthermore, as described above, the input music recording needs to be down-sampled in order to reduce computational complexity. For a piece of tonal music, the individual beats usually fluctuate around a period T, which can be roughly known in advance. At the same time, in order to implement note decoding, these STFT parameters need to be associated with the pitch table.
Assume an original sampling rate F_s (usually F_s = 44.1 ksamples/s) and a down-sampling factor D. The down-sampled rate is thus:
f_s = F_s/D (4)
The number of samples in one beat period is therefore:
N_T = T·f_s = T·F_s/D (5)
Hence the frequency resolution Δf of the FFT equals:
Δf = f_s/N = F_s/(D·N) (6)
To ensure that the FFT window can fit within a complete beat period without covering any region of the previous or the next beat duration, the beat period should be at least twice the FFT size, i.e. the following inequality should hold:
N_T > 2N (7)
furthermore, in order to be able to spectrally distinguish the notes of a piano, the minimum spacing of the pitch-frequency table (i.e. the O4, O5 regions listed in the table) should exceed twice the frequency resolution. As shown in Table 1, the minimum interval is equal to the sum of notes A and A b B (i.e., 233.082Hz-220hz = 13.082hz), the following inequality should also hold:
Δf<13.082/2=6.541 (8)
TABLE 1 Piano Pitch and corresponding frequency Table
In consideration of the parameter conditions (4)-(8), the embodiment of the present invention sets the STFT parameters as follows:
down-sampling factor D = 7 and FFT size N = 1024. By equation (4), the sampling rate is reduced to f_s = F_s/D = 44100/7 = 6300 samples/s. Substituting a typical beat period T = 0.41 s into equation (5) yields the number of samples within one beat period, N_T = T·f_s ≈ 2583. Then, considering that the FFT size should be an integer power of 2, the inequality in equation (7) admits the feasible FFT size N = 1024, from which the frequency resolution is calculated as:
Δf = f_s/N = 6.1523 Hz
The STFT step size is specified as Δd = N/16 = 64, corresponding to a time step Δt = Δd/f_s = 0.0102 s, which indicates sufficiently fine tracking accuracy. In addition, a Hanning window is selected as the commonly used window type.
Note that Δf also satisfies inequality (8), which confirms that all the above STFT-related parameters are suitable for note onset detection.
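The arithmetic behind conditions (4)-(8) can be verified directly; a minimal sketch using exactly the values stated in this section:

```python
# Check the STFT parameter conditions (4)-(8) with the values from section 2.2.
Fs = 44100        # original sampling rate F_s (samples/s)
D = 7             # down-sampling factor
N = 1024          # FFT size
T = 0.41          # conventional beat period (s)

fs = Fs / D       # (4): down-sampled rate, 6300 samples/s
NT = T * fs       # (5): samples per beat period, ~2583
df = fs / N       # (6): frequency resolution, ~6.1523 Hz

assert fs == 6300.0
assert NT > 2 * N              # (7): one beat spans more than two windows
assert df < 13.082 / 2         # (8): resolves the closest piano semitone
```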
2.3 ODF construction based on spectral leakage feature extraction
Essentially, constructing the onset detection function amounts to defining an index for evaluating the degree of spectral leakage. Specifically, as the observation window moves, this index should be small when the window lies completely within one beat duration without overlapping any portion of the adjacent beat durations. As the window continues to move forward, the overlap necessarily increases, which also causes the index to rise. Then, as the window passes gradually into the next beat, the index tends to decrease again. This variation is, moreover, periodic.
In practice, to compute the index quantitatively, a frequency range f ∈ [f_min, f_max] should be given, corresponding to the following FFT index range:
k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf (9)
furthermore, when the observation window is moved to the position of m Δ t, it corresponds to the windowed FFT spectrum X (m, k) extracted from the STFT time-frequency distribution, where the peaks are represented as:
the index is defined as:
wherein "card (-) means forThe operator for statistical counting, alpha is the specified threshold ratio (suggested value around 0.1)。
By using this metric, an ODF curve can be drawn in which the peak positions can be selected as the note onset times, as shown in the later experimental results.
2.4 Note decoding
After the onset time point of a note has been detected, the note itself can be decoded by analyzing the spectral distribution within a given frequency range.
Take piano music as an example. A piano piece is usually played with both hands: the left hand strikes the lower keys to produce chords (spectral components distributed in a relatively low frequency region), while the right hand strikes the upper keys to produce the melody (spectral components distributed in a relatively high frequency region). In essence, the chords are designed around the melody. Thus, the embodiment of the invention actually aims to achieve note decoding of the melody within the frequency range f ∈ [f_min, f_max].
Specifically, as shown in FIG. 1, assume that the start time t of the mth note has been detected m And the start time t of the (m + 1) th note m+1 . Since they are adjacent notes, the intermediate time (t) can be set m +t m+1 ) The/2 is considered as a reliable point for achieving music decoding, where the observation window is likely to lie completely within the mth beat duration. This time corresponds to the STFT index:
where [·] denotes the integer rounding operator. Thus, the peak of the windowed FFT spectrum X(m′, k) can be searched, and the peak spectral index found as:
k_p = arg max_{k ∈ [k_min, k_max]} |X(m′, k)| (13)
From this, the melody frequency of this dominant spectral component can be estimated as:
f_p = k_p·Δf (14)
finally, the closest frequency can be found by looking up the piano pitch table, thereby judging which piano key was pressed. Further, given a reference frequency (e.g., a frequency corresponding to the note "Do"), its note can be easily represented.
Example 2
Verification experiment
3.1 ODF graph
A piano tune, "Cradle Song", was recorded, and the parameter settings mentioned in section 2.2 were applied. Following the onset-time detection process described above, the embodiment of the present invention obtained the time-frequency distribution shown in FIG. 4.
3.2 Note decoding
After detecting the onsets of the note durations, embodiments of the present invention further performed music decoding experiments (with the reference frequency taken as 369.994 Hz rather than the frequency of "Do"). The chord and melody score of this tune, "Cradle Song", is shown in FIG. 5.
Following the note decoding process described above, all notes are retrieved. For ease of demonstration, the note decoding results are displayed on the STFT time-frequency spectrogram, as shown in FIG. 6.
The design principles and results of tonal music onset-time detection and note decoding based on spectral-leakage feature extraction, together with the verification experiments performed on the model, have been given in detail above. It can be seen that embodiments of the present invention have the following beneficial effects:
(1) For the note onset detection results, it can be seen from FIG. 4(a) that:
a. There are clearly many horizontal stripes (one per note) on this STFT time-frequency plot, showing a good energy concentration effect, which in fact results from the parameter settings described in section 2.2.
b. There are also clear vertical stripes at the boundaries of adjacent notes; these are a side effect of the spectral leakage that occurs as the observation window traverses the transition region from one note to the next.
c. Within any single note duration there is a series of horizontal bars at different frequency positions, which verifies the correctness of the segmented harmonic model for tonal music presented in section 2.1.
As can be seen from FIG. 4(b):
a. The ODF curve shows the fluctuation described in section 2.3: it becomes smaller when the observation window is fully contained within a note and larger when the window moves into the transition region between adjacent note durations.
b. The peak locations (marked with asterisks) in FIG. 4(b) occur exactly where the vertical stripes (representing the boundaries of adjacent beat durations) are located in FIG. 4(a). This result demonstrates that the proposed onset-time detection method has high localization accuracy.
(2) For the note decoding results, in conjunction with fig. 5 and 6, it can be seen that:
a. As shown in FIG. 6, the note decoding results are highly consistent with the true double-staff score.
b. In addition, the embodiment of the invention produces only a few decoding errors. For example, at 8 s the expected decoding result is the treble note "5", but it is decoded as the bass note "5". This is because the recorded sound is a mixture of left-hand chords and right-hand melody, and for the treble melody note "5" the corresponding chord happens to contain exactly the bass "5". The decoding result therefore depends entirely on the relative striking strength of the two hands, so this result is reasonable.
Example 3
The hardware implementation diagram is shown in FIG. 7. The collected audio signal x(t) is sampled by an A/D (analog-to-digital) converter to obtain the sample sequence x(n), which enters a DSP chip as digital input; after the internal algorithm processing of the DSP chip, the note-decoded signal is output.
The DSP (Digital Signal Processor) in FIG. 7 is the core device. Its internal program flow, shown in FIG. 1, comprises two parts: note onset detection and note decoding.
Note onset detection: first, the input audio is down-sampled by the factor D. The signal then undergoes an STFT, configured according to the note onset detection STFT parameter settings described in section 2.2. Next, by evaluating the degree of spectral leakage, the ODF curve is drawn, and note onsets are found by a peak-extraction algorithm.
Note decoding: with the aid of the retrieved note onsets, the frequency with the largest energy is found from the corresponding peak bin, and this frequency is then decoded into a note according to the pitch table. All notes can be retrieved by repeating this operation over all time intervals.
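The front end of this flow (down-sampling followed by STFT) can be sketched as below; D, n_fft, and hop are assumed example values rather than the patent's actual section 2.2 settings.

```python
import numpy as np
from scipy.signal import decimate, stft

def dsp_front_end(x, fs, D=4, n_fft=2048, hop=512):
    """Down-sample the input audio by factor D (with anti-alias
    filtering), then compute the STFT magnitude distribution on
    which onset detection and note decoding operate."""
    y = decimate(x, D)                  # anti-aliased down-sampling
    fs_d = fs / D                       # reduced sampling rate
    f, t, X = stft(y, fs=fs_d, nperseg=n_fft, noverlap=n_fft - hop)
    return f, t, np.abs(X)              # frequencies, frame times, |STFT|
```

Down-sampling first keeps the analysis band [f_min, f_max] well resolved while reducing the per-frame FFT cost, which matters on a DSP chip.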
In this way, the core algorithm of the tonal music note onset detection and note decoding device based on spectral-leakage feature extraction is implanted into a DSP device, which completes high-precision, low-complexity, and high-efficiency music signal analysis.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.
Claims (6)
1. A method for tonal music note onset detection and note decoding, the method comprising:
drawing η(m) as an onset detection function curve, and detecting the extreme points of the curve as the onset time positions of the tonal music notes;
at the intermediate position of the onset times of each pair of adjacent notes, successively searching the corresponding FFT spectrum for the peak spectral index k_p, and calculating the corresponding pitch frequency k_p·Δf;
comparing the pitch frequency with the reference frequency of each note in the 12-equal-temperament note-pitch table, and taking the note with the smallest frequency difference as the note decoding result for the current time period.
2. The method for tonal music note onset detection and note decoding according to claim 1, wherein the method models tonal music as a segmented harmonic model, extended to actual tonal music and expressed as:
3. The method for tonal music note onset detection and note decoding according to claim 2, wherein the time period is at least twice the FFT size, i.e., N_T > 2N.
4. The method for tonal music note onset detection and note decoding according to claim 1, wherein the step of drawing η(m) as an onset detection function curve comprises:
giving a frequency range f ∈ [f_min, f_max], which corresponds to the following FFT index range:
k ∈ [k_min, k_max], k_min = f_min/Δf, k_max = f_max/Δf
when the observation window moves to the position mΔt, it corresponds to the windowed FFT spectrum X(m, k) extracted from the STFT time-frequency distribution, whose peak magnitude is:
|X(m)|_max = max_{k ∈ [k_min, k_max]} |X(m, k)|
the index is defined as:
η(m) = card{ k ∈ [k_min, k_max] : |X(m, k)| ≥ α·|X(m)|_max }
where card(·) denotes the counting operator and α is a specified threshold ratio.
5. The method for tonal music note onset detection and note decoding according to claim 1, wherein the step of comparing the pitch frequency with the reference frequency of each note in the 12-equal-temperament note-pitch table and finding the note with the smallest frequency difference as the note decoding result for the current time period comprises:
suppose that the onset time t_m of the m-th note and the onset time t_{m+1} of the (m+1)-th note have been detected; the intermediate time (t_m + t_{m+1})/2 is regarded as the reliable point for music decoding and corresponds to the STFT index:
m′ = [(t_m + t_{m+1})/(2Δt)]
where [·] denotes the integer rounding operator; the peak of the windowed FFT spectrum X(m′, k) is searched, and the peak spectral index is found as:
k_p = arg max_{k ∈ [k_min, k_max]} |X(m′, k)|
the melody frequency of this dominant spectrum is estimated as:
f_p = k_p·Δf
finally, the closest frequency is found by searching the piano pitch table, thereby determining which piano key was pressed.
6. An apparatus for tonal music note onset detection and note decoding, the apparatus comprising: a processor and a memory, the memory having program instructions stored therein, the processor invoking the program instructions stored in the memory to cause the apparatus to perform the method steps of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211110245.0A CN115472143A (en) | 2022-09-13 | 2022-09-13 | Tonal music note starting point detection and note decoding method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115472143A true CN115472143A (en) | 2022-12-13 |
Family
ID=84333318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211110245.0A Pending CN115472143A (en) | 2022-09-13 | 2022-09-13 | Tonal music note starting point detection and note decoding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115472143A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002084641A1 (en) * | 2001-04-10 | 2002-10-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. | Method for converting a music signal into a note-based description and for referencing a music signal in a data bank |
US20060075884A1 (en) * | 2004-10-11 | 2006-04-13 | Frank Streitenberger | Method and device for extracting a melody underlying an audio signal |
CN102129858A (en) * | 2011-03-16 | 2011-07-20 | 天津大学 | Musical note segmenting method based on Teager energy entropy |
CN112259063A (en) * | 2020-09-08 | 2021-01-22 | 华南理工大学 | Multi-tone overestimation method based on note transient dictionary and steady dictionary |
CN112420071A (en) * | 2020-11-09 | 2021-02-26 | 上海交通大学 | Constant Q transformation based polyphonic electronic organ music note identification method |
CN112509601A (en) * | 2020-11-18 | 2021-03-16 | 中电海康集团有限公司 | Note starting point detection method and system |
Non-Patent Citations (2)
Title |
---|
TIAN CHENG ET AL.: "Improving piano note tracking by HMM smoothing", 2015 23rd European Signal Processing Conference (EUSIPCO), 28 December 2015 *
MA XINJIAN: "Note onset detection based on sparse decomposition", China Master's Theses Full-text Database, Information Science and Technology, no. 05, 15 May 2015 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | | Inventor after: Gan Lin; Huang Xiangdong; Wei Yuyan. Inventor before: Huang Xiangdong; Wei Yuyan; Gan Lin |