CN101383149A - Stringed music vibrato automatic detection method - Google Patents

Publication number: CN101383149A (application CN200810137404XA)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Granted; Expired - Fee Related
Granted publication: CN101383149B
Inventors: 韩纪庆 (Han Jiqing), 孙荣坤 (Sun Rongkun)
Assignee: Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN200810137404XA
Publication of CN101383149A; application granted; publication of CN101383149B

Classification

  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to a method for automatically detecting trills in string music, in particular a method for detecting them in real time during automatic music transcription. It addresses two problems: trills strongly disturb automatic music transcription, and traditional automatic transcription methods cannot detect the trills in the music when transcribing string music. According to the number of notes in the common range of string music, trills are divided into N classes, and the N classes of trill models are trained into a matching object library by an audio recognition method. The audio signal of the music to be detected is input, and its features are extracted to obtain a feature vector sequence. The feature vector sequence is segmented using the measured average trill period as the segment length, each segment is recognized by the audio recognition method, and the time period corresponding to M or more consecutive segments recognized as the same class of trill is detected as a trill period. The invention detects trills automatically and removes their influence on automatic music transcription.

Description

Stringed music vibrato automatic detection method
Technical field
The present invention relates to a detection method in the fields of audio recognition and automatic music transcription, and specifically to a method for detecting trills in string music in real time during automatic music transcription.
Background art
Automatic music transcription is an important application of multimedia technology. It refers to analyzing and processing a music audio signal and automatically writing down its score in a certain format, for use in many music-related areas such as computer-aided music teaching and composition. Although automatic music transcription has made significant progress in recent years, many problems remain unsolved. Most current research results are obtained under conditions such as a single solo instrument, monophony, and playing without special techniques, while transcription under complex conditions, such as transcription of multi-instrument ensembles, polyphonic music, and music with special sound effects such as trills, has progressed slowly. Music played on stringed instruments contains a large number of trills (marked "tr" in the score) used to ornament the melody or express its emotion and style. In automatic music transcription research, transcribing directly without trill detection easily produces errors and can even leave the transcription system at a loss on such music. In general, a trill is the rapid alternation of two adjacent scale notes. However, many pieces also contain passages of ordinary notes in which adjacent scale notes alternate rapidly without being trills; failing to distinguish the two causes transcription errors (for example, wrongly transcribing a trill as sixteenth or thirty-second notes). In addition, the speed of trill notes is uncertain: a trill itself only requires rapid alternation of notes without prescribing a specific speed, which is determined entirely by the needs of the melody and by the player's habits and technique. Without dedicated detection, trills therefore cause the automatic transcription system to make mistakes on such music, and at present no dedicated automatic detection method for trills in string music exists.
Summary of the invention
To solve the problems that trills strongly affect automatic music transcription of string music and that traditional automatic transcription methods cannot automatically detect the trills in the music, the present invention provides an automatic trill detection method for string music. The invention is realized by the following steps:
Step A1: according to the number N of notes in the common range of string music, divide trills into N classes, where N is a natural number, and train the N classes of trill models into a matching object library by an audio recognition method;
Step A2: denote the input audio signal of the music to be detected as s(n), and perform feature extraction on s(n) to obtain the feature vector sequence X = {x_1, x_2, ..., x_S}, where S is a natural number;
Step A3: on the basis of framing, segment the feature vector sequence X using the measured average trill period T as the segment length, where T is a real number greater than 0;
Step A4: recognize each segment of the vector sequence by the audio recognition method;
Step A5: for a preset parameter M, detect as a trill period the time period corresponding to M or more consecutive segments recognized as the same class of trill.
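Step A3 above can be sketched as follows (a minimal sketch; the list-of-frames interface and the `frame_shift` parameter are assumptions, since the patent does not fix a frame rate):

```python
def segment_by_period(X, T, frame_shift):
    """Slice the frame-level feature sequence X into consecutive
    segments whose duration equals the average trill period T.

    X: sequence of per-frame feature vectors;
    T: average trill period in seconds;
    frame_shift: frame hop in seconds.
    """
    frames_per_seg = max(int(round(T / frame_shift)), 1)
    # non-overlapping segments; a trailing partial segment is dropped
    return [X[i:i + frames_per_seg]
            for i in range(0, len(X) - frames_per_seg + 1, frames_per_seg)]
```

With, for example, T = 0.3 s and a 10 ms frame hop, each segment holds 30 feature vectors.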
Beneficial effects: on the basis of framing, the present invention segments the feature vector sequence using the measured average trill period as the segment length and recognizes it segment by segment, thereby detecting the trill fragments in string music. This achieves automatic detection of the trills in the music and removes their influence on automatic music transcription.
Description of drawings
Fig. 1 is a flowchart of the detection method described in step A5. Fig. 2 is the spectrogram of a test fragment of string music containing trills; as can be seen from Fig. 2, approximately 0.200 s to 2.609 s and 6.889 s to 7.969 s are trills. Fig. 3 shows the detection result obtained by the method of step A5 on the music fragment of Fig. 2 (the example program uses a vector-quantization-based recognition method in steps A1 and A4); the abscissa gives the endpoint names and the ordinate the times, in seconds, of the actual and detected trill endpoints (one marker series represents the actual trill endpoints and the other the detected trill endpoints; the marker images are not reproduced here).
Embodiment
Embodiment one: this embodiment consists of the following steps:
Step A1: according to the number N of notes in the common range of string music, divide trills into N classes, where N is a natural number, and train the N classes of trill models into a matching object library by an audio recognition method;
Step A2: denote the input audio signal of the music to be detected as s(n), and perform feature extraction on s(n) to obtain the feature vector sequence X = {x_1, x_2, ..., x_S}, where S is a natural number;
Step A3: on the basis of framing, segment the feature vector sequence X using the measured average trill period T as the segment length, where T is a real number greater than 0;
Step A4: recognize each segment of the vector sequence by the audio recognition method;
Step A5: for a preset parameter M, detect as a trill period the time period corresponding to M or more consecutive segments recognized as the same class of trill.
The audio recognition method adopted in steps A1 and A4 of this embodiment is vector quantization; neural network methods and hidden Markov model (HMM) methods are equally applicable. The feature extraction process of step A2 is as follows: perform sampling, quantization and pre-emphasis on the audio signal s(n); assuming the signal is short-time stationary, divide it into frames, concretely by weighting with a movable finite-length window; then compute Mel-frequency cepstral coefficients (MFCC) from the weighted audio signal s_w(n), obtaining the feature vector sequence X = {x_1, x_2, ..., x_S}.
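The vector-quantization recognition of steps A1 and A4 can be sketched as follows (interface assumed: each trill class has a trained codebook of codewords, and a segment is assigned to the class whose codebook gives the lowest average quantization distortion):

```python
import numpy as np

def vq_classify(segment, codebooks):
    """segment: (frames, dims) array of feature vectors;
    codebooks: dict mapping class label -> (K, dims) codeword array.
    Returns the label whose codebook quantizes the segment best."""
    best_label, best_dist = None, np.inf
    for label, codebook in codebooks.items():
        # distance from every frame to every codeword
        d = np.linalg.norm(segment[:, None, :] - codebook[None, :, :], axis=2)
        distortion = d.min(axis=1).mean()  # nearest codeword per frame
        if distortion < best_dist:
            best_label, best_dist = label, distortion
    return best_label
```

In practice a rejection threshold on the best distortion would be needed to decide that a segment is not a trill at all, and training the codebooks (e.g. by the LBG algorithm) is omitted here; the patent does not specify either detail.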
The MFCC parameter extraction process is as follows:
(1) Frame and window the input audio signal, then apply the discrete Fourier transform (DFT) to obtain the spectral distribution information. The DFT of the audio signal is
$$X_a(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad 0 \le k < N$$
where x(n) is the input audio signal and N is the number of Fourier transform points;
(2) take the square of the spectral amplitude to obtain the energy spectrum;
(3) pass the energy spectrum through a bank of Mel-scale triangular filters;
Define a filter bank of M triangular filters (the number of filters is close to the number of critical bands) with center frequencies f(m), m = 1, 2, ..., M; in this embodiment M = 24. The spacing between adjacent filters is equal on the Mel scale, here 150 Mel. (This filter count M is a separate quantity from the segment-count parameter M of step A5, which is set to a small value such as 3.) The frequency response of the triangular filter is defined as
$$H_m(k) = \begin{cases} 0, & k < f(m-1) \\[4pt] \dfrac{2\,(k - f(m-1))}{(f(m+1) - f(m-1))\,(f(m) - f(m-1))}, & f(m-1) \le k \le f(m) \\[4pt] \dfrac{2\,(f(m+1) - k)}{(f(m+1) - f(m-1))\,(f(m+1) - f(m))}, & f(m) \le k \le f(m+1) \\[4pt] 0, & k > f(m+1) \end{cases}$$
where $\sum_{m=0}^{M-1} H_m(k) = 1$.
(4) compute the logarithmic energy output of each filter:
$$S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X_a(k)|^2 \, H_m(k) \right), \qquad 0 \le m < M$$
(5) obtain the MFCC coefficients through the discrete cosine transform (DCT):
$$C(n) = \sum_{m=0}^{M-1} S(m)\, \cos\!\big(\pi n (m + 0.5) / M\big), \qquad 0 \le n < M$$
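Steps (1)–(5) can be sketched end to end as follows (a minimal numpy sketch; the standard 2595·log10(1 + f/700) mel mapping and the 0-to-Nyquist filter placement are assumptions — the embodiment only requires triangular filters spaced equally on the Mel scale, e.g. every 150 Mel):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(M, nfft, sr):
    """Triangular filters H_m(k), equally spaced on the mel scale (step 3)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), M + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    H = np.zeros((M, nfft // 2 + 1))
    for m in range(1, M + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge
            H[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge, peak at center
            H[m - 1, k] = (right - k) / max(right - center, 1)
    return H

def mfcc_frame(x, H, n_ceps):
    """MFCCs of one windowed frame x, following steps (1)-(5)."""
    nfft = (H.shape[1] - 1) * 2
    power = np.abs(np.fft.rfft(x, n=nfft)) ** 2      # steps (1)-(2)
    S = np.log(H @ power + 1e-12)                    # steps (3)-(4)
    M = H.shape[0]
    m = np.arange(M)
    # step (5): C(n) = sum_m S(m) cos(pi * n * (m + 0.5) / M)
    return np.array([np.sum(S * np.cos(np.pi * n * (m + 0.5) / M))
                     for n in range(n_ceps)])
```

For example, `mel_filterbank(24, 512, 16000)` gives the M = 24 filters of this embodiment for a 512-point DFT at 16 kHz sampling (the sampling rate and DFT size are assumptions, as the patent does not state them).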
Embodiment two: referring to Fig. 1 to Fig. 3, this embodiment further defines, on the basis of embodiment one, that the detection described in step A5 consists of the following steps:
Step B1: clear the counter value n to zero, where n is a natural number;
Step B2: take a vector sequence segment of length T from the feature vector sequence X;
Step B3: judge by the audio recognition method whether the length-T vector sequence is a trill of the same class as the last recorded one; if so, go to step B4; if not, go to step B5;
Step B4: record the class of this trill, add 1 to the counter value n, and return to step B2;
Step B5: judge whether the counter value n is greater than or equal to M (M may be set to 3); if so, go to step B6; if not, return to step B1 and continue detecting;
Step B6: a trill segment has been detected; output the result;
Step B7: judge whether the audio stream has ended; if so, end the detection process; if not, return to step B1 and continue detecting.
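Steps B1–B7 can be sketched as a streaming loop (a minimal sketch; `classify` stands for the audio-recognition call of step B3 and is assumed to return a trill class label, or None for a non-trill segment):

```python
def detect_trills_stream(segments, classify, M=3):
    """Scan length-T segments and report (start, end, label) runs of
    M or more consecutive segments of the same trill class."""
    detected = []
    n, last, start = 0, None, 0                  # B1: clear the counter
    for i, seg in enumerate(segments):           # B2: next length-T segment
        label = classify(seg)
        if label is not None and label == last:  # B3: trill, same class?
            n += 1                               # B4: count and continue
        else:
            if n >= M:                           # B5/B6: run long enough
                detected.append((start, i, last))
            n = 1 if label is not None else 0    # restart the counter
            last, start = label, i
    if n >= M:                                   # B7: flush at stream end
        detected.append((start, len(segments), last))
    return detected
```

One design choice differs slightly from the flowchart: instead of returning to B1 and discarding the current segment, this sketch restarts counting at the current segment, so back-to-back trill runs of different classes are both detected.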

Claims (3)

1. An automatic trill detection method for string music, characterized in that it comprises the following steps:
Step A1: according to the number N of notes in the common range of string music, divide trills into N classes, where N is a natural number, and train the N classes of trill models into a matching object library by an audio recognition method;
Step A2: denote the input audio signal of the music to be detected as s(n), and perform feature extraction on s(n) to obtain the feature vector sequence X = {x_1, x_2, ..., x_S}, where S is a natural number;
Step A3: on the basis of framing, segment the feature vector sequence X using the measured average trill period T as the segment length, where T is a real number greater than 0;
Step A4: recognize each segment of the vector sequence by the audio recognition method;
Step A5: for a preset parameter M, detect as a trill period the time period corresponding to M or more consecutive segments recognized as the same class of trill.
2. The automatic trill detection method for string music according to claim 1, characterized in that the audio recognition method in steps A1 and A4 is a vector quantization method, a neural network method, or a hidden Markov model (HMM) method.
3. The automatic trill detection method for string music according to claim 1 or 2, characterized in that the detection described in step A5 comprises the following steps:
Step B1: clear the counter value n to zero, where n is a natural number;
Step B2: take a vector sequence segment of length T from the feature vector sequence X;
Step B3: judge by the audio recognition method whether the length-T vector sequence is a trill of the same class as the last recorded one; if so, go to step B4; if not, go to step B5;
Step B4: record the class of this trill, add 1 to the counter value n, and return to step B2;
Step B5: judge whether the counter value n is greater than or equal to M; if so, go to step B6; if not, return to step B1 and continue detecting;
Step B6: a trill segment has been detected; output the result;
Step B7: judge whether the audio stream has ended; if so, end the detection process; if not, return to step B1 and continue detecting.
CN200810137404XA 2008-10-27 2008-10-27 Stringed music vibrato automatic detection method Expired - Fee Related CN101383149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810137404XA CN101383149B (en) 2008-10-27 2008-10-27 Stringed music vibrato automatic detection method


Publications (2)

Publication Number Publication Date
CN101383149A (en) 2009-03-11
CN101383149B CN101383149B (en) 2011-02-02


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930873A (en) * 2012-09-29 2013-02-13 福州大学 Information entropy based music humming detecting method
CN106997769A (en) * 2017-03-25 2017-08-01 腾讯音乐娱乐(深圳)有限公司 Trill recognition methods and device
CN107077836A (en) * 2014-06-10 2017-08-18 Makemusic公司 For tracking the method for music score and the modeling method of correlation
CN110827859A (en) * 2019-10-15 2020-02-21 北京雷石天地电子技术有限公司 Method and device for vibrato recognition
CN112185322A (en) * 2019-07-01 2021-01-05 抚顺革尔电声科技有限公司 Dynamic controller

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09237089A (en) * 1996-03-01 1997-09-09 Matsushita Electric Ind Co Ltd Vibrato adding device
US7684987B2 (en) * 2004-01-21 2010-03-23 Microsoft Corporation Segmental tonal modeling for tonal languages




Legal Events

C06 / PB01: Publication
C10 / SE01: Entry into force of request for substantive examination
C14 / GR01: Grant of patent or utility model
CF01: Termination of patent right due to non-payment of annual fee
Granted publication date: 2011-02-02
Termination date: 2021-10-27