CN105957538B

CN105957538B - Salience-based melody extraction method for polyphonic music

Info

Publication number: CN105957538B
Application number: CN201610299427.5A
Authority: CN
Inventors: 张维维; 陈喆; 殷福亮
Original assignee: Dalian Nationalities University
Current assignee: Dalian Sailing Technology Co Ltd
Priority date: 2016-05-09
Filing date: 2016-05-09
Publication date: 2019-06-11
Anticipated expiration: 2036-05-09
Also published as: CN105957538A

Abstract

The invention discloses a salience-based melody extraction method for polyphonic music. The salience function is defined as the product of two spectral peak amplitudes. Candidate pitches in the same frame whose frequencies differ by less than 50 cents are merged, so that a pitch can be estimated from many combinations of coprime-order harmonic frequencies. Candidate pitches in adjacent frames whose frequencies differ by less than 50 cents are linked to form pitch contours, contours shorter than 50 ms are discarded in a preliminary screening, and the melody output is selected according to set screening criteria. The pitch of the melody component can be estimated accurately even when its fundamental is missing or buried by the accompaniment; melody contour tracking is then carried out according to the set screening criteria, yielding a correct melody output.

Description

Salience-based melody extraction method for polyphonic music

Technical field:

The invention belongs to the field of music audio signal processing, and in particular relates to a salience-based melody extraction method for polyphonic music that can accurately estimate the pitch of the melody component and obtain a correct melody output.

Background art:

Polyphonic music is the most widespread musical form, and the melody is the soul of music, embodying the essential difference between one musical work and another. Melody extraction from polyphonic music is an important research topic in machine listening. Its aim is to let a computer, like a person "listening" to music, ignore the accompaniment and accurately identify the pitch sequence of the vocal or instrumental melody, so that content-based music analysis, retrieval and recommendation become possible. It can serve as front-end processing for applications such as music transcription, query by humming, genre classification and cover song identification.

Methods currently applied to melody extraction from polyphonic music fall mainly into source-separation-based and salience-based methods. Source-separation-based methods first separate the melody component from the polyphonic music and then output the pitch sequence of that component, so their performance is directly limited by the quality of the source separation. Salience-based methods rely on the energy salience and temporal smoothness of the melody: they first build a multi-pitch representation and then select, as the melody output, the pitch contours that are both salient in energy and smooth in time. The basic procedure performs spectrum analysis, pitch representation, melody contour tracking and melody output in sequence on the polyphonic music audio signal. However, because the spectral structure of polyphonic music is complex, the fundamental is often buried in the spectrum of bass accompaniment, percussion and other accompanying instruments. Existing salience-based melody extraction methods then have difficulty estimating the pitch of the melody component accurately, and the melody output is incorrect.

Summary of the invention:

To solve the above technical problem in the prior art, the invention provides a salience-based melody extraction method for polyphonic music that can accurately estimate the pitch of the melody component and obtain a correct melody output.

The technical solution of the invention is a salience-based melody extraction method for polyphonic music, in which spectrum analysis, pitch representation, melody contour tracking and melody output are performed in sequence on a polyphonic music audio signal, characterized in that the pitch representation step is as follows: for the sinusoidal frequencies p_{k,t} (k = 1, 2, ..., 50, p_{k,t} > 0) in the polyphonic mixture and their corresponding amplitudes m_{k,t} (k = 1, 2, ..., 50, m_{k,t} > 0), given any pair of spectral peak frequencies p_{i,t} and p_{j,t} with corresponding amplitudes m_{i,t} and m_{j,t}, where i ≠ j:

1. Let x = min{p_{i,t}, p_{j,t}} and y = max{p_{i,t}, p_{j,t}};

2. Compute r(x, y) [equation not reproduced], where [·] denotes rounding to the nearest integer;

3. If r(x, y) ≥ 0.15, let z = mod(y, x) and assign max{p_{i,t}, p_{j,t}} = z, where mod(y, x) returns the remainder of y divided by x, then return to step 1. If r(x, y) < 0.15, the pair of spectral peak frequencies p_{i,t} and p_{j,t} yields a candidate pitch frequency value f_{l,t} [equation not reproduced], where [definition not reproduced], and f_{l,t} is assigned the corresponding salience value w_{l,t} = m_{i,t} m_{j,t};

4. Obtain the candidate pitch frequency value and its salience value {f_{l,t}, w_{l,t}};

5. Select another pair of spectral peak frequencies with their corresponding amplitudes and repeat steps 1 to 4 until all spectral peaks have been processed;

6. The candidate pitch estimates form the parameter set [expression not reproduced], where F_t is the set of candidate pitch frequency values and the second set holds the corresponding salience values. Candidate pitches whose estimates differ by less than 50 cents are merged into a single candidate pitch; for [expression not reproduced], the frequency value after merging is [equation not reproduced],

where Ω = {f : |f - f_{r,t}| < 50 cents}.
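The pairwise procedure in steps 1 to 5 can be sketched in Python as follows. This is only an illustrative sketch, not the patented formulas: the equations for r(x, y) and for the candidate frequency are not reproduced in this text, so the forms used here (distance of y/x from its nearest integer, and the average of x and y/[y/x]) are assumptions, as are the function names and the `min_freq` guard that stops the loop for ratios that never approach an integer. The threshold 0.15 and the salience w = m_i · m_j come from the text.

```python
from itertools import combinations


def pair_to_candidate(p_i, p_j, m_i, m_j, threshold=0.15, min_freq=20.0):
    """Return (candidate_frequency, salience) for one peak pair, or None."""
    x, y = min(p_i, p_j), max(p_i, p_j)           # step 1
    while x >= min_freq:                          # guard: an assumption, not in the source
        ratio = y / x
        n = round(ratio)                          # nearest integer, n >= 1 because y >= x
        r = abs(ratio - n)                        # ASSUMED form of r(x, y)
        if r < threshold:
            f = 0.5 * (x + y / n)                 # ASSUMED candidate frequency estimate
            return f, m_i * m_j                   # salience: product of the two amplitudes
        z = y % x                                 # step 3: remainder of y divided by x
        x, y = min(x, z), max(x, z)               # the larger value is replaced by z
    return None


def frame_candidates(freqs, amps):
    """Apply the pairwise procedure to every pair of peaks in one frame (step 5)."""
    out = []
    for i, j in combinations(range(len(freqs)), 2):
        cand = pair_to_candidate(freqs[i], freqs[j], amps[i], amps[j])
        if cand is not None:
            out.append(cand)
    return out
```

For example, the third and fifth harmonics of a 100 Hz tone (300 Hz and 500 Hz) reduce to the pair (100, 200) after two remainder steps and yield a 100 Hz candidate, even though no peak at 100 Hz is present.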

Melody contour tracking consists of melody contour generation and melody contour selection. In melody contour generation, candidate pitches in consecutive frames whose frequencies differ by less than 50 cents are linked to form a contour, and only contours at least 50 ms long are kept. Among the resulting contours, let c_q and c_s be two contours; if they satisfy all three of the following conditions, the two contours are merged into one:

① |t_{s,start} - t_{q,end}| < 50 ms;

② [frequency condition not reproduced, expressed in semitones];

③ [condition not reproduced]

where t_{s,start} is the start time of contour c_s, t_{q,end} is the end time of contour c_q, the remaining symbols (not reproduced) denote the frequency of contour c_s at its start time, the frequency of contour c_q at its end time and the frequency of contour c_q at its start time, and C is the total number of contours. In melody contour selection, where melody contours do not overlap, the single contour present is selected as the melody; where one or more contours overlap, the contour with the largest accumulated salience is selected as the melody.
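The selection rule just described can be illustrated with a short Python sketch. The contour representation (a list of `(frame_index, frequency, salience)` tuples), the function name and the frame-by-frame resolution of overlaps are assumptions made for illustration; the text only states that the contour with the largest accumulated salience wins where contours overlap.

```python
def select_melody(contours):
    """Pick one frequency per frame from possibly overlapping contours.

    contours: list of contours, each a list of (frame, frequency, salience).
    Returns a list of (frame, frequency) sorted by frame.
    """
    # Accumulated salience of each whole contour.
    totals = [sum(w for _, _, w in contour) for contour in contours]

    best_total = {}   # frame -> accumulated salience of the winning contour
    melody = {}       # frame -> frequency of the winning contour
    for contour, total in zip(contours, totals):
        for frame, freq, _ in contour:
            # A frame covered by one contour takes that contour;
            # an overlapped frame takes the contour with the larger total.
            if frame not in best_total or total > best_total[frame]:
                best_total[frame] = total
                melody[frame] = freq
    return [(frame, melody[frame]) for frame in sorted(melody)]
```

Comparing whole-contour totals rather than per-frame salience keeps a long, consistently salient contour from being interrupted by a short burst of accompaniment.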

The invention defines the salience function as the product of two spectral peak amplitudes and merges candidate pitches in the same frame whose frequencies differ by less than 50 cents, so that a pitch can be estimated from many combinations of coprime-order harmonic frequencies. Candidate pitches in adjacent frames whose frequencies differ by less than 50 cents are linked to form pitch contours, contours shorter than 50 ms are discarded in a preliminary screening, and the melody output is selected according to set screening criteria. The pitch of the melody component can be estimated accurately even when its fundamental is missing or buried by the accompaniment; melody contour tracking is then carried out according to the set screening criteria, yielding a correct melody output.

Brief description of the drawings:

Fig. 1 is the time-domain waveform of the polyphonic music in an embodiment of the invention.

Fig. 2 is the spectrogram of the dominant sinusoidal components of the polyphonic music in the embodiment.

Fig. 3 is a schematic diagram of the frame-level most salient pitches of the polyphonic music in the embodiment.

Fig. 4 is a schematic diagram of the melody output for the polyphonic music in the embodiment.

Fig. 5 is a schematic diagram of the pitch estimation accuracy of the embodiment when the fundamental is missing.

Detailed description of the embodiments:

Spectrum analysis of polyphonic music:

The polyphonic music signal is first divided into frames, each 46.4 ms long.

In the time domain, polyphonic music can be expressed as:

y(t) = x(t) + n(t)

where y(t) is the polyphonic music, x(t) is the melody component and n(t) is the accompaniment.

Time-frequency analysis of the polyphonic music audio signal by the short-time Fourier transform (STFT) gives

Y(ω, t) = X(ω, t) + N(ω, t)

After spectrum analysis of the polyphonic music, the sinusoidal components it contains are found by spectral peak search. For the spectrum at any time t, the local maxima of the amplitude spectrum |Y(ω, t)| are located and the corresponding frequencies are taken as preliminary estimates of the sinusoidal frequencies. The accurate corrected sinusoidal frequency λ(ω, t) for each local maximum is then obtained by the instantaneous frequency method, calculated as follows:

[equation not reproduced]

where Re[·] takes the real part and Im[·] the imaginary part of a complex number.

The time-domain waveform and the dominant sinusoidal component spectrogram of the polyphonic music in the embodiment are shown in Fig. 1 and Fig. 2, respectively. Through spectrum analysis, the 50 sinusoidal components with the largest amplitudes in the polyphonic mixture are selected, giving the frequencies p_{k,t} (k = 1, 2, ..., 50, p_{k,t} > 0) and the corresponding amplitudes m_{k,t} (k = 1, 2, ..., 50, m_{k,t} > 0).
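The peak search and the selection of the 50 largest sinusoidal components can be sketched in Python as below. The sketch assumes the amplitude spectrum of one frame is already available as a list of bin magnitudes, and it leaves out the instantaneous-frequency correction because that formula is not reproduced in this text. The function name, the `bin_hz` parameter and the sample rate are assumptions; at an assumed 44.1 kHz, a 2048-sample frame lasts about 46.4 ms, which matches the stated frame length.

```python
SAMPLE_RATE = 44100            # assumed sample rate, not stated in the source
FRAME_LEN = 2048               # 2048 / 44100 s is roughly 46.4 ms
FRAME_MS = 1000.0 * FRAME_LEN / SAMPLE_RATE


def top_spectral_peaks(magnitudes, bin_hz, max_peaks=50):
    """Return up to max_peaks (frequency, amplitude) pairs, largest amplitude first.

    magnitudes: amplitude spectrum |Y| of one frame, one value per frequency bin.
    bin_hz: frequency spacing between adjacent bins.
    Frequencies are coarse bin centres; no instantaneous-frequency correction is applied.
    """
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        is_local_max = (magnitudes[k] > magnitudes[k - 1]
                        and magnitudes[k] >= magnitudes[k + 1])
        if is_local_max and magnitudes[k] > 0:
            peaks.append((k * bin_hz, magnitudes[k]))
    peaks.sort(key=lambda peak: peak[1], reverse=True)
    return peaks[:max_peaks]
```

With these assumed values, `bin_hz` would be `SAMPLE_RATE / FRAME_LEN`, about 21.5 Hz, which is why a finer frequency correction is needed before peaks are compared in cents.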

Pitch representation:

To estimate pitch accurately from the polyphonic mixture, the invention proposes an improved Euclidean algorithm that can estimate the pitch (i.e. the fundamental frequency) from any harmonic components of coprime order. The method is as follows: given any two spectral peaks with frequencies p_{i,t} and p_{j,t} (i ≠ j) and corresponding amplitudes m_{i,t} and m_{j,t}:

1. Let x = min{p_{i,t}, p_{j,t}} and y = max{p_{i,t}, p_{j,t}};

2. Compute r(x, y) [equation not reproduced], where [·] denotes rounding to the nearest integer;

6. The candidate pitch estimates form the parameter set [expression not reproduced], where F_t is the set of candidate pitch frequency values and the second set holds the corresponding salience values. Candidate pitches whose estimates differ by less than 50 cents are merged into a single candidate pitch; for [expression not reproduced], the frequency value after merging is [equation not reproduced],

where Ω = {f : |f - f_{r,t}| < 50 cents}.
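The 50-cent merging of step 6 can be sketched in Python as follows. The distance in cents between two frequencies is 1200 · log2(f1 / f2). The formula for the merged frequency is not reproduced in this text, so the salience-weighted mean and the summed salience used here are assumptions, as are the function names and the choice to process reference candidates in order of decreasing salience.

```python
import math


def cents_between(f1, f2):
    """Absolute distance between two frequencies in cents."""
    return abs(1200.0 * math.log2(f1 / f2))


def merge_candidates(candidates, max_cents=50.0):
    """Merge (frequency, salience) candidates lying within max_cents of a reference.

    Each group plays the role of the set Omega around a reference frequency.
    The merged frequency is a salience-weighted mean (an ASSUMED rule).
    """
    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    merged = []
    while remaining:
        f_ref = remaining[0][0]                   # most salient unmerged candidate
        group = [c for c in remaining if cents_between(c[0], f_ref) < max_cents]
        remaining = [c for c in remaining if cents_between(c[0], f_ref) >= max_cents]
        total_w = sum(w for _, w in group)
        f_merged = sum(f * w for f, w in group) / total_w
        merged.append((f_merged, total_w))
    return merged
```

Because many coprime-order harmonic pairs of the same note produce nearly identical candidates, merging them lets the note accumulate salience from every pair that supports it.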

The frame-level most salient pitches of the polyphonic music in the embodiment are shown in Fig. 3.

Melody contour tracking:

Candidate pitches in consecutive frames whose frequencies differ by less than 50 cents are linked to form a contour, and only contours at least 50 ms long are kept. Among the resulting contours, let c_q and c_s be two contours; if they satisfy all three of the following conditions, the two contours are merged into one:

① |t_{s,start} - t_{q,end}| < 50 ms;

② [frequency condition not reproduced, expressed in semitones];

③ [condition not reproduced]
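Contour generation as described above (linking candidates across consecutive frames within 50 cents and keeping contours of at least 50 ms) can be sketched in Python as follows. The hop size between frames is not given in this text, so `hop_ms` is an assumed parameter; measuring contour length as number of frames times hop size, matching each contour to its nearest candidate, and the function names are likewise assumptions. The contour merging conditions are not implemented because two of the three are not reproduced.

```python
import math


def cents_between(f1, f2):
    """Absolute distance between two frequencies in cents."""
    return abs(1200.0 * math.log2(f1 / f2))


def build_contours(frames, hop_ms, max_jump_cents=50.0, min_len_ms=50.0):
    """Link per-frame candidates into contours.

    frames: list over time of lists of (frequency, salience) candidates.
    Returns contours as lists of (frame_index, frequency, salience).
    """
    finished, active = [], []
    for t, candidates in enumerate(frames):
        unused = list(candidates)
        still_active = []
        for contour in active:
            last_f = contour[-1][1]
            near = [c for c in unused if cents_between(c[0], last_f) < max_jump_cents]
            if near:
                # Extend the contour with the closest candidate in this frame.
                best = min(near, key=lambda c: cents_between(c[0], last_f))
                unused.remove(best)
                contour.append((t, best[0], best[1]))
                still_active.append(contour)
            else:
                finished.append(contour)          # no continuation: the contour ends
        # Candidates that continued nothing start new contours.
        still_active.extend([(t, f, w)] for f, w in unused)
        active = still_active
    finished.extend(active)
    # Preliminary screening: drop contours shorter than min_len_ms.
    return [c for c in finished if len(c) * hop_ms >= min_len_ms]
```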

The melody output for the polyphonic music in the embodiment is shown in Fig. 4.

Figs. 1 to 4 show that the pitch estimation and melody contour tracking proposed by the invention can accurately extract the melody pitch sequence.

Fig. 5 shows the accuracy of pitch estimation from two harmonic components of coprime order when the fundamental is missing, according to the embodiment. The horizontal axis of Fig. 5 is the standard deviation of the frequency offset of either of the coprime-order harmonic components. The frequency offset of each harmonic of a musical note generally does not exceed 20 cents. As Fig. 5 shows, when the fundamental is missing, pitch estimation accuracy still reaches 84% even when the offset of either coprime-order harmonic frequency reaches 20 cents. Because musical notes are rich in harmonics, many combinations of coprime-order harmonic frequencies exist, so the proposed pitch estimation method can ensure accurate and reliable estimation of the melody pitch.

Claims

1. A salience-based melody extraction method for polyphonic music, in which spectrum analysis, pitch representation, melody contour tracking and melody output are performed in sequence on a polyphonic music audio signal, characterized in that the pitch representation step is as follows: for the 50 sinusoidal component frequencies p_{k,t} (k = 1, 2, ..., 50, p_{k,t} > 0) with the largest amplitudes in the polyphonic mixture and their corresponding amplitudes m_{k,t} (k = 1, 2, ..., 50, m_{k,t} > 0), given any pair of spectral peak frequencies p_{i,t} and p_{j,t} with corresponding amplitudes m_{i,t} and m_{j,t}, where i ≠ j:

1. Let x = min{p_{i,t}, p_{j,t}} and y = max{p_{i,t}, p_{j,t}};

2. Compute r(x, y) [equation not reproduced], where [·] denotes rounding to the nearest integer;

3. If r(x, y) ≥ 0.15, let z = mod(y, x) and assign max{p_{i,t}, p_{j,t}} = z, where mod(y, x) returns the remainder of y divided by x, then return to step 1. If r(x, y) < 0.15, the pair of spectral peak frequencies p_{i,t} and p_{j,t} yields a candidate pitch whose frequency value is f_{l,t} [equation not reproduced], where [definition not reproduced], and f_{l,t} is assigned the corresponding salience value w_{l,t} = m_{i,t} m_{j,t};

5. Select another pair of spectral peak frequencies with their corresponding amplitudes and repeat steps 1 to 4 until all spectral peaks have been processed;

6. The candidate pitch estimates form the parameter set [expression not reproduced], where F_t is the set of candidate pitch frequency values and the second set holds the corresponding salience values. Candidate pitches whose estimates differ by less than 50 cents are merged into a single candidate pitch; for [expression not reproduced], the frequency value after merging is [equation not reproduced],

where Ω = {f : |f - f_{r,t}| < 50 cents}.

2. The salience-based melody extraction method for polyphonic music according to claim 1, characterized in that the melody contour tracking consists of melody contour generation and melody contour selection. In melody contour generation, candidate pitches in consecutive frames whose frequencies differ by less than 50 cents are linked to form a contour, and only contours at least 50 ms long are kept. Among the resulting contours, let c_q and c_s be two contours; if they satisfy all three of the following conditions, the two contours are merged into one:

① |t_{s,start} - t_{q,end}| < 50 ms;

② [frequency condition not reproduced, expressed in semitones];

③ [condition not reproduced]

where t_{s,start} is the start time of contour c_s, t_{q,end} is the end time of contour c_q, the remaining symbols (not reproduced) denote the frequency of contour c_s at its start time, the frequency of contour c_q at its end time and the frequency of contour c_q at its start time, and C is the total number of contours. In melody contour selection, where melody contours do not overlap, the single contour present is selected as the melody; where one or more contours overlap, the contour with the largest accumulated salience is selected as the melody.