CN105957538B - Polyphony Melody extraction method based on conspicuousness - Google Patents
- Publication number
- CN105957538B (application CN201610299427.5A)
- Authority
- CN
- China
- Prior art keywords
- pitch
- frequency
- melody
- theme
- polyphony
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
- G10L2025/906—Pitch tracking
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
The present invention discloses a salience-based method for extracting the main melody from polyphonic music. The salience function of a candidate pitch is defined as the product of the amplitudes of two spectral peaks. Candidate pitches in the same frame whose frequencies are less than 50 cents apart are merged, so the pitch can be estimated from many combinations of harmonic frequencies of coprime order. Candidate pitches in adjacent frames whose frequencies are less than 50 cents apart are connected to form pitch contours, contours shorter than 50 ms are discarded in a preliminary screening, and the main melody is selected for output according to set screening criteria. The pitch of the melody component can be estimated accurately even when its fundamental is missing or buried by the accompaniment; melody contour tracking under the set screening criteria then yields the correct main-melody output.
Description
Technical field:
The invention belongs to the field of acoustic music signal processing, and in particular relates to a salience-based melody extraction method for polyphonic music that accurately estimates the pitch of the main-melody component and thereby outputs the correct main melody.
Background art:
Polyphonic music is the most widespread musical form, and the main melody is the soul of music: it embodies the essential difference between musical works. Melody extraction from polyphonic music is an important research topic in machine listening. Its goal is to let a computer, like a human listener, ignore the accompaniment and accurately identify the pitch sequence of the vocal or instrumental melody. This supports content-based music analysis, retrieval and recommendation, and can serve as front-end processing for applications such as music transcription, query by humming, genre classification and cover-song identification.
Current melody extraction methods for polyphonic music fall into two main groups: methods based on source separation and methods based on salience. Source-separation methods first separate the melody component from the polyphonic music and then output its pitch sequence, so their performance is directly limited by the quality of the separation. Salience-based methods rely on the energy salience and temporal smoothness of the main melody: they first build a multi-pitch representation and then select, as the main-melody output, the pitch contours that are both salient in energy and smooth in time. The basic procedure applies spectral analysis, pitch representation, melody contour tracking and melody output to the polyphonic audio signal in turn.
However, the spectral structure of polyphonic music is complex, and the fundamental frequency is often buried in the spectra of bass accompaniment, percussion and other pitched accompaniment. Existing salience-based methods therefore struggle to estimate the pitch of the melody component accurately, and the melody output is incorrect.
Summary of the invention:
To solve the above technical problem of the prior art, the present invention provides a salience-based melody extraction method for polyphonic music that accurately estimates the pitch of the main-melody component and outputs the correct main melody.
The technical solution of the invention is a salience-based melody extraction method for polyphonic music that applies spectral analysis, pitch representation, melody contour tracking and melody output to a polyphonic audio signal in turn, characterized in that the pitch representation step is as follows. For the sinusoidal frequencies $p_{k,t}$ ($k = 1, 2, \ldots, 50$, $p_{k,t} > 0$) in the polyphonic mixture and their corresponding amplitudes $m_{k,t}$ ($k = 1, 2, \ldots, 50$, $m_{k,t} > 0$), take any pair of spectral peak frequencies $p_{i,t}$ and $p_{j,t}$ with corresponding amplitudes $m_{i,t}$ and $m_{j,t}$, $i \neq j$; then:
1. Let $x = \min\{p_{i,t}, p_{j,t}\}$ and $y = \max\{p_{i,t}, p_{j,t}\}$.
2. Compute $r(x, y)$ [formula not reproduced in the source text], where $[\cdot]$ denotes rounding to the nearest integer.
3. If $r(x, y) \geq 0.15$, let $z = \operatorname{mod}(y, x)$ and assign $\max\{p_{i,t}, p_{j,t}\} = z$, where $\operatorname{mod}(y, x)$ returns the remainder of dividing $y$ by $x$, then return to step 1. If $r(x, y) < 0.15$, the pair $p_{i,t}$, $p_{j,t}$ yields a candidate pitch frequency $f_{l,t}$ [formula not reproduced in the source text], and $f_{l,t}$ is assigned the salience value $w_{l,t} = m_{i,t} m_{j,t}$.
4. This gives a candidate pitch frequency and its salience value $\{f_{l,t}, w_{l,t}\}$.
5. Select another pair of spectral peak frequencies with their amplitudes and repeat steps 1-4 until all spectral peaks have been processed.
6. The result is the candidate pitch parameter set, consisting of $F_t$, the set of candidate pitch frequencies, and the corresponding set of salience values. Candidate pitches whose estimates differ by less than 50 cents are merged into one: for a candidate $f_{r,t}$ in $F_t$, the frequency after merging is computed over the set $\Omega$ [formula not reproduced in the source text], where $\Omega = \{f : |f - f_{r,t}| < 50 \text{ cents}\}$.
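The pairwise procedure in steps 1-3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented formula: the source text does not reproduce the definitions of r(x, y) or of the candidate frequency, so the sketch assumes r(x, y) = |y/x - round(y/x)| and takes the smaller value x as the pitch estimate once the ratio is close to an integer. The function names, the `min_freq` guard and its 20 Hz default are hypothetical.

```python
def ratio_residual(x: float, y: float) -> float:
    """Assumed form of r(x, y): distance of y/x from the nearest integer."""
    q = y / x
    return abs(q - round(q))


def candidate_pitch(p_i: float, p_j: float, m_i: float, m_j: float,
                    threshold: float = 0.15, min_freq: float = 20.0):
    """Return (frequency, salience) for one pair of spectral peaks, or None.

    The salience m_i * m_j follows the source; the returned frequency is an
    assumption because the source formula is missing.
    """
    a, b = p_i, p_j
    while True:
        x, y = min(a, b), max(a, b)            # step 1
        if x < min_freq:                       # hypothetical guard, not in the source
            return None
        if ratio_residual(x, y) < threshold:   # step 3, second branch
            return x, m_i * m_j                # assumption: use x as the estimate
        a, b = x, y % x                        # step 3, first branch: max <- mod(y, x)


# Harmonics 2 and 3 of a 200 Hz tone whose fundamental is absent.
print(candidate_pitch(400.0, 600.0, 0.5, 0.4))
```

With peaks at 400 Hz and 600 Hz the ratio 1.5 is far from an integer, so the larger value is replaced by the remainder 200 Hz; the next pass sees 200 Hz and 400 Hz, whose ratio is exactly 2, and the loop stops.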
Melody contour tracking consists of melody contour generation and melody contour selection. Contour generation connects candidate pitches in consecutive frames whose frequencies lie within 50 cents of each other into a contour and keeps only contours at least 50 ms long. Among the resulting contours, let $c_q$ and $c_s$ be two contours; if they satisfy the following three conditions simultaneously, they are merged into a single contour:
① $|t_{s,\mathrm{start}} - t_{q,\mathrm{end}}| < 50$ ms;
② [condition not reproduced in the source text; only the unit "semitone" survives];
③ [condition not reproduced in the source text]
where $t_{s,\mathrm{start}}$ is the start time of contour $c_s$ and $t_{q,\mathrm{end}}$ is the end time of contour $c_q$. The remaining symbols, which are lost in this text, denote the frequency of $c_s$ at its start time, the frequency of $c_q$ at its end time and the frequency of $c_q$ at its start time; $C$ is the total number of contours. Melody contour selection takes the unique contour as the main melody in regions where contours do not overlap; in regions where one or more contours overlap, it selects the contour with the largest cumulative salience as the main melody.
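Contour generation as described above can be sketched as a frame-by-frame linking pass. This is an illustrative sketch only: the source does not state a hop size or how ties are resolved when several candidates fall within 50 cents, so the `hop_s` parameter, the rule of preferring the most salient match, and the name `link_contours` are assumptions. The three-condition contour merge is omitted because two of its conditions are missing from the text.

```python
import math


def link_contours(frames, hop_s, tol_cents=50.0, min_len_s=0.05):
    """Link per-frame candidates into contours.

    frames: list over time of lists of (freq_hz, salience).
    Returns contours as lists of (frame_index, freq_hz, salience),
    keeping only those lasting at least min_len_s seconds.
    """
    active, done = [], []
    for t, cands in enumerate(frames):
        unused = list(cands)
        still_active = []
        for contour in active:
            last_f = contour[-1][1]
            match = None
            for c in unused:
                within = abs(1200.0 * math.log2(c[0] / last_f)) < tol_cents
                if within and (match is None or c[1] > match[1]):
                    match = c                  # assumption: keep the most salient match
            if match is None:
                done.append(contour)           # contour ends here
            else:
                unused.remove(match)
                contour.append((t, match[0], match[1]))
                still_active.append(contour)
        # Every unmatched candidate starts a new contour.
        still_active.extend([[(t, f, w)] for f, w in unused])
        active = still_active
    done.extend(active)
    return [c for c in done if len(c) * hop_s >= min_len_s]
```

Measuring contour length as the number of frames times the hop is one possible convention; a different convention would shift the 50 ms cut-off by at most one frame.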
The present invention defines the salience function as the product of two spectral peak amplitudes and merges candidate pitches in the same frame whose frequencies are less than 50 cents apart, so the pitch can be estimated from many combinations of harmonic frequencies of coprime order. Candidate pitches in adjacent frames whose frequencies are less than 50 cents apart are connected into pitch contours, contours shorter than 50 ms are discarded in a preliminary screening, and the main melody is selected for output according to set screening criteria. The pitch of the melody component can be estimated accurately even when its fundamental is missing or buried by the accompaniment, and melody contour tracking under the set screening criteria then yields the correct main-melody output.
Description of the drawings:
Fig. 1 is the time-domain waveform of the polyphonic music in the embodiment of the present invention.
Fig. 2 is the spectrogram of the dominant sinusoidal components of the polyphonic music in the embodiment of the present invention.
Fig. 3 is a schematic diagram of the frame-level most salient pitch of the polyphonic music in the embodiment of the present invention.
Fig. 4 is a schematic diagram of the main-melody output for the polyphonic music in the embodiment of the present invention.
Fig. 5 is a schematic diagram of the pitch estimation accuracy of the embodiment of the present invention when the fundamental frequency is missing.
Specific embodiment:
Spectral analysis of polyphonic music:
The polyphonic music signal is first divided into frames, each 46.4 ms long.
In the time domain, polyphonic music can be expressed as
$y(t) = x(t) + n(t)$
where $y(t)$ is the polyphonic music, $x(t)$ is the main-melody component and $n(t)$ is the accompaniment.
Applying the short-time Fourier transform (STFT) to the polyphonic audio signal for time-frequency analysis gives
$Y(\omega, t) = X(\omega, t) + N(\omega, t)$
After spectral analysis, the sinusoidal components contained in the signal are found by spectral peak picking. For the spectrum at any time $t$, the local maxima of the magnitude spectrum $|Y(\omega, t)|$ are located and their frequencies are taken as preliminary estimates of the sinusoidal frequencies. The accurate corrected sinusoidal frequency $\lambda(\omega, t)$ of each local maximum is then obtained by the instantaneous frequency method, using the following formula:
[formula not reproduced in the source text]
where $\mathrm{Re}[\cdot]$ takes the real part and $\mathrm{Im}[\cdot]$ takes the imaginary part of a complex number.
The time-domain waveform and the dominant sinusoidal component spectrogram of the polyphonic music in the embodiment are shown in Fig. 1 and Fig. 2 respectively. From the spectral analysis, the 50 sinusoidal components with the largest amplitudes in the polyphonic mixture are selected, giving frequencies $p_{k,t}$ ($k = 1, 2, \ldots, 50$, $p_{k,t} > 0$) and corresponding amplitudes $m_{k,t}$ ($k = 1, 2, \ldots, 50$, $m_{k,t} > 0$).
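The peak-picking stage can be illustrated with a short NumPy sketch. It makes several assumptions that are not in the source: a 44.1 kHz sampling rate (at which a 2048-sample frame lasts about 46.4 ms), a Hann window, and the function name `top_peaks`. The instantaneous-frequency correction is left out because its formula is missing from the text, so the returned frequencies are only bin-centre estimates.

```python
import numpy as np


def top_peaks(frame, sample_rate, n_peaks=50):
    """Return (freqs_hz, amplitudes) of the strongest spectral peaks of one frame.

    Peaks are local maxima of the magnitude spectrum; frequencies are
    bin centres without any instantaneous-frequency refinement.
    """
    windowed = frame * np.hanning(len(frame))      # Hann window is an assumption
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Interior bins that exceed the left neighbour and are not below the right one.
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    idx = np.flatnonzero(is_peak) + 1
    # Keep the n_peaks largest peaks, strongest first.
    idx = idx[np.argsort(mag[idx])[::-1][:n_peaks]]
    return freqs[idx], mag[idx]


sample_rate = 44100                                # assumed, not stated in the source
n = 2048                                           # 2048 / 44100 s is about 46.4 ms
t = np.arange(n) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 880.0 * t)
peak_freqs, peak_amps = top_peaks(frame, sample_rate)
```

With a bin spacing of about 21.5 Hz at these settings, the uncorrected estimates can be off by up to half a bin, which is why the source refines them with the instantaneous frequency method before pitch estimation.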
Pitch representation:
To estimate the pitch accurately from the polyphonic mixture, the present invention proposes a modified Euclidean algorithm that can estimate the pitch (i.e. the fundamental frequency) from harmonic components of coprime order. The method is as follows: given any two spectral peaks with frequencies $p_{i,t}$ and $p_{j,t}$ ($i \neq j$) and corresponding amplitudes $m_{i,t}$ and $m_{j,t}$, then:
1. Let $x = \min\{p_{i,t}, p_{j,t}\}$ and $y = \max\{p_{i,t}, p_{j,t}\}$.
2. Compute $r(x, y)$ [formula not reproduced in the source text], where $[\cdot]$ denotes rounding to the nearest integer.
3. If $r(x, y) \geq 0.15$, let $z = \operatorname{mod}(y, x)$ and assign $\max\{p_{i,t}, p_{j,t}\} = z$, where $\operatorname{mod}(y, x)$ returns the remainder of dividing $y$ by $x$, then return to step 1. If $r(x, y) < 0.15$, the pair $p_{i,t}$, $p_{j,t}$ yields a candidate pitch frequency $f_{l,t}$ [formula not reproduced in the source text], and $f_{l,t}$ is assigned the salience value $w_{l,t} = m_{i,t} m_{j,t}$.
4. This gives a candidate pitch frequency and its salience value $\{f_{l,t}, w_{l,t}\}$.
5. Select another pair of spectral peak frequencies with their amplitudes and repeat steps 1-4 until all spectral peaks have been processed.
6. The result is the candidate pitch parameter set, consisting of $F_t$, the set of candidate pitch frequencies, and the corresponding set of salience values. Candidate pitches whose estimates differ by less than 50 cents are merged into one: for a candidate $f_{r,t}$ in $F_t$, the frequency after merging is computed over the set $\Omega$ [formula not reproduced in the source text], where $\Omega = \{f : |f - f_{r,t}| < 50 \text{ cents}\}$.
The frame-level most salient pitch of the polyphonic music in the embodiment is shown in Fig. 3.
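The 50-cent merge in step 6 can be sketched as follows. The merging formula itself is missing from the source, so this sketch assumes a salience-weighted mean for the merged frequency and a summed salience for the merged candidate; the greedy left-to-right grouping and the names `cents_between` and `merge_candidates` are also assumptions made for illustration.

```python
import math


def cents_between(f1: float, f2: float) -> float:
    """Absolute interval between two frequencies in cents (1200 per octave)."""
    return abs(1200.0 * math.log2(f1 / f2))


def merge_candidates(cands, tol_cents=50.0):
    """Merge (freq_hz, salience) candidates that lie within tol_cents.

    Assumption: merged frequency is the salience-weighted mean and merged
    salience is the sum; the source formula is not available.
    """
    merged = []
    for f, w in sorted(cands):
        if merged and cents_between(f, merged[-1][0]) < tol_cents:
            f0, w0 = merged[-1]
            merged[-1] = ((f0 * w0 + f * w) / (w0 + w), w0 + w)
        else:
            merged.append((f, w))
    return merged


print(merge_candidates([(200.0, 1.0), (202.0, 1.0), (400.0, 2.0)]))
```

Here 200 Hz and 202 Hz are about 17 cents apart and collapse into a single candidate, while 400 Hz is an octave away and stays separate.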
Melody contour tracking:
Candidate pitches in consecutive frames whose frequencies lie within 50 cents of each other are connected into a contour, and only contours at least 50 ms long are kept. Among the resulting contours, let $c_q$ and $c_s$ be two contours; if they satisfy the following three conditions simultaneously, they are merged into a single contour:
① $|t_{s,\mathrm{start}} - t_{q,\mathrm{end}}| < 50$ ms;
② [condition not reproduced in the source text; only the unit "semitone" survives];
③ [condition not reproduced in the source text]
where $t_{s,\mathrm{start}}$ is the start time of contour $c_s$ and $t_{q,\mathrm{end}}$ is the end time of contour $c_q$. The remaining symbols, which are lost in this text, denote the frequency of $c_s$ at its start time, the frequency of $c_q$ at its end time and the frequency of $c_q$ at its start time; $C$ is the total number of contours. Melody contour selection takes the unique contour as the main melody in regions where contours do not overlap; in regions where one or more contours overlap, it selects the contour with the largest cumulative salience as the main melody.
The main-melody output for the polyphonic music in the embodiment is shown schematically in Fig. 4.
Fig. 1 to Fig. 4 show that the pitch estimation and melody contour tracking proposed by the present invention can accurately extract the pitch sequence of the main melody.
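The selection rule described in this section (a lone contour wins by default, overlapping contours are resolved by cumulative salience) can be sketched per frame. The contour data layout, the use of 0.0 to mark frames without melody, and the name `select_melody` are assumptions made for this illustration.

```python
def select_melody(contours, n_frames):
    """Pick one pitch per frame from possibly overlapping contours.

    contours: list of lists of (frame_index, freq_hz, salience).
    Returns a list of n_frames frequencies; 0.0 marks frames with no contour.
    """
    melody = [0.0] * n_frames
    best_total = [float("-inf")] * n_frames
    for contour in contours:
        total = sum(w for _, _, w in contour)      # cumulative salience of the contour
        for t, f, _ in contour:
            if total > best_total[t]:              # overlap: larger cumulative salience wins
                best_total[t] = total
                melody[t] = f
    return melody


c1 = [(0, 200.0, 1.0), (1, 201.0, 1.0), (2, 202.0, 1.0)]   # cumulative salience 3.0
c2 = [(2, 400.0, 2.0), (3, 401.0, 2.0)]                    # cumulative salience 4.0
print(select_melody([c1, c2], n_frames=5))
```

Frames 0 and 1 belong only to the first contour, frame 3 only to the second, and in the overlapping frame 2 the second contour is chosen because its cumulative salience is larger.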
Fig. 5 shows the accuracy of pitch estimation from two harmonic components of coprime order when the fundamental frequency is missing, according to the embodiment of the present invention. The horizontal axis of Fig. 5 is the standard deviation of the frequency deviation of either of the coprime-order harmonic components. The frequency deviation of each harmonic component of a note signal generally does not exceed 20 cents. As Fig. 5 shows, when the fundamental is missing, the pitch estimation accuracy still reaches 84% even when the deviation of either coprime-order harmonic frequency reaches 20 cents. Because note signals are rich in harmonic components, many combinations of coprime-order harmonics are available, so the pitch estimation method proposed by the present invention can ensure accurate and reliable estimation of the main-melody pitch.
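To put the 20-cent and 50-cent figures in physical units, the standard cent definition (1200 cents per octave) can be applied directly. The helper names and the 440 Hz example frequency below are illustrative choices, not values from the source.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio corresponding to an interval in cents."""
    return 2.0 ** (cents / 1200.0)


def deviation_hz(freq_hz: float, cents: float) -> float:
    """Upward frequency shift in Hz for a given deviation in cents."""
    return freq_hz * (cents_to_ratio(cents) - 1.0)


print(cents_to_ratio(20.0))        # ratio for a 20-cent harmonic deviation
print(deviation_hz(440.0, 20.0))   # the same deviation at a 440 Hz partial
print(cents_to_ratio(50.0))        # ratio for the 50-cent merge tolerance
```

A 20-cent deviation is a change of roughly 1.2% in frequency, about 5 Hz at 440 Hz, and the 50-cent tolerance used for merging and contour linking corresponds to roughly 2.9%.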
Claims (2)
1. A salience-based melody extraction method for polyphonic music, which applies spectral analysis, pitch representation, melody contour tracking and melody output to a polyphonic audio signal in turn, characterized in that the pitch representation step is as follows: for the 50 sinusoidal component frequencies with the largest amplitudes in the polyphonic mixture, $p_{k,t}$ ($k = 1, 2, \ldots, 50$, $p_{k,t} > 0$), and their corresponding amplitudes $m_{k,t}$ ($k = 1, 2, \ldots, 50$, $m_{k,t} > 0$), take any pair of spectral peak frequencies $p_{i,t}$ and $p_{j,t}$ with corresponding amplitudes $m_{i,t}$ and $m_{j,t}$, $i \neq j$; then:
① let $x = \min\{p_{i,t}, p_{j,t}\}$ and $y = \max\{p_{i,t}, p_{j,t}\}$;
② compute $r(x, y)$ [formula not reproduced in the source text], where $[\cdot]$ denotes rounding to the nearest integer;
③ if $r(x, y) \geq 0.15$, let $z = \operatorname{mod}(y, x)$ and assign $\max\{p_{i,t}, p_{j,t}\} = z$, where $\operatorname{mod}(y, x)$ returns the remainder of dividing $y$ by $x$, then return to step ①; if $r(x, y) < 0.15$, the pair $p_{i,t}$, $p_{j,t}$ yields a candidate pitch whose frequency is $f_{l,t}$ [formula not reproduced in the source text], and $f_{l,t}$ is assigned the salience value $w_{l,t} = m_{i,t} m_{j,t}$;
④ obtain the candidate pitch frequency and its salience value $\{f_{l,t}, w_{l,t}\}$;
⑤ select another pair of spectral peak frequencies with their amplitudes and repeat steps ①-④ until all spectral peaks have been processed;
⑥ obtain the candidate pitch parameter set, consisting of $F_t$, the set of candidate pitch frequencies, and the corresponding set of salience values; candidate pitches whose estimates differ by less than 50 cents are merged into one, and for a candidate $f_{r,t}$ in $F_t$ the frequency after merging is computed over the set $\Omega$ [formula not reproduced in the source text], where $\Omega = \{f : |f - f_{r,t}| < 50 \text{ cents}\}$.
2. The salience-based melody extraction method for polyphonic music according to claim 1, characterized in that the melody contour tracking consists of melody contour generation and melody contour selection; the melody contour generation connects candidate pitches in consecutive frames whose frequencies lie within 50 cents of each other into a contour and keeps contours at least 50 ms long; among the resulting contours, let $c_q$ and $c_s$ be two contours, and if they satisfy the following three conditions simultaneously, they are merged into a single contour:
① $|t_{s,\mathrm{start}} - t_{q,\mathrm{end}}| < 50$ ms;
② [condition not reproduced in the source text; only the unit "semitone" survives];
③ [condition not reproduced in the source text]
where $t_{s,\mathrm{start}}$ is the start time of contour $c_s$ and $t_{q,\mathrm{end}}$ is the end time of contour $c_q$; the remaining symbols, which are lost in this text, denote the frequency of $c_s$ at its start time, the frequency of $c_q$ at its end time and the frequency of $c_q$ at its start time, and $C$ is the total number of contours; the melody contour selection takes the unique contour as the main melody in regions where contours do not overlap and, in regions where one or more contours overlap, selects the contour with the largest cumulative salience as the main melody.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610299427.5A CN105957538B (en) | 2016-05-09 | 2016-05-09 | Polyphony Melody extraction method based on conspicuousness |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610299427.5A CN105957538B (en) | 2016-05-09 | 2016-05-09 | Polyphony Melody extraction method based on conspicuousness |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105957538A CN105957538A (en) | 2016-09-21 |
CN105957538B true CN105957538B (en) | 2019-06-11 |
Family
ID=56914127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610299427.5A Active CN105957538B (en) | 2016-05-09 | 2016-05-09 | Polyphony Melody extraction method based on conspicuousness |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105957538B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107103908B (en) * | 2017-05-02 | 2019-12-24 | 大连民族大学 | Polyphonic music polyphonic pitch height estimation method and application of pseudo bispectrum in polyphonic pitch estimation |
CN107122332B (en) * | 2017-05-02 | 2020-08-21 | 大连民族大学 | One-dimensional signal two-dimensional spectrum transformation method, pseudo bispectrum and application thereof |
CN108536871B (en) * | 2018-04-27 | 2022-03-04 | 大连民族大学 | Music main melody extraction method and device based on particle filtering and limited dynamic programming search range |
CN111326164B (en) * | 2020-01-21 | 2023-03-21 | 大连海事大学 | Semi-supervised music theme extraction method |
CN111223491B (en) * | 2020-01-22 | 2022-11-15 | 深圳市倍轻松科技股份有限公司 | Method, device and terminal equipment for extracting music signal main melody |
CN115527514B (en) * | 2022-09-30 | 2023-11-21 | 恩平市奥科电子科技有限公司 | Professional vocal melody feature extraction method for music big data retrieval |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03276197A (en) * | 1990-03-27 | 1991-12-06 | Nitsuko Corp | Melody recognizing device and melody information extracting device to be used for the same |
JP2001265330A (en) * | 2000-03-21 | 2001-09-28 | Alpine Electronics Inc | Device and method for extracting melody |
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
Non-Patent Citations (2)
Title |
---|
Melody extraction from polyphonic music signals; Justin Salamon et al.; IEEE Signal Processing Magazine; 2014-02-12; pp. 118-134 |
Vocal melody extraction in the presence of pitched accompaniment in polyphonic music; Vishweshwara Rao et al.; IEEE Transactions on Audio, Speech, and Language Processing; 2010-11-30; vol. 18, no. 8; pp. 2145-2154 |
Also Published As
Publication number | Publication date |
---|---|
CN105957538A (en) | 2016-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105957538B (en) | Polyphony Melody extraction method based on conspicuousness | |
Gkiokas et al. | Music tempo estimation and beat tracking by applying source separation and metrical relations | |
Sridhar et al. | Raga identification of carnatic music for music information retrieval | |
Holzapfel et al. | Three dimensions of pitched instrument onset detection | |
JP6784362B2 (en) | Song melody information processing method, server, and storage medium | |
US8193436B2 (en) | Segmenting a humming signal into musical notes | |
CN104978962A (en) | Query by humming method and system | |
CN102903357A (en) | Method, device and system for extracting chorus of song | |
Zhou et al. | Music onset detection based on resonator time frequency image | |
WO2009125489A1 (en) | Tempo detection device and tempo detection program | |
CN110139206A (en) | A kind of processing method and system of stereo audio | |
Holzapfel et al. | Beat tracking using group delay based onset detection | |
Rajan et al. | Group delay based melody monopitch extraction from music | |
Elowsson et al. | Modelling perception of speed in music audio | |
Kraft et al. | Polyphonic pitch detection by matching spectral and autocorrelation peaks | |
Salamon et al. | Melody, bass line, and harmony representations for music version identification | |
CN101763848B (en) | Synchronization method for audio content identification | |
JP2012181475A (en) | Method for extracting feature of acoustic signal and method for processing acoustic signal using the feature | |
Dittmar et al. | Novel mid-level audio features for music similarity | |
Reddy et al. | Predominant melody extraction from vocal polyphonic music signal by combined spectro-temporal method | |
Wan et al. | Automatic piano music transcription using audio‐visual features | |
CN114093388A (en) | Note cutting method, cutting system and video-song evaluation method | |
Salamon et al. | A chroma-based salience function for melody and bass line estimation from music audio signals | |
Davies et al. | Comparing mid-level representations for audio based beat tracking | |
Reddy et al. | Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2019-06-18. Address after: Room 507, 5th floor, No. 3 Gaoxin Street, Dalian Hi-tech Industrial Park, Liaoning Province. Patentee after: Dalian Sailing Technology Co., Ltd. Address before: 116600 No. 18 Liaohe West Road, Dalian Economic and Technological Development Zone, Liaoning Province. Patentee before: Dalian Minzu University |