CN105957538B - Polyphony Melody extraction method based on conspicuousness - Google Patents

Info

Publication number
CN105957538B
Authority
CN
China
Prior art keywords
pitch
frequency
melody
theme
polyphony
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610299427.5A
Other languages
Chinese (zh)
Other versions
CN105957538A (en)
Inventor
张维维
陈喆
殷福亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Sailing Technology Co Ltd
Original Assignee
Dalian Nationalities University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Nationalities University
Priority to CN201610299427.5A
Publication of CN105957538A
Application granted
Publication of CN105957538B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals
    • G10L2025/906: Pitch tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The present invention discloses a salience-based method for extracting the main melody from polyphonic music. The salience of a candidate pitch is defined as the product of the amplitudes of two spectral peaks; candidate pitches in the same frame whose frequencies differ by less than 50 cents are merged; and the pitch can be estimated from many different combinations of harmonics of coprime order. Candidate pitches in adjacent frames whose frequencies differ by less than 50 cents are linked into pitch contours, contours shorter than 50 ms are discarded in a preliminary screening, and the main melody is selected for output according to set screening criteria. The pitch of the melody component can be estimated accurately even when its fundamental is missing or buried by the accompaniment; melody contour tracking is then carried out according to the screening criteria, yielding a correct main-melody output.

Description

Salience-based melody extraction method for polyphonic music
Technical field:
The invention belongs to the field of acoustic music signal processing, and in particular concerns a salience-based melody extraction method for polyphonic music that can accurately estimate the pitch of the melody component and thereby obtain a correct main-melody output.
Background art:
Polyphonic music is the most widespread musical form, and the main melody is the soul of music, embodying the essential differences between musical works. Melody extraction from polyphonic music is an important research topic in machine listening. Its aim is to let a computer, like a person listening to music, ignore the influence of the accompaniment and accurately identify the pitch sequence of the vocal or instrumental melody. This in turn supports content-based music analysis, retrieval and recommendation, and can serve as front-end processing for applications such as music transcription, query by humming, genre classification and cover-song identification. Current melody extraction methods for polyphonic music fall into two groups: those based on source separation and those based on salience. Source-separation methods first isolate the melody component from the polyphonic music and then output its pitch sequence; their performance is directly limited by the quality of the separation. Salience-based methods rely on the energy salience and temporal smoothness of the main melody: they first build a multi-pitch representation and then select, as the main-melody output, the pitch contours that are both salient in energy and smooth in time. The basic procedure applies spectrum analysis, pitch representation, melody contour tracking and melody output, in that order, to the polyphonic audio signal. However, because the spectral structure of polyphonic music is complex, the fundamental frequency is often buried in the spectrum of the bass accompaniment, percussion and other pitched accompaniment. Existing salience-based methods therefore struggle to estimate the pitch of the melody component accurately, and the melody output is incorrect.
Summary of the invention:
To solve the above technical problem of the prior art, the present invention provides a salience-based melody extraction method for polyphonic music that can accurately estimate the pitch of the melody component and thereby obtain a correct main-melody output.
The technical solution of the invention is as follows: a salience-based melody extraction method for polyphonic music, which successively performs spectrum analysis, pitch representation, melody contour tracking and melody output on a polyphonic audio signal, characterized in that the pitch representation step is as follows: for the sinusoidal frequencies $p_{k,t}$ ($k = 1, 2, \ldots, 50$; $p_{k,t} > 0$) in the polyphonic mixture and their corresponding amplitudes $m_{k,t}$ ($k = 1, 2, \ldots, 50$; $m_{k,t} > 0$), take any pair of spectral peak frequencies $p_{i,t}$ and $p_{j,t}$ with corresponding amplitudes $m_{i,t}$ and $m_{j,t}$, $i \neq j$; then
1. Let $x = \min\{p_{i,t}, p_{j,t}\}$ and $y = \max\{p_{i,t}, p_{j,t}\}$;
2. Compute $r(x, y)$ (expression omitted in the source text), where $[\cdot]$ denotes rounding to the nearest integer;
3. If $r(x, y) \geq 0.15$, let $z = \operatorname{mod}(y, x)$, where $\operatorname{mod}(y, x)$ returns the remainder of $y / x$, assign $\max\{p_{i,t}, p_{j,t}\} = z$, and return to step 1. If $r(x, y) < 0.15$, the pair $p_{i,t}$, $p_{j,t}$ yields a candidate pitch with frequency $f_{l,t}$ (expression omitted in the source text), and $f_{l,t}$ is assigned the salience value $w_{l,t} = m_{i,t} m_{j,t}$;
4. Record the candidate pitch frequency and its salience value, $\{f_{l,t}, w_{l,t}\}$;
5. Select another pair of spectral peak frequencies with their amplitudes and repeat steps 1-4 until all spectral peaks have been processed;
6. Form the candidate pitch parameter set, consisting of the set of candidate pitch frequencies $F_t$ and the corresponding set of salience values. Candidate pitches whose estimated frequencies differ by less than 50 cents are merged into one; the merged frequency for a candidate $f_{r,t}$ (expression omitted in the source text) is computed over the set
$\Omega = \{\, f : |f - f_{r,t}| < 50 \text{ cents} \,\}$.
Melody contour tracking consists of melody contour generation and melody contour selection. Contour generation links candidate pitches in consecutive frames whose frequencies lie within 50 cents of each other into a contour, and keeps the contours whose length is at least 50 ms. Among the resulting contours, let $c_q$ and $c_s$ be two contours; if they satisfy the following three conditions simultaneously, the two are merged into one contour:
① $|t_{s,\text{start}} - t_{q,\text{end}}| < 50$ ms;
②, ③ (expressions omitted in the source text; the surviving fragment shows a bound expressed in semitones);
where $t_{s,\text{start}}$ is the start time of contour $c_s$, $t_{q,\text{end}}$ is the end time of contour $c_q$, the remaining symbols (omitted in the source text) denote the frequency of $c_s$ at its start time, the frequency of $c_q$ at its end time and the frequency of $c_q$ at its start time, and $C$ is the total number of contours. Melody contour selection works as follows: in regions where melody contours do not overlap, the unique contour is selected as the main melody; in regions where one or more melody contours overlap, the contour with the largest cumulative salience is selected as the main melody.
The present invention defines the salience function as the product of the amplitudes of two spectral peaks, merges candidate pitches in the same frame whose frequencies differ by less than 50 cents, and can estimate the pitch from many different combinations of harmonics of coprime order. Candidate pitches in adjacent frames whose frequencies differ by less than 50 cents are linked into pitch contours, contours shorter than 50 ms are discarded in a preliminary screening, and the main melody is selected for output according to set screening criteria. Even when the fundamental of the melody component is missing or buried by the accompaniment, the pitch of the melody component can be estimated accurately; melody contour tracking is then carried out according to the screening criteria, yielding a correct main-melody output.
Brief description of the drawings:
Fig. 1 is the time-domain waveform of the polyphonic music in the embodiment of the present invention.
Fig. 2 is the spectrogram of the dominant sinusoidal components of the polyphonic music in the embodiment of the present invention.
Fig. 3 is a schematic diagram of the frame-level most salient pitch of the polyphonic music in the embodiment of the present invention.
Fig. 4 is a schematic diagram of the main-melody output for the polyphonic music in the embodiment of the present invention.
Fig. 5 is a schematic diagram of the pitch estimation accuracy of the embodiment of the present invention when the fundamental frequency is missing.
Detailed description of the embodiments:
Spectrum analysis of polyphonic music:
The polyphonic music signal is first divided into frames, each 46.4 ms long;
In the time domain, polyphonic music can be represented as:
$y(t) = x(t) + n(t)$
where $y(t)$ is the polyphonic music, $x(t)$ is the melody component and $n(t)$ is the accompaniment;
Applying time-frequency analysis to the polyphonic audio signal with the short-time Fourier transform (STFT) gives
$Y(\omega, t) = X(\omega, t) + N(\omega, t)$
After spectrum analysis, the sinusoidal components contained in the signal are found by spectral peak search. For the spectrum at any time $t$, the search finds the local maxima of the magnitude spectrum $|Y(\omega, t)|$; the corresponding frequencies serve as preliminary estimates of the sinusoidal frequencies. The accurate sinusoidal frequency correction value $\lambda(\omega, t)$ at each local maximum is then obtained by the instantaneous frequency method; the formula (omitted in the source text) is written in terms of
$\operatorname{Re}[\cdot]$, which takes the real part of a complex number, and $\operatorname{Im}[\cdot]$, which takes the imaginary part.
The time-domain waveform and the dominant sinusoidal component spectrogram of the polyphonic music in the embodiment are shown in Fig. 1 and Fig. 2 respectively. Through spectrum analysis, the frequencies $p_{k,t}$ ($k = 1, 2, \ldots, 50$; $p_{k,t} > 0$) of the 50 sinusoidal components with the largest amplitudes in the polyphonic mixture are selected, together with their amplitudes $m_{k,t}$ ($k = 1, 2, \ldots, 50$; $m_{k,t} > 0$).
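The peak selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `top_peaks`, the parameter `bin_hz` (frequency spacing of the spectrum bins) and the toy spectrum are assumptions, and the instantaneous-frequency correction is left out because its formula is not available in this text.

```python
def top_peaks(magnitudes, bin_hz, n_peaks=50):
    """Return (frequency_hz, amplitude) for the n_peaks largest local maxima.

    magnitudes: magnitude spectrum |Y| of one frame, one value per bin.
    bin_hz: hypothetical bin spacing in Hz (not specified in the source).
    """
    peaks = []
    # Bin 0 and the last bin are skipped, so every peak frequency is > 0.
    for k in range(1, len(magnitudes) - 1):
        # Local maximum: above the left neighbour, not below the right one.
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]:
            peaks.append((k * bin_hz, magnitudes[k]))
    # Keep the strongest peaks first.
    peaks.sort(key=lambda fa: fa[1], reverse=True)
    return peaks[:n_peaks]


# Toy spectrum with local maxima at bins 2, 5 and 8.
spectrum = [0.0, 0.2, 0.9, 0.3, 0.1, 0.6, 0.2, 0.05, 0.4, 0.1]
print(top_peaks(spectrum, bin_hz=21.5, n_peaks=2))
```

In practice the raw bin frequencies returned here would be refined by the frequency correction step before being used as $p_{k,t}$.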
Pitch representation:
To estimate pitch accurately from the polyphonic mixture, the present invention proposes an improved Euclidean algorithm that can estimate the pitch (i.e. the fundamental frequency) from harmonic components of coprime order. The method is as follows: given any two spectral peaks with frequencies $p_{i,t}$ and $p_{j,t}$ ($i \neq j$) and corresponding amplitudes $m_{i,t}$ and $m_{j,t}$, then
1. Let $x = \min\{p_{i,t}, p_{j,t}\}$ and $y = \max\{p_{i,t}, p_{j,t}\}$;
2. Compute $r(x, y)$ (expression omitted in the source text), where $[\cdot]$ denotes rounding to the nearest integer;
3. If $r(x, y) \geq 0.15$, let $z = \operatorname{mod}(y, x)$, where $\operatorname{mod}(y, x)$ returns the remainder of $y / x$, assign $\max\{p_{i,t}, p_{j,t}\} = z$, and return to step 1. If $r(x, y) < 0.15$, the pair $p_{i,t}$, $p_{j,t}$ yields a candidate pitch with frequency $f_{l,t}$ (expression omitted in the source text), and $f_{l,t}$ is assigned the salience value $w_{l,t} = m_{i,t} m_{j,t}$;
4. Record the candidate pitch frequency and its salience value, $\{f_{l,t}, w_{l,t}\}$;
5. Select another pair of spectral peak frequencies with their amplitudes and repeat steps 1-4 until all spectral peaks have been processed;
6. Form the candidate pitch parameter set, consisting of the set of candidate pitch frequencies $F_t$ and the corresponding set of salience values. Candidate pitches whose estimated frequencies differ by less than 50 cents are merged into one; the merged frequency for a candidate $f_{r,t}$ (expression omitted in the source text) is computed over the set
$\Omega = \{\, f : |f - f_{r,t}| < 50 \text{ cents} \,\}$.
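Steps 1 to 3 for a single pair of peaks can be sketched as below. Two expressions are missing from this text, so the sketch fills them with loudly labelled assumptions: $r(x, y)$ is taken to be the distance of $y/x$ from its nearest integer, and the candidate frequency is taken to be the mean of $x$ and $y$ divided by that integer. The function name `candidate_pitch` and the iteration guard `max_iter` are also hypothetical. Only the 0.15 threshold, the remainder step and the salience $w = m_i m_j$ come from the description.

```python
def candidate_pitch(p_i, p_j, m_i, m_j, tol=0.15, max_iter=32):
    """Euclidean-style approximate common divisor of two peak frequencies.

    ASSUMPTIONS (not given in the source text):
      r(x, y) = |y/x - round(y/x)|
      candidate frequency = (x + y / round(y/x)) / 2
    Returns (frequency_hz, salience), or None if the loop does not settle.
    """
    salience = m_i * m_j  # from the source: w = m_i * m_j
    a, b = p_i, p_j
    for _ in range(max_iter):
        x, y = min(a, b), max(a, b)      # step 1
        n = round(y / x)                 # nearest integer to y/x
        r = abs(y / x - n)               # step 2 (assumed form)
        if r < tol:
            # y is close to an integer multiple of x: accept a candidate.
            return (x + y / n) / 2.0, salience
        # Step 3: replace the larger value by the remainder and repeat.
        a, b = x, y % x
    return None


# 2nd and 3rd harmonics of a 200 Hz tone whose fundamental is absent.
print(candidate_pitch(400.0, 600.0, 0.5, 0.4))
```

When $r \geq 0.15$ the remainder lies between $0.15x$ and $0.85x$, so it is strictly positive and the pair shrinks on every pass; the guard only protects against pairs that never come close to an integer ratio.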
The frame-level most salient pitch of the polyphonic music in the embodiment of the present invention is shown in Fig. 3.
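One way to merge candidates within 50 cents and read off the most salient pitch of a frame is sketched below. The merging formula itself is missing from this text, so the sketch assumes a salience-weighted mean frequency and a summed salience, and it picks the reference $f_{r,t}$ greedily as the strongest remaining candidate; these choices, and the names `cents` and `merge_candidates`, are illustrative assumptions. The cents distance uses the standard definition of 1200 cents per octave.

```python
import math


def cents(f1, f2):
    """Absolute distance between two frequencies in cents."""
    return abs(1200.0 * math.log2(f1 / f2))


def merge_candidates(cands, width=50.0):
    """Merge (frequency_hz, salience) candidates closer than `width` cents."""
    remaining = sorted(cands, key=lambda c: c[1], reverse=True)
    merged = []
    while remaining:
        ref = remaining[0][0]  # strongest remaining candidate as reference
        group = [c for c in remaining if cents(c[0], ref) < width]
        remaining = [c for c in remaining if cents(c[0], ref) >= width]
        total = sum(w for _, w in group)
        # ASSUMPTION: salience-weighted mean frequency, summed salience.
        merged.append((sum(f * w for f, w in group) / total, total))
    return merged


frame = [(200.0, 0.2), (201.0, 0.1), (300.0, 0.25)]
merged = merge_candidates(frame)
best = max(merged, key=lambda c: c[1])  # frame-level most salient pitch
print(merged, best)
```

In this toy frame the two candidates near 200 Hz are about 8.6 cents apart, so they merge and together outweigh the isolated 300 Hz candidate.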
Melody contour tracking:
Candidate pitches in consecutive frames whose frequencies lie within 50 cents of each other are linked into a contour, and the contours whose length is at least 50 ms are kept. Among the resulting contours, let $c_q$ and $c_s$ be two contours; if they satisfy the following three conditions simultaneously, the two are merged into one contour:
① $|t_{s,\text{start}} - t_{q,\text{end}}| < 50$ ms;
②, ③ (expressions omitted in the source text; the surviving fragment shows a bound expressed in semitones);
where $t_{s,\text{start}}$ is the start time of contour $c_s$, $t_{q,\text{end}}$ is the end time of contour $c_q$, the remaining symbols (omitted in the source text) denote the frequency of $c_s$ at its start time, the frequency of $c_q$ at its end time and the frequency of $c_q$ at its start time, and $C$ is the total number of contours. Melody contour selection works as follows: in regions where melody contours do not overlap, the unique contour is selected as the main melody; in regions where one or more melody contours overlap, the contour with the largest cumulative salience is selected as the main melody.
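Contour generation can be sketched as a greedy frame-to-frame linking pass. The sketch below is an illustration under assumptions: the hop size `hop_ms` is hypothetical (the description gives only the 46.4 ms frame length), contour duration is approximated as the number of frames times the hop, and each contour is extended by the nearest unused candidate. The merging of two contours is not sketched because two of its three conditions are missing from this text.

```python
import math


def track_contours(frames, hop_ms=10.0, max_cents=50.0, min_ms=50.0):
    """Link per-frame candidates into pitch contours.

    frames: one list of (frequency_hz, salience) per frame.
    Returns contours as lists of (frame_index, frequency_hz, salience).
    """
    active, finished = [], []
    for t, cands in enumerate(frames):
        unused = list(cands)
        still_active = []
        for contour in active:
            last_f = contour[-1][1]
            near = [c for c in unused
                    if abs(1200.0 * math.log2(c[0] / last_f)) < max_cents]
            if near:
                # Extend with the candidate closest in pitch.
                best = min(near, key=lambda c: abs(math.log2(c[0] / last_f)))
                unused.remove(best)
                contour.append((t, best[0], best[1]))
                still_active.append(contour)
            else:
                finished.append(contour)
        # Every candidate that was not linked starts a new contour.
        still_active.extend([(t, f, w)] for f, w in unused)
        active = still_active
    finished.extend(active)
    # Preliminary screening: drop contours shorter than min_ms.
    return [c for c in finished if len(c) * hop_ms >= min_ms]


frames = [[(220.0, 1.0)], [(221.0, 1.0)], [(222.0, 1.0), (500.0, 0.5)],
          [(221.5, 1.0)], [(220.5, 1.0)], [(220.0, 1.0)]]
contours = track_contours(frames)
print(len(contours), [round(f, 1) for _, f, _ in contours[0]])
```

With a 10 ms hop the six-frame contour near 220 Hz lasts 60 ms and is kept, while the single-frame candidate at 500 Hz is screened out.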
The main-melody output for the polyphonic music in the embodiment of the present invention is shown in Fig. 4.
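The contour selection rule can be illustrated with a short sketch. It assumes that overlap is resolved frame by frame, which is one possible reading of "region" in the description; the function name `select_melody` and the toy contours are hypothetical.

```python
def select_melody(contours, n_frames):
    """Pick one frequency per frame from possibly overlapping contours.

    contours: lists of (frame_index, frequency_hz, salience).
    Returns a list with a frequency per frame, or None where no contour exists.
    """
    # Cumulative salience of each whole contour.
    totals = [sum(w for _, _, w in c) for c in contours]
    melody = [None] * n_frames
    best_total = [float("-inf")] * n_frames
    for contour, total in zip(contours, totals):
        for t, f, _ in contour:
            # Where contours overlap, the larger cumulative salience wins;
            # where only one contour is present, it is taken as is.
            if total > best_total[t]:
                best_total[t] = total
                melody[t] = f
    return melody


lead = [(0, 220.0, 1.0), (1, 221.0, 1.0), (2, 222.0, 1.0)]
other = [(1, 440.0, 0.4), (2, 441.0, 0.4), (3, 442.0, 0.4)]
print(select_melody([lead, other], n_frames=4))
```

Frames 1 and 2 are covered by both contours, and the first contour wins there because its cumulative salience (3.0) exceeds that of the second (1.2); frame 3 is covered only by the second contour.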
Figs. 1 to 4 show that the pitch estimation and melody contour tracking methods proposed by the present invention can accurately extract the pitch sequence of the main melody.
Fig. 5 shows the accuracy of pitch estimation from two harmonic components of coprime order when the fundamental is missing, according to the embodiment of the present invention. The horizontal axis in Fig. 5 is the standard deviation of the frequency offset of either of the coprime-order harmonic components. The frequency offset of each harmonic component of a musical note generally does not exceed 20 cents. As Fig. 5 shows, when the fundamental is missing, pitch estimation accuracy still reaches 84% even when the offset of either coprime-order harmonic reaches 20 cents. Because musical notes are rich in harmonic components, many combinations of coprime-order harmonics exist, so the pitch estimation method proposed by the present invention can ensure accurate and reliable estimation of the main-melody pitch.
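For orientation, the offsets quoted in cents can be converted to frequency ratios. The relation below is the standard definition of the cent (1200 cents per octave), which the description uses without stating; the numerical values are rounded.

```latex
\Delta_{\text{cents}}(f_1, f_2) = 1200 \,\log_2 \frac{f_1}{f_2},
\qquad
\frac{f_1}{f_2} = 2^{\Delta_{\text{cents}} / 1200}
% 20 cents: 2^{20/1200} \approx 1.0116, about 1.2 % in frequency
% 50 cents: 2^{50/1200} \approx 1.0293, about 2.9 % in frequency
```

A 20-cent offset on a 1000 Hz harmonic therefore corresponds to roughly 11.6 Hz, and the 50-cent window used for merging and linking corresponds to roughly 29 Hz at the same frequency.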

Claims (2)

1. A salience-based melody extraction method for polyphonic music, which successively performs spectrum analysis, pitch representation, melody contour tracking and melody output on a polyphonic audio signal, characterized in that the pitch representation step is as follows: for the frequencies $p_{k,t}$ ($k = 1, 2, \ldots, 50$; $p_{k,t} > 0$) of the 50 sinusoidal components with the largest amplitudes in the polyphonic mixture and their corresponding amplitudes $m_{k,t}$ ($k = 1, 2, \ldots, 50$; $m_{k,t} > 0$), take any pair of spectral peak frequencies $p_{i,t}$ and $p_{j,t}$ with corresponding amplitudes $m_{i,t}$ and $m_{j,t}$, $i \neq j$; then
1. Let $x = \min\{p_{i,t}, p_{j,t}\}$ and $y = \max\{p_{i,t}, p_{j,t}\}$;
2. Compute $r(x, y)$ (expression omitted in the source text), where $[\cdot]$ denotes rounding to the nearest integer;
3. If $r(x, y) \geq 0.15$, let $z = \operatorname{mod}(y, x)$, where $\operatorname{mod}(y, x)$ returns the remainder of $y / x$, assign $\max\{p_{i,t}, p_{j,t}\} = z$, and return to step 1; if $r(x, y) < 0.15$, the pair $p_{i,t}$, $p_{j,t}$ yields a candidate pitch with frequency $f_{l,t}$ (expression omitted in the source text), and $f_{l,t}$ is assigned the salience value $w_{l,t} = m_{i,t} m_{j,t}$;
4. Record the candidate pitch frequency and its salience value, $\{f_{l,t}, w_{l,t}\}$;
5. Select another pair of spectral peak frequencies with their amplitudes and repeat steps 1-4 until all spectral peaks have been processed;
6. Form the candidate pitch parameter set, consisting of the set of candidate pitch frequencies $F_t$ and the corresponding set of salience values; candidate pitches whose estimated frequencies differ by less than 50 cents are merged into one, and the merged frequency for a candidate $f_{r,t}$ (expression omitted in the source text) is computed over the set
$\Omega = \{\, f : |f - f_{r,t}| < 50 \text{ cents} \,\}$.
2. The salience-based melody extraction method for polyphonic music according to claim 1, characterized in that the melody contour tracking consists of melody contour generation and melody contour selection; the melody contour generation links candidate pitches in consecutive frames whose frequencies lie within 50 cents of each other into a contour and keeps the contours whose length is at least 50 ms; among the resulting contours, let $c_q$ and $c_s$ be two contours; if they satisfy the following three conditions simultaneously, the two are merged into one contour:
① $|t_{s,\text{start}} - t_{q,\text{end}}| < 50$ ms;
②, ③ (expressions omitted in the source text; the surviving fragment shows a bound expressed in semitones);
where $t_{s,\text{start}}$ is the start time of contour $c_s$, $t_{q,\text{end}}$ is the end time of contour $c_q$, the remaining symbols (omitted in the source text) denote the frequency of $c_s$ at its start time, the frequency of $c_q$ at its end time and the frequency of $c_q$ at its start time, and $C$ is the total number of contours; the melody contour selection is as follows: in regions where melody contours do not overlap, the unique contour is selected as the main melody; in regions where one or more melody contours overlap, the contour with the largest cumulative salience is selected as the main melody.
CN201610299427.5A 2016-05-09 2016-05-09 Polyphony Melody extraction method based on conspicuousness Active CN105957538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610299427.5A CN105957538B (en) 2016-05-09 2016-05-09 Polyphony Melody extraction method based on conspicuousness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610299427.5A CN105957538B (en) 2016-05-09 2016-05-09 Polyphony Melody extraction method based on conspicuousness

Publications (2)

Publication Number Publication Date
CN105957538A CN105957538A (en) 2016-09-21
CN105957538B true CN105957538B (en) 2019-06-11

Family

ID=56914127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610299427.5A Active CN105957538B (en) 2016-05-09 2016-05-09 Polyphony Melody extraction method based on conspicuousness

Country Status (1)

Country Link
CN (1) CN105957538B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103908B (en) * 2017-05-02 2019-12-24 大连民族大学 Polyphonic music polyphonic pitch height estimation method and application of pseudo bispectrum in polyphonic pitch estimation
CN107122332B (en) * 2017-05-02 2020-08-21 大连民族大学 One-dimensional signal two-dimensional spectrum transformation method, pseudo bispectrum and application thereof
CN108536871B (en) * 2018-04-27 2022-03-04 大连民族大学 Music main melody extraction method and device based on particle filtering and limited dynamic programming search range
CN111326164B (en) * 2020-01-21 2023-03-21 大连海事大学 Semi-supervised music theme extraction method
CN111223491B (en) * 2020-01-22 2022-11-15 深圳市倍轻松科技股份有限公司 Method, device and terminal equipment for extracting music signal main melody
CN115527514B (en) * 2022-09-30 2023-11-21 恩平市奥科电子科技有限公司 Professional vocal melody feature extraction method for music big data retrieval

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03276197A (en) * 1990-03-27 1991-12-06 Nitsuko Corp Melody recognizing device and melody information extracting device to be used for the same
JP2001265330A (en) * 2000-03-21 2001-09-28 Alpine Electronics Inc Device and method for extracting melody
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Melody extraction from polyphonic music signals; Justin Salamon et al.; IEEE SIGNAL PROCESSING MAGAZINE; 2014-02-12; pp. 118-134
Vocal melody extraction in the presence of pitched accompaniment in polyphonic music; Vishweshwara Rao et al.; IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING; 2010-11-30; vol. 18, no. 8; pp. 2145-2154

Also Published As

Publication number Publication date
CN105957538A (en) 2016-09-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190618

Address after: Room 507, 5th floor, No. 3 Gaoxin Street, Dalian Hi-tech Industrial Park, Liaoning Province

Patentee after: Dalian Sailing Technology Co., Ltd.

Address before: 116600 No. 18 Liaohe West Road, Dalian Economic and Technological Development Zone, Liaoning Province

Patentee before: Dalian Nationalities University
