CN108172210B - Singing harmony generation method based on singing voice rhythm - Google Patents


Info

Publication number
CN108172210B
Authority
CN
China
Prior art keywords
singing
bpm
singing voice
harmony
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810101219.9A
Other languages
Chinese (zh)
Other versions
CN108172210A (en)
Inventor
张栋
彭建云
汪培侨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-02-01
Filing date
2018-02-01
Publication date
2021-03-02
2018-02-01: Application filed by Fuzhou University
2018-02-01: Priority to CN201810101219.9A
2018-06-15: Publication of CN108172210A
2021-03-02: Application granted
2021-03-02: Publication of CN108172210B
2022-02-01: Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H7/00 Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Abstract

The invention relates to a singing harmony generation method based on singing voice rhythm. The method targets singing voice applications. It detects the rhythm of the singer's voice using spectral flux and adaptively adjusts the delay of the harmony part according to that rhythm to generate harmony. This simplifies beat extraction, reduces time complexity, and enriches the singer's musical expression. The method is simple, flexible to implement, and highly practical.

Description

Singing harmony generation method based on singing voice rhythm
Technical Field
The invention relates to the field of singing voice synthesis, and in particular to a method for generating singing harmony based on singing voice rhythm.
Background
Singing voice is a complex audio signal and an artistic form of expression, so its analysis is of considerable research interest. As music entertainment has become popular, sound-effect processing for singing voice has become a hot topic in research and application, attracting wide attention from academia and industry. Sound-effect processing for karaoke applications is relatively mature. Even so, users find it hard to add harmony to their own singing because of limits in their vocal and singing ability. It is therefore of practical value to study two questions: how to generate harmony from a singer's vocal characteristics, and how to adapt that harmony to the rhythm of the singing voice.
Disclosure of Invention
The invention aims to provide a singing harmony generation method based on singing voice rhythm. The method generates harmony adaptively according to the tempo of the beat, enriching the singer's musical expression.
To achieve this purpose, the technical solution of the invention is as follows: a singing harmony generation method based on singing voice rhythm, characterized by comprising the following steps:
step S1: preprocess the input singing voice audio signal by filtering, pre-emphasis and normalization;
step S2: divide the preprocessed singing voice audio x(n) into frames and calculate the log spectrum of each frame [formula image not reproduced];
step S3: from the sequence of log spectra [formula image not reproduced], calculate the spectral flux SF(n) of the singing voice signal. Low-pass filter and smooth SF(n) to obtain the onset (endpoint) intensity curve F(t). Then calculate the autocorrelation sequence TG(τ) of this curve; the lag τ at which TG(τ) is maximal is the beat period, from which the BPM characteristic value is calculated;
step S4: calculate the average BPM characteristic value of the entire input singing voice signal, denoted as the average BPM, and use it to calculate the delay of the harmony part;
step S5: copy the preprocessed singing voice audio x(n), raise its pitch by a third, and delay the copy to generate the harmony part h(n);
step S6: superpose the original vocal part x(n) and the harmony part h(n) in linear proportion to output y(n), namely the generated singing harmony.
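Step S1 names filtering, pre-emphasis and normalization but gives no parameters. The Python sketch below covers only the last two operations. It assumes a first-order pre-emphasis filter with a hypothetical coefficient alpha = 0.97, followed by peak normalization. The unspecified filtering stage is left out.

```python
def preprocess(x, alpha=0.97):
    """Sketch of step S1: pre-emphasis followed by peak normalization.

    alpha is an assumed pre-emphasis coefficient (not given in the patent);
    the patent's filtering stage is unspecified and therefore omitted.
    """
    if not x:
        return []
    # First-order pre-emphasis: e[n] = x[n] - alpha * x[n-1], with e[0] = x[0]
    emph = [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]
    # Scale so that the largest absolute sample equals 1.0
    peak = max(abs(v) for v in emph)
    return [v / peak for v in emph] if peak > 0 else emph
```

For example, `preprocess([0.5, 0.5, 0.5], alpha=0.5)` gives `[1.0, 0.5, 0.5]`. Pre-emphasis boosts high frequencies before framing. Peak normalization makes later thresholds independent of recording level.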
In one embodiment of the present invention, in step S2, the log spectrum of each frame [formula image not reproduced] is calculated as follows:
step S21: divide the singing voice audio into frames x_i(n) according to the frame length K and the frame shift hop;
step S22: apply the short-time Fourier transform to x_i(n) to obtain the frequency-domain signal X_i(k);
step S23: obtain the log spectrum sequence [formula image not reproduced] according to the formula [formula image not reproduced].
In an embodiment of the present invention, the frame length K is the number of samples in 10 ms to 30 ms of audio, i.e., K = frame duration × sampling frequency. The frame shift hop is the non-overlapping part of two adjacent frames, and hop = K/3.
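Steps S21 to S23 and the frame-length rule can be sketched in Python as follows. Two parts are assumptions, because the log-spectrum formula survives only as an image: the periodic Hann analysis window, and the compression log(1 + gamma·|X_i(k)|), where gamma is a hypothetical parameter. A naive DFT keeps the example dependency-free; a practical implementation would use an FFT.

```python
import cmath
import math

def frame_params(fs, frame_ms=20.0):
    """Frame length K (samples) for a 10-30 ms frame, and hop = K // 3."""
    if not 10.0 <= frame_ms <= 30.0:
        raise ValueError("frame duration must lie between 10 ms and 30 ms")
    K = int(round(frame_ms * 1e-3 * fs))  # K = frame duration x sampling rate
    return K, K // 3

def log_spectra(x, K, hop, gamma=1.0):
    """Frame x into x_i(n), compute X_i(k), and log-compress the magnitudes.

    Assumptions: periodic Hann window and log(1 + gamma * |X_i(k)|).
    Only bins k = 0 .. K//2 (non-negative frequencies) are kept.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / K) for n in range(K)]
    spectra = []
    for start in range(0, len(x) - K + 1, hop):
        frame = [x[start + n] * window[n] for n in range(K)]
        row = []
        for k in range(K // 2 + 1):
            # Naive DFT bin: X_i(k) = sum_n frame[n] * exp(-2j*pi*k*n/K)
            X = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / K)
                    for n in range(K))
            row.append(math.log(1.0 + gamma * abs(X)))
        spectra.append(row)
    return spectra
```

At 44.1 kHz, `frame_params(44100)` gives a 20 ms frame of K = 882 samples with hop = 294. Each frame therefore overlaps the next by two thirds.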
In an embodiment of the present invention, in step S3, the spectral flux SF(n) is:
[formula image not reproduced]
where n is the frame index, K is the frame length, and H(x) is the half-wave rectification function, H(x) = (x + |x|)/2;
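Half-wave rectification keeps only spectral increases, so SF(n) responds to note onsets rather than to decays. The sketch below assumes the usual form SF(n) = Σ_k H(S_n(k) - S_(n-1)(k)), where S_n(k) stands for the log spectrum of frame n. The patent's own formula and symbol are shown only as images, so this form is an assumption.

```python
def half_wave(v):
    """H(x) = (x + |x|) / 2: passes increases, zeroes decreases."""
    return (v + abs(v)) / 2.0

def spectral_flux(spectra):
    """SF(n) = sum_k H(S_n(k) - S_{n-1}(k)); SF(0) is set to 0.

    spectra is a list of per-frame log spectra S_n(k). Summing over every
    bin of the frame is an assumption about the patent's imaged formula.
    """
    if not spectra:
        return []
    flux = [0.0]
    for prev, cur in zip(spectra, spectra[1:]):
        flux.append(sum(half_wave(c - p) for p, c in zip(prev, cur)))
    return flux
```

For example, `spectral_flux([[0, 0], [1, 0], [0, 2]])` returns `[0.0, 1.0, 2.0]`. The drop in the first bin of the last frame contributes nothing; only the rise in the second bin counts.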
the autocorrelation sequence TG (τ) is:
TG(τ)=W(τ)∑F(t)F(t-τ);
wherein W (τ) is a Gaussian weighting function;
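The smoothing and weighted autocorrelation of step S3 can be sketched as follows. Several parts are hypothetical choices: the centered moving-average low-pass filter, the Gaussian center tau0, its width sigma, and the lag search range. The patent states only that W(τ) is a Gaussian weighting function.

```python
import math

def smooth(sf, width=3):
    """Centered moving average: an assumed stand-in for the low-pass
    filtering and smoothing that turn SF(n) into the onset curve F(t)."""
    half = width // 2
    out = []
    for t in range(len(sf)):
        seg = sf[max(0, t - half): t + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def beat_period(F, tau_min, tau_max, tau0, sigma):
    """Lag (in frames) maximizing TG(tau) = W(tau) * sum_t F(t) * F(t - tau).

    W(tau) = exp(-0.5 * ((tau - tau0) / sigma) ** 2); tau0, sigma and the
    search range [tau_min, tau_max] are hypothetical tuning parameters.
    """
    best_tau, best_val = None, float("-inf")
    for tau in range(tau_min, min(tau_max, len(F) - 1) + 1):
        w = math.exp(-0.5 * ((tau - tau0) / sigma) ** 2)
        tg = w * sum(F[t] * F[t - tau] for t in range(tau, len(F)))
        if tg > best_val:
            best_tau, best_val = tau, tg
    return best_tau
```

On a pulse train with one onset every 10 frames, `beat_period(F, 5, 30, 12, 10)` returns 10. The lag of 20 frames also correlates strongly, but it scores lower for two reasons: it has fewer overlapping pulses, and it lies further from tau0. Suppressing such multiples of the true period is the role of the Gaussian weight.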
the BPM characteristic value is:
BPM = 60 × fs / (hop × τmax)
where fs is the sampling rate, hop is the frame shift, and τmax is the beat period in frames.
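A worked check of the formula: at fs = 44100 Hz, a 30 ms frame gives K = 1323 samples and hop = 441 samples. Each autocorrelation lag step is therefore 441/44100 = 10 ms. A peak at τmax = 50 frames means a beat period of 0.5 s, which is 60 × 44100 / (441 × 50) = 120 BPM. The same conversion in Python (the function name is illustrative):

```python
def bpm_from_lag(fs, hop, tau_max):
    """BPM = 60 * fs / (hop * tau_max).

    tau_max is measured in frames, so one beat lasts tau_max * hop / fs
    seconds, and 60 divided by that duration gives beats per minute.
    """
    return 60.0 * fs / (hop * tau_max)
```

Calling `bpm_from_lag(44100, 441, 50)` gives `120.0`.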
In an embodiment of the present invention, step S4 is implemented as follows:
step S41: extract the BPM characteristic value of the singing voice signal every 2 seconds. The mean of this BPM sequence over the whole signal is the average [formula image not reproduced], denoted as the average BPM;
step S42: set a delay in beats D, and calculate the delay amount delay according to the formula [formula image not reproduced].
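The delay formula survives only as an image. The sketch below therefore assumes the natural reading that the harmony lags by D beats at the average tempo, i.e. delay = D × 60 / BPM seconds, converted to samples at rate fs. The function names are illustrative.

```python
def average_bpm(segment_bpms):
    """Mean of the BPM values extracted from successive 2-second segments."""
    if not segment_bpms:
        raise ValueError("need at least one BPM estimate")
    return sum(segment_bpms) / len(segment_bpms)

def delay_samples(bpm_avg, D, fs):
    """Delay of D beats in samples.

    Assumes delay = D * 60 / bpm_avg seconds; the patent's exact formula is
    shown only as an image.
    """
    seconds = D * 60.0 / bpm_avg
    return int(round(seconds * fs))
```

At an average of 120 BPM and fs = 44100 Hz, a one-beat delay is 0.5 s (22050 samples), and a half-beat delay is 11025 samples. A faster song thus automatically gets a shorter harmony lag.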
In an embodiment of the present invention, in step S5, the pitch is raised with a timbre-preserving pitch-shifting method.
In one embodiment of the present invention, in step S5, the third is an imperfectly consonant third (a minor or major third), i.e., the harmony pitch is 2^(3/12) or 2^(4/12) times the original pitch.
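The two interval ratios are easy to check numerically. The helper below (a hypothetical name) only computes the target fundamental frequency; it does not perform the timbre-preserving pitch shift itself. The patent does not say how the choice between the minor and major third is made.

```python
MINOR_THIRD = 2 ** (3 / 12)  # 3 semitones, about 1.189
MAJOR_THIRD = 2 ** (4 / 12)  # 4 semitones, about 1.260

def third_above(f0, major=True):
    """Fundamental frequency (Hz) of a harmony a third above f0."""
    return f0 * (MAJOR_THIRD if major else MINOR_THIRD)
```

A sung A3 at 220 Hz maps to about 277.18 Hz (C#4) for a major third, or about 261.63 Hz (C4) for a minor third. Simply resampling the voice by these ratios would also shift its formants and change the perceived timbre. This is why the method calls for a timbre-preserving pitch shifter.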
In an embodiment of the present invention, in step S6, the linear proportional superposition formula is:
y(n)=x(n)+k*h(n);
where k is the wet/dry ratio; a preferable effect is obtained when k = 0.8.
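The sketch below combines step S6 with the delay line of step S5. Here h is passed in undelayed and the delay (in samples) is applied inside the function; this is a layout choice of the example, not of the patent. The output keeps the length of x.

```python
def mix_harmony(x, h, delay, k=0.8):
    """y(n) = x(n) + k * h(n - delay), treating h as silent outside its range.

    x: dry lead vocal; h: pitch-shifted copy (not yet delayed);
    delay: harmony lag in samples; k: wet/dry ratio (0.8 per the patent).
    """
    y = []
    for n in range(len(x)):
        m = n - delay
        wet = h[m] if 0 <= m < len(h) else 0.0
        y.append(x[n] + k * wet)
    return y
```

For example, `mix_harmony([1, 1, 1, 1], [1, 2, 3, 4], 2, k=0.5)` returns `[1.0, 1.0, 1.5, 2.0]`. With k = 0.8, y(n) can exceed full scale, so a practical version may need renormalization or limiting. The patent does not address this.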
Compared with the prior art, the invention has the following beneficial effects. Targeting singing harmony applications, the method simplifies beat extraction and reduces time complexity. It generates harmony adaptively according to the tempo, enriching the singer's musical expression. The method is simple, flexible to implement, and highly practical.
Drawings
Fig. 1 is a flow chart of a singing harmony generation method based on singing voice rhythm in the present invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention provides a singing harmony generation method based on singing voice rhythm, which is divided into three main stages as shown in Fig. 1.
In the rhythm detection stage, a flux filtering method suited to singing voice and human auditory characteristics is used: the onset intensity curve is obtained from the spectral flux, and the BPM characteristic value is then extracted.
In the harmony part generation stage, a singing harmony generation algorithm dynamically calculates the harmony-part delay from the BPM characteristic value. It then generates a harmony part in the same singer's voice using a timbre-preserving pitch-shifting algorithm.
In the superposition synthesis stage, the lead vocal and the harmony are superposed in linear proportion according to the delay and the wet/dry ratio, and the result is output.
The specific steps are as follows:
Step S1: calculate the log spectrum of the song audio signal. First, preprocess the whole song audio signal by filtering, pre-emphasis, normalization and similar operations. Then divide the resulting signal into short frames x_i(n) using frame length K and frame shift hop, where K = frame duration × sampling frequency and hop = K/3. For each frame, apply the short-time Fourier transform X_i(k) = STFT(x_i(n)). Then obtain the log spectrum sequence [formula image not reproduced] according to the formula [formula image not reproduced].
Step S2: calculate the BPM characteristic value. From the log spectrum sequence [formula image not reproduced], calculate the spectral flux SF(n) of the singing voice signal, then low-pass filter and smooth it to obtain the onset intensity curve F(t). Calculate the autocorrelation sequence TG(τ) of this curve, weighted by a Gaussian window function. The lag τmax at which TG(τ) is maximal is the beat period, and the BPM characteristic value follows from BPM = 60 × fs / (hop × τmax).
Step S3: calculate the average beat. Extract the BPM characteristic value of the singing voice signal every 2 seconds. The mean of this BPM sequence over the whole signal is the average [formula image not reproduced].
Step S4: calculate the delay amount. If [range condition shown as a formula image, not reproduced] holds, calculate the delay of the harmony part according to the formula [formula image not reproduced]. Otherwise, the BPM characteristic value is outside the processing range and no processing is performed.
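The range check can be sketched as follows. The actual bounds appear only in the formula image, so BPM_MIN and BPM_MAX here are hypothetical placeholders. The delay mapping (D beats at the average tempo) is likewise an assumption.

```python
BPM_MIN, BPM_MAX = 60.0, 180.0  # hypothetical processing range

def harmony_delay_or_none(bpm_avg, D, fs, lo=BPM_MIN, hi=BPM_MAX):
    """Delay in samples, or None when the average BPM is out of range."""
    if not lo <= bpm_avg <= hi:
        return None  # outside the processing range: leave the input as-is
    # Assumed mapping: D beats at the average tempo, converted to samples
    return int(round(D * 60.0 / bpm_avg * fs))
```

Returning None lets the caller skip harmony generation entirely, which matches the "no processing" branch of step S4.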
Step S5: generate the harmony part. Copy the original signal and raise its pitch by an imperfectly consonant third using a timbre-preserving pitch-shifting method, i.e., to 2^(3/12) or 2^(4/12) times the original pitch. Then pass it through a delay line to obtain the harmony signal h(n), delayed by delay relative to the lead part.
Step S6: linear proportional superposition. Linearly superpose the original vocal part x(n) and the harmony part h(n) according to y(n) = x(n) + k*h(n), and output y(n) as the generated singing harmony. Here k is the wet/dry ratio; a preferable effect is obtained when k = 0.8.
The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention fall within its protection scope, provided the resulting functional effects do not go beyond the scope of that technical solution.

Claims (8)

1. A singing harmony generation method based on singing voice rhythm, characterized by comprising the following steps:
step S1: preprocessing the input singing voice audio signal, the preprocessing comprising filtering, pre-emphasis and normalization;
step S2: framing the preprocessed singing voice audio x(n) and calculating the log spectrum of each frame [formula image not reproduced];
step S3: from the sequence of log spectra [formula image not reproduced], calculating the spectral flux SF(n) of the singing voice signal; low-pass filtering and smoothing SF(n) to obtain the onset intensity curve F(t); and then calculating the autocorrelation sequence TG(τ) of this curve, wherein the lag τ at which TG(τ) is maximal is the beat period, from which the BPM characteristic value is calculated as:
BPM = 60 × fs / (hop × τmax)
wherein fs is the sampling rate, hop is the frame shift, and τmax is the beat period in frames;
step S4: calculating the average BPM characteristic value of the entire input singing voice signal, denoted as the average BPM, and calculating the delay amount of the harmony part from it;
step S5: copying the preprocessed singing voice audio x(n), raising its pitch by a third, and then generating, through a delay device, a harmony part h(n) delayed by delay relative to x(n);
step S6: superposing the original vocal part x(n) and the harmony part h(n) in linear proportion to output y(n), namely the generated singing harmony.
2. The singing harmony generation method based on singing voice rhythm according to claim 1, wherein in step S2 the log spectrum of each frame [formula image not reproduced] is calculated as follows:
step S21: dividing the singing voice audio into frames x_i(n) according to the frame length K and the frame shift hop;
step S22: applying the short-time Fourier transform to x_i(n) to obtain the frequency-domain signal X_i(k);
step S23: obtaining the log spectrum sequence [formula image not reproduced] according to the formula [formula image not reproduced].
3. The method according to claim 2, wherein the frame length K is the number of samples in 10 ms to 30 ms of audio, K being the frame duration multiplied by the sampling frequency; and the frame shift hop is the non-overlapping part of two adjacent frames, with hop = K/3.
4. The singing harmony generation method based on singing voice rhythm according to claim 1, wherein in step S3 the spectral flux SF(n) is:
[formula image not reproduced]
wherein n is the frame index, K is the frame length, and H(x) is the half-wave rectification function;
the autocorrelation sequence TG (τ) is:
TG(τ)=W(τ)∑F(t)F(t-τ);
wherein W (τ) is a Gaussian weighting function;
the BPM characteristic value is:
BPM = 60 × fs / (hop × τmax)
wherein fs is the sampling rate, hop is the frame shift, and τmax is the beat period in frames.
5. The singing harmony generation method based on singing voice rhythm according to claim 1, wherein step S4 is implemented as follows:
step S41: extracting the BPM characteristic value of the singing voice signal every 2 seconds, wherein the mean of the BPM characteristic value sequence over the whole signal is the average [formula image not reproduced], denoted as the average BPM;
step S42: setting a delay in beats D and calculating the delay amount delay according to the formula [formula image not reproduced].
6. The singing harmony generation method based on singing voice rhythm according to claim 1, wherein in step S5 the pitch is raised with a timbre-preserving pitch-shifting method.
7. The method according to claim 1, wherein in step S5 the third is an imperfectly consonant third, i.e., the pitch is 2^(3/12) or 2^(4/12) times the original pitch.
8. The singing harmony generation method based on singing voice rhythm according to claim 1, wherein in step S6 the linear proportional superposition formula is:
y(n)=x(n)+k*h(n);
wherein k is the wet/dry ratio, and a preferable effect is obtained when k = 0.8.
CN201810101219.9A 2018-02-01 2018-02-01 Singing harmony generation method based on singing voice rhythm Expired - Fee Related CN108172210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810101219.9A CN108172210B (en) 2018-02-01 2018-02-01 Singing harmony generation method based on singing voice rhythm

Publications (2)

Publication Number Publication Date
CN108172210A CN108172210A (en) 2018-06-15
CN108172210B true CN108172210B (en) 2021-03-02

Family

ID=62512557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810101219.9A Expired - Fee Related CN108172210B (en) 2018-02-01 2018-02-01 Singing harmony generation method based on singing voice rhythm

Country Status (1)

Country Link
CN (1) CN108172210B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545176B (en) * 2019-01-21 2022-03-04 北京小唱科技有限公司 Dynamic echo processing method and device for audio
CN109920449B (en) * 2019-03-18 2022-03-04 广州市百果园网络科技有限公司 Beat analysis method, audio processing method, device, equipment and medium
CN110853604A (en) * 2019-10-30 2020-02-28 西安交通大学 Automatic generation method of Chinese folk songs with specific region style based on variational self-encoder
CN112908289B (en) * 2021-03-10 2023-11-07 百果园技术(新加坡)有限公司 Beat determining method, device, equipment and storage medium
CN113411663B (en) * 2021-04-30 2023-02-21 成都东方盛行电子有限责任公司 Music beat extraction method for non-woven engineering

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1134580A (en) * 1995-02-02 1996-10-30 雅马哈株式会社 Harmony chorus apparatus generating chorus sound derived from vocal sound
CN1153964A (en) * 1995-02-27 1997-07-09 雅马哈株式会社 Karaoke apparatus creating virtual harmony voice over actual singing voice
JP2001117578A (en) * 1999-10-21 2001-04-27 Yamaha Corp Device and method for adding harmony sound
US6816833B1 (en) * 1997-10-31 2004-11-09 Yamaha Corporation Audio signal processor with pitch and effect control
CN102568457A (en) * 2011-12-23 2012-07-11 深圳市万兴软件有限公司 Music synthesis method and device based on humming input
CN102568454A (en) * 2011-12-13 2012-07-11 北京百度网讯科技有限公司 Method and device for analyzing music BPM (Beat Per Minutes)
CN105070283A (en) * 2015-08-27 2015-11-18 百度在线网络技术(北京)有限公司 Singing voice scoring method and apparatus
CN105659322A (en) * 2013-09-19 2016-06-08 微软技术许可有限责任公司 Recommending audio sample combinations
CN106228973A (en) * 2016-07-21 2016-12-14 福州大学 Stablize the music voice modified tone method of tone color
CN106373580A (en) * 2016-09-05 2017-02-01 北京百度网讯科技有限公司 Singing synthesis method based on artificial intelligence and device
CN106653037A (en) * 2015-11-03 2017-05-10 广州酷狗计算机科技有限公司 Audio data processing method and device
US20170221466A1 (en) * 2012-10-19 2017-08-03 Sing Trix Llc Vocal processing with accompaniment music input

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hayato Kumagai, "Synchronization method for improving temporal harmony of music and video clips", International Conference on Applied Computing & Information Technology / International Conference on Computational Science & Intelligence, 2015, full text *
Alonso M., "Tempo and beat estimation of musical signals", International Symposium on Music Information Retrieval, 2004, full text *
王孝欣 (Wang Xiaoxin), "一种基于简单自相关的基音周期搜索算法" (A pitch period search algorithm based on simple autocorrelation), 《工业控制计算机》 (Industrial Control Computer), 2015, full text *

Also Published As

Publication number Publication date
CN108172210A (en) 2018-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20210302
Termination date: 20220201