CN108172210B

CN108172210B - Singing harmony generation method based on singing voice rhythm

Info

Publication number: CN108172210B
Application number: CN201810101219.9A
Authority: CN
Inventors: 张栋; 彭建云; 汪培侨
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2018-02-01
Filing date: 2018-02-01
Publication date: 2021-03-02
Anticipated expiration: 2038-02-01
Also published as: CN108172210A

Abstract

The invention relates to a singing harmony generation method based on singing voice rhythm. Targeting singing harmony applications, the method detects the rhythm of the singer's voice from its spectral flux and adaptively adjusts the delay of the harmony voice part according to that rhythm to generate harmony. This simplifies beat extraction, reduces time complexity, and enriches the singer's musical expression. The method is simple, flexible to implement, and highly practical.

Description

Singing harmony generation method based on singing voice rhythm

Technical Field

The invention relates to the field of singing voice synthesis, in particular to a singing harmony generating method based on singing voice rhythm.

Background

Singing voice is a relatively complex audio signal and form of artistic expression, and its analysis and study are of real significance. With the popularization of music entertainment, sound effect processing for the singing voice has become a focus of research and application and is receiving wide attention from both academia and industry. Although sound effect processing for karaoke applications is relatively mature, users find it difficult to add harmony to their own singing because of the limits of their vocal range and singing ability. It is therefore of great practical value to study how to generate harmony from the vocal characteristics of the singer and how to adapt that harmony to the rhythm of the singing voice.

Disclosure of Invention

The invention aims to provide a singing harmony generation method based on singing voice rhythm that generates harmony adaptively according to the tempo, so as to enrich the singer's musical expression.

To achieve this purpose, the technical scheme of the invention is a singing harmony generation method based on singing voice rhythm, characterized by comprising the following steps:

Step S1: preprocessing the input singing voice audio signal, wherein the preprocessing comprises filtering, pre-emphasis and normalization;

Step S2: framing the preprocessed singing voice audio x(n), and calculating the log spectrum of each frame

Step S3: from the sequence of log spectra

calculating the spectral flux SF(n) of the singing voice signal, low-pass filtering and smoothing SF(n) to obtain the endpoint intensity curve F(t), and then calculating the autocorrelation sequence TG(τ) of the endpoint intensity curve, wherein the τ at which TG(τ) reaches its maximum is the beat period, from which the BPM characteristic value is calculated;

Step S4: calculating the average BPM characteristic value over the whole input singing voice signal, recording it as BPM, and calculating the delay amount of the harmony voice part from BPM;

Step S5: making a copy of the preprocessed singing voice audio x(n), raising its pitch by an interval of a third, and then passing it through a delay to generate the harmony voice part h(n);

Step S6: superposing the original voice part x(n) and the harmony voice part h(n) in linear proportion to output y(n), which is the generated singing harmony.

In one embodiment of the present invention, in step S2, the log spectrum of each frame

is calculated according to the following steps:

Step S21: dividing the singing voice audio into frames according to the frame length K and the frame shift hop to obtain x_i(n);

Step S22: applying a short-time Fourier transform to x_i(n) to obtain the frequency-domain signal X_i(k);

Step S23: according to the formula

obtaining the log spectrum sequence

In an embodiment of the present invention, the frame length K is the number of samples in a frame of 10 ms to 30 ms, i.e. K = frame duration × sampling frequency; the frame shift hop is the non-overlapping part of two adjacent frames, with hop = K/3.
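The framing parameters can be sketched as follows. This is a minimal illustration in Python, not the implementation of the invention: the function names `frame_params` and `split_frames` are hypothetical, rounding K and hop down to whole samples is an assumption, and trailing samples that do not fill a whole frame are simply dropped.

```python
def frame_params(fs, frame_ms):
    """Return (K, hop) in samples for a frame of frame_ms milliseconds."""
    if not 10 <= frame_ms <= 30:
        raise ValueError("frame duration should lie within 10 ms to 30 ms")
    K = int(fs * frame_ms / 1000)  # K = frame duration x sampling frequency
    hop = K // 3                   # hop = K/3, rounded down (assumption)
    return K, hop


def split_frames(x, K, hop):
    """Split x into frames of K samples whose starts are hop samples apart."""
    return [x[start:start + K] for start in range(0, len(x) - K + 1, hop)]


# Illustrative values only: 16 kHz audio, 30 ms frames, 1000 dummy samples.
K, hop = frame_params(fs=16000, frame_ms=30)
frames = split_frames(list(range(1000)), K, hop)
print(K, hop, len(frames))
```

With hop = K/3, consecutive frames overlap by two thirds of their length.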

In an embodiment of the present invention, in step S3, the spectral flux SF(n) is:

wherein n is the frame number, K is the frame length, and H(x) is the half-wave rectification function;

the autocorrelation sequence TG(τ) is:

TG(τ) = W(τ) Σ F(t) F(t-τ);

wherein W(τ) is a Gaussian weighting function;

the BPM characteristic value is:

BPM = 60*fs/(hop*τ_max);

wherein fs is the sampling rate, hop is the frame shift, and τ_max is the beat period.
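The rhythm detection of step S3 can be sketched as follows, under several assumptions that are not taken from this text: the spectral flux uses the common form (sum over frequency bins of the half-wave rectified frame-to-frame difference), since the formula itself is not reproduced here; the low-pass smoothing is omitted; the centre and width of the Gaussian weighting and the lag search range are hypothetical; and τ is measured in frames, so that hop*τ/fs is the beat period in seconds. All function names are illustrative.

```python
import math


def half_wave(x):
    """Half-wave rectification H(x): keep increases, zero out decreases."""
    return x if x > 0.0 else 0.0


def spectral_flux(log_spectra):
    """Assumed common form: SF(n) = sum_k H(Y_n(k) - Y_{n-1}(k))."""
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(log_spectra, log_spectra[1:]):
        flux.append(sum(half_wave(c - p) for p, c in zip(prev, cur)))
    return flux


def weighted_autocorrelation(curve, lags, center, sigma):
    """TG(tau) = W(tau) * sum_t F(t) F(t - tau), with a Gaussian W(tau)."""
    tg = {}
    for tau in lags:
        acc = sum(curve[t] * curve[t - tau] for t in range(tau, len(curve)))
        weight = math.exp(-0.5 * ((tau - center) / sigma) ** 2)  # assumed W
        tg[tau] = weight * acc
    return tg


def bpm_from_lag(fs, hop, tau_max):
    """BPM = 60 * fs / (hop * tau_max), with tau_max measured in frames."""
    return 60.0 * fs / (hop * tau_max)


# Toy input: a single spectral bin that jumps up every 50 frames.
log_spectra = [[1.0 if n % 50 == 0 else 0.0] for n in range(500)]
curve = spectral_flux(log_spectra)
tg = weighted_autocorrelation(curve, range(20, 151), center=50, sigma=20)
tau_max = max(tg, key=tg.get)
bpm = bpm_from_lag(fs=16000, hop=160, tau_max=tau_max)
print(tau_max, bpm)
```

In this toy case the frame rate is fs/hop = 100 frames per second, so a lag of 50 frames is a beat period of 0.5 s.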

In an embodiment of the present invention, step S4 is implemented as follows:

Step S41: extracting a BPM characteristic value from the singing voice signal every 2 seconds, and taking the mean of the BPM characteristic value sequence over the whole signal as the average value

which is recorded as BPM;

Step S42: setting a delayed beat number D and, according to the formula

calculating the delay amount delay.
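Step S41 reduces to a mean over the per-window BPM values. The delay formula of step S42 is not reproduced in this text, so the conversion below is only an assumption and should not be read as the formula of the invention: it relies on the fact that D beats at a tempo of BPM beats per minute last 60*D/BPM seconds, and expresses that duration in samples. The names `average_bpm` and `delay_in_samples` and all numeric values are hypothetical.

```python
def average_bpm(bpm_values):
    """Mean of the BPM characteristic values extracted every 2 seconds."""
    if not bpm_values:
        raise ValueError("at least one BPM value is required")
    return sum(bpm_values) / len(bpm_values)


def delay_in_samples(bpm, beats, fs):
    """Duration of `beats` beats at tempo `bpm`, in samples (assumed form)."""
    seconds = 60.0 * beats / bpm
    return round(seconds * fs)


# Hypothetical per-window BPM values and a hypothetical D of half a beat.
bpm = average_bpm([118.0, 120.0, 122.0])
delay = delay_in_samples(bpm, beats=0.5, fs=16000)
print(bpm, delay)
```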

In an embodiment of the present invention, in step S5, the pitch is raised with a timbre-preserving pitch conversion method.

In one embodiment of the present invention, in step S5, the third is an imperfect consonance, i.e. a minor or major third, so that the pitch is 2^(3/12) or 2^(4/12) times the original pitch.
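The two pitch ratios follow from twelve-tone equal temperament, in which each semitone multiplies the frequency by 2^(1/12): a minor third spans 3 semitones and a major third spans 4. A short numeric check, with the helper name `semitone_ratio` chosen only for illustration:

```python
def semitone_ratio(semitones):
    """Frequency ratio of an interval in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


minor_third = semitone_ratio(3)  # 2^(3/12)
major_third = semitone_ratio(4)  # 2^(4/12)
print(round(minor_third, 4), round(major_third, 4))
```

The harmony voice part is therefore roughly 19% or 26% higher in frequency than the original voice.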

In an embodiment of the present invention, in step S6, the linear proportional superposition formula is:

y(n) = x(n) + k*h(n);

in the above formula, k is the dry-wet ratio, and a good effect is obtained when k = 0.8.
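The delay and superposition can be sketched as follows. This is an illustration only: the timbre-preserving pitch conversion is not implemented, so the harmony input is a stand-in; realizing the delay by prepending zeros and zero-padding the shorter signal before mixing are assumptions; the function names are hypothetical.

```python
def delay_signal(h, delay):
    """Delay h by `delay` samples by prepending zeros."""
    return [0.0] * delay + list(h)


def mix(x, h, k=0.8):
    """y(n) = x(n) + k*h(n); the shorter input is zero-padded (assumption)."""
    n = max(len(x), len(h))
    x = list(x) + [0.0] * (n - len(x))
    h = list(h) + [0.0] * (n - len(h))
    return [a + k * b for a, b in zip(x, h)]


x = [1.0, 1.0, 1.0]
h = delay_signal([1.0, 1.0, 1.0], delay=2)  # stand-in for the pitch-shifted copy
y = mix(x, h, k=0.8)
print(y)
```

Because x(n) is added at full level and only h(n) is scaled, k sets how loud the harmony is relative to the original voice.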

Compared with the prior art, the invention has the following beneficial effects: starting from singing harmony applications, it provides a singing harmony generation method based on singing voice rhythm that simplifies beat extraction, reduces time complexity, generates harmony adaptively according to the tempo, and enriches the singer's musical expression. The method is simple, flexible to implement, and highly practical.

Drawings

Fig. 1 is a flow chart of a singing harmony generation method based on singing voice rhythm in the present invention.

Detailed Description

The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.

The invention provides a singing harmony generation method based on singing voice rhythm which, as shown in Fig. 1, is divided into three main stages. In the rhythm detection stage, a flux filtering method suited to the singing voice and to the auditory characteristics of the human ear is proposed: an endpoint intensity curve is obtained from the spectral flux, and the BPM characteristic value is then extracted from it. In the harmony part generation stage, a singing harmony generation algorithm is proposed: the delay amount of the harmony part is calculated dynamically from the BPM characteristic value, and a harmony part with the singer's own timbre is generated with a timbre-preserving pitch conversion algorithm. In the superposition synthesis stage, the singing voice and the harmony are superposed in linear proportion, according to the delay amount and the dry-wet ratio, and output. The specific steps are as follows:

Step S1: calculating the log spectrum of the singing audio signal: first, the whole singing audio signal is preprocessed by filtering, pre-emphasis, normalization and the like. The resulting signal is then divided into short frames according to the frame length K and the frame shift hop to obtain x_i(n), where K = frame duration × sampling frequency and hop = K/3. Each frame is processed as follows: x_i(n) is short-time Fourier transformed to X_i(k) = STFT(x_i(n)), and then, according to the formula

the log spectrum sequence is obtained.

Step S2: calculating the BPM characteristic value: from the sequence of log spectra

the spectral flux SF(n) of the singing voice signal is calculated, then low-pass filtered and smoothed to give the endpoint intensity curve F(t); the autocorrelation sequence TG(τ) of the endpoint intensity curve is calculated and weighted with a Gaussian window function; the τ at which TG(τ) reaches its maximum is the beat period, and the BPM characteristic value is calculated according to the formula BPM = 60*fs/(hop*τ_max).

Step S3: calculating the average beat: a BPM characteristic value is extracted from the singing voice signal every 2 seconds, and the mean of the BPM characteristic value sequence over the whole signal is the average value

Step S4: calculating the delay amount: if

then, according to the formula

the delay amount of the harmony voice part is calculated; otherwise the BPM characteristic value is outside the processing range and no processing is performed.

Step S5: generating the harmony voice part: the original signal is copied and its pitch is raised by an imperfect-consonant third (a minor or major third) with a timbre-preserving pitch conversion method, i.e. the pitch becomes 2^(3/12) or 2^(4/12) times the original pitch; a delay unit then produces the harmony voice part signal h(n), delayed by delay relative to the main voice part.

Step S6: linear proportional superposition: the original voice part x(n) and the harmony voice part h(n) are superposed linearly according to the formula y(n) = x(n) + k*h(n), and the output y(n) is the generated singing harmony. In this formula, k is the dry-wet ratio, and a good effect is obtained when k = 0.8.

The above are preferred embodiments of the present invention. All changes made according to the technical scheme of the present invention whose functional effects do not go beyond the scope of that technical scheme fall within the protection scope of the present invention.

Claims

1. A singing harmony generation method based on singing voice rhythm, characterized by comprising the following steps:

Step S3: from the sequence of log spectra

calculating the spectral flux SF(n) of the singing voice signal, low-pass filtering and smoothing SF(n) to obtain the endpoint intensity curve F(t), and then calculating the autocorrelation sequence TG(τ) of the endpoint intensity curve, wherein the τ at which TG(τ) reaches its maximum is the beat period, from which the BPM characteristic value is calculated; the BPM characteristic value is:

BPM = 60*fs/(hop*τ_max)

wherein fs is the sampling rate, hop is the frame shift, and τ_max is the beat period;

Step S5: making a copy of the preprocessed singing voice audio x(n), raising its pitch by an interval of a third, and then generating, through a delay unit, the harmony voice part h(n) delayed by delay relative to the singing voice audio x(n);

2. The singing harmony generation method based on singing voice rhythm as claimed in claim 1, wherein in step S2, the log spectrum of each frame

is calculated according to the following steps:

Step S23: according to the formula

obtaining the log spectrum sequence

3. The singing harmony generation method based on singing voice rhythm as claimed in claim 2, wherein the frame length K is the number of samples in a frame of 10 ms to 30 ms, i.e. K = frame duration × sampling frequency; the frame shift hop is the non-overlapping part of two adjacent frames, with hop = K/3.

4. The singing harmony generation method based on singing voice rhythm as claimed in claim 1, wherein in step S3, the spectral flux SF(n) is:

the autocorrelation sequence TG(τ) is:

TG(τ) = W(τ) Σ F(t) F(t-τ);

wherein W(τ) is a Gaussian weighting function;

the BPM characteristic value is:

BPM = 60*fs/(hop*τ_max);

5. The singing harmony generation method based on singing voice rhythm as claimed in claim 1, wherein step S4 is implemented as follows:

and is recorded as BPM;

Step S42: setting a delayed beat number D and, according to the formula

calculating the delay amount delay.

6. The singing harmony generation method based on singing voice rhythm as claimed in claim 1, wherein in step S5, the pitch is raised with a timbre-preserving pitch conversion method.

7. The singing harmony generation method based on singing voice rhythm as claimed in claim 1, wherein in step S5, the third is an imperfect consonance, i.e. a minor or major third, so that the pitch is 2^(3/12) or 2^(4/12) times the original pitch.

8. The singing harmony generation method based on singing voice rhythm as claimed in claim 1, wherein in step S6, the linear proportional superposition formula is:

y(n) = x(n) + k*h(n);