CN118506753A - Gaohu musical tone range expansion method based on time-varying harmonic energy structure - Google Patents

Publication number
CN118506753A
Authority
CN
China
Prior art keywords
harmonic
pitch
tone
musical
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410656298.5A
Other languages
Chinese (zh)
Inventor
王一歌
梁烨新
韦岗
曹燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT

Landscapes

  • Electrophonic Musical Instruments (AREA)

Abstract

The invention provides a Gaohu musical tone range expansion method based on a time-varying harmonic energy structure, which comprises the following steps: designing a short-time Fourier transform with adaptive spectral resolution and an adaptive-window harmonic energy structure extraction method to extract the time-varying harmonic energy structure of violin-family musical tones; aligning the time-varying harmonic energy structures by linear interpolation, extracting their energy centroids, and computing from the centroids a mapping relation between time-varying harmonic energy structures; adjusting the pitch and duration of a Gaohu musical tone, separating its harmonic components by band-pass filtering, and adjusting the harmonic energies with the mapping relation, thereby synthesizing alto, bass and contrabass Huqin musical tones. The invention thus extends the range of Gaohu musical tones while preserving the Huqin timbre.

Description

Gaohu musical tone range expansion method based on time-varying harmonic energy structure
Technical Field
The invention relates to the technical field of audio processing, in particular to a Gaohu musical tone range expansion method based on a time-varying harmonic energy structure.
Background
Research on extending the range of the Gaohu in the digital domain aims to supply the alto and bass parts that the Huqin instrument family lacks, and is therefore of significant research value.
The key to extending the Gaohu range in the digital domain is to ensure that the generated bass audio has an instrumental timbre similar to that of the Huqin; this cannot be achieved by simply resampling treble audio into bass audio. However, there is in practice no bass Huqin timbre available for reference, so the range expansion must draw on the timbre relationships between the high-pitched and low-pitched members of another instrument family. The Western violin family comprises, in order of range, the violin, viola, cello and double bass. These four instruments cover the high, middle and low registers with a very wide combined range, and together they form the string section of the Western symphony orchestra. Violin-family instruments offer a wide variety of timbres that blend well, and they have long held a leading position in the orchestra owing to their expressive timbre. Moreover, violin-family instruments and Huqin instruments are both string instruments and share their sounding principle and playing technique, so it is both worthwhile and feasible to extend the Huqin range by reference to the timbre relationships among the violin-family instruments.
Most current range expansion approaches require training a model on source-timbre and target-timbre data; such models usually need a large amount of audio data for training and consume considerable time and computing resources.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a Gaohu musical tone range expansion method based on a time-varying harmonic energy structure. The method is built on conventional audio processing algorithms, has low computational complexity and high efficiency, extends the range while preserving the Huqin timbre, and is significant for the symphonic performance of Chinese national music.
The aim of the invention can be achieved by adopting the following technical scheme:
The method comprises the following steps:
S1, obtaining the spectrum of each frame of a violin-family tone sample using a short-time Fourier transform with adaptive spectral resolution, and obtaining the time-varying harmonic energy structure of the tone using an adaptive-window harmonic energy structure extraction method;
S2, normalizing the time-varying harmonic energy structure of each tone sample; aligning the time-varying harmonic energy structures of multiple tone samples by linear interpolation; extracting the energy centroid of multiple source-instrument tone samples and the energy centroid of multiple target-instrument tone samples using a weighted average based on vector distance; and calculating the time-varying harmonic energy structure mapping relation between the source-instrument tone and the target-instrument tone;
S3, adjusting the pitch and duration of the Gaohu tone; separating its harmonic components by band-pass filtering; synthesizing the alto Huqin tone using the violin-to-viola mapping relation, the bass Huqin tone using the violin-to-cello mapping relation, and the contrabass Huqin tone using the violin-to-double-bass mapping relation.
Further, the step S1 includes the steps of:
S101, estimating an accurate fundamental frequency of the musical tone; a coarse estimate of the fundamental frequency is first computed with the YIN algorithm, and a fine estimate is then obtained with a narrowband spectral energy estimation method; the specific steps are as follows:
Step S101.1: calculating the difference function d_t[τ] of the tone signal according to formula (1), where x[n] is the time-domain tone signal and W is the integration length, set to 1024; normalizing the difference function according to formula (2) to obtain the normalized difference function d_t'[τ];
Step S101.2: setting a threshold of 0.05 and taking the first local minimum of the normalized difference function that lies below the threshold as the period point; if no such minimum exists, the global minimum is taken. This gives the coarse period estimate N_0, and the coarse fundamental frequency estimate f_0 is obtained from formula (3), where f_s is the sampling rate of the tone signal;
Step S101.3: taking the first 8192 points of the tone signal, applying a Hamming window, and computing the frequency-domain spectrum X[k] with the fast Fourier transform;
Step S101.4: using the narrowband spectral energy frequency estimation algorithm, obtaining from formula (4) the estimate f_es of the frequency of the spectral magnitude maximum, where k_max is the index of the spectral maximum and L = 8192 is the length of the truncated signal;
Step S101.5: calculating the harmonic number of the maximum-magnitude frequency according to formula (5), where t is the ratio of the spectral-maximum frequency to the fundamental frequency; calculating the fine fundamental frequency estimate f_base according to formula (6); and calculating the period N of the tone signal according to formula (7);
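As a non-limiting illustration, the coarse and fine estimation of steps S101.1 to S101.5 may be sketched in Python as follows. Formulas (1) to (7) are not reproduced in this text, so the sketch relies on assumptions: the normalization is taken to be the cumulative-mean normalization of the published YIN algorithm, and the refinement is assumed to be t = round(f_es / f_0), f_base = f_es / t, N = round(f_s / f_base). The narrowband spectral energy estimator of formula (4) is not sketched; its result f_es is passed in as an input. The function names are hypothetical.

```python
import numpy as np

def yin_coarse_period(x, W=1024, threshold=0.05, tau_max=1024):
    """Coarse period N_0 in samples (assumed YIN-style normalization)."""
    x = np.asarray(x, dtype=float)
    if len(x) < W + tau_max:
        raise ValueError("signal too short for the requested lag range")
    # Difference function over an integration window of W samples.
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = x[:W] - x[tau:tau + W]
        d[tau] = np.dot(diff, diff)
    # Cumulative-mean normalization; the value at lag 0 is defined as 1.
    running_mean = np.cumsum(d[1:]) / np.arange(1, tau_max + 1)
    d_norm = np.ones(tau_max + 1)
    d_norm[1:] = d[1:] / np.maximum(running_mean, 1e-12)
    # First local minimum below the threshold, else the global minimum.
    for tau in range(2, tau_max):
        if (d_norm[tau] < threshold
                and d_norm[tau] <= d_norm[tau - 1]
                and d_norm[tau] <= d_norm[tau + 1]):
            return tau
    return int(np.argmin(d_norm[1:])) + 1

def refine_f0(f0_coarse, f_es, fs):
    """Assumed reading of formulas (5) to (7): harmonic number, fine f0, period."""
    t = max(1, int(round(f_es / f0_coarse)))  # harmonic number of the peak
    f_base = f_es / t                         # fine fundamental estimate
    N = int(round(fs / f_base))               # period in samples
    return t, f_base, N
```

With the values reported in the embodiment (f_s = 44100 Hz, f_base = 388.81 Hz), the assumed period formula gives N = 113, which agrees with the period used there.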
S102, marking the tone signal with a single-period fundamental frequency sequence, which provides the basis for subsequent framing; the specific steps are as follows:
Step S102.1: taking the maximum of the tone signal as the starting point of the search;
Step S102.2: searching for sequence points to the right. Let n_end be the rightmost point of the sequence extracted so far; the maximum within a search interval is taken as the next sequence point, and the left and right boundaries l_e and r_e of the search interval are given by formula (4) and formula (5);
$l_e = n_{end} + (1 - \rho) \times N$ formula (4)
$r_e = n_{end} + (1 + \rho) \times N$ formula (5)
where ρ is the search range coefficient, set to 0.25, and N is the fundamental period in samples. This is iterated until the search interval exceeds the signal range;
Step S102.3: searching for sequence points to the left. Let n_1 be the leftmost point of the sequence extracted so far; the maximum within a search interval is taken as the preceding sequence point, and the left and right boundaries l_s and r_s of the search interval are given by formula (6) and formula (7);
$l_s = n_1 - (1 + \rho) \times N$ formula (6)
$r_s = n_1 - (1 - \rho) \times N$ formula (7)
This is iterated until the search interval exceeds the signal range, finally yielding the single-period fundamental frequency sequence points of the tone signal, NP = [n_1, n_2, ..., n_M], where M is the number of fundamental periods of the tone signal;
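As a non-limiting illustration, the bidirectional search of steps S102.1 to S102.3 may be sketched as follows. The interval boundaries follow the formulas given above; rounding the boundaries to integer sample indices and the function name `pitch_marks` are assumptions made for the sketch.

```python
import numpy as np

def pitch_marks(x, N, rho=0.25):
    """Single-period sequence points NP found by searching outward from the global maximum."""
    x = np.asarray(x, dtype=float)
    marks = [int(np.argmax(x))]          # step S102.1: start at the signal maximum
    # Step S102.2: extend to the right until the interval leaves the signal.
    while True:
        lo = int(round(marks[-1] + (1 - rho) * N))
        hi = int(round(marks[-1] + (1 + rho) * N))
        if hi >= len(x):
            break
        marks.append(lo + int(np.argmax(x[lo:hi + 1])))
    # Step S102.3: extend to the left until the interval leaves the signal.
    while True:
        lo = int(round(marks[0] - (1 + rho) * N))
        hi = int(round(marks[0] - (1 - rho) * N))
        if lo < 0:
            break
        marks.insert(0, lo + int(np.argmax(x[lo:hi + 1])))
    return np.array(marks)
```

Because each interval is centered one period away from the previous point and is half a period wide (ρ = 0.25), the search tolerates small period fluctuations without skipping or doubling a period.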
S103, obtaining the spectrum of each frame of the tone using the short-time Fourier transform with adaptive spectral resolution; the specific steps are as follows:
Step S103.1: dividing the tone signal into frames according to the single-period fundamental frequency sequence points, with a frame length of 16 pitch periods and a frame shift of 8 pitch periods; the time-domain signal y_a[m] of the a-th frame is given by formula (8);
where x[n] is the tone signal, M is the number of fundamental periods of the tone signal, F is the number of frames obtained, L_X is the time-domain length of the tone signal, and floor denotes rounding down;
Step S103.2: applying the discrete Fourier transform to each frame y_a[m] to obtain the corresponding frequency-domain spectrum Y_a[k], finally yielding the spectrum sequence YS = (Y_1[k], Y_2[k], ..., Y_a[k], ..., Y_F[k]);
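As a non-limiting illustration, the pitch-synchronous framing of step S103 may be sketched as follows. Formula (8) is not reproduced in this text, so the exact frame boundaries are an assumption: each frame is taken to run from one sequence point to the sequence point 16 periods later, with successive frames starting 8 periods apart. The function names are hypothetical.

```python
import numpy as np

def pitch_sync_frames(x, marks, periods_per_frame=16, hop_periods=8):
    """Cut frames spanning a whole number of pitch periods (assumed boundaries)."""
    x = np.asarray(x, dtype=float)
    frames = []
    start = 0
    while start + periods_per_frame < len(marks):
        frames.append(x[marks[start]:marks[start + periods_per_frame]])
        start += hop_periods
    return frames

def frame_spectra(frames):
    """Magnitude spectrum of each frame; the DFT length equals the frame length."""
    return [np.abs(np.fft.fft(frame)) for frame in frames]
```

Since every frame holds 16 pitch periods regardless of pitch, the fundamental falls near DFT bin 16 (counting from 0) and the harmonics near multiples of 16, so the spectral resolution adapts to the pitch. Under the assumed boundaries, 500 sequence points yield 61 frames, which matches the frame count reported in the embodiment.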
S104, extracting the integer harmonic energy structure from Y_a[k] using the adaptive-window harmonic energy structure extraction method; the specific steps are as follows:
Step S104.1: determining the search interval of the b-th integer harmonic, whose left and right boundaries are given by formula (10) and formula (11);
where l_b is the left boundary, r_b is the right boundary, L_Y is the number of points of Y_a[k], k_{b-1} is the DFT position of the (b-1)-th integer harmonic peak (a fixed starting value is used when b = 1), and J_b is the update step of the b-th harmonic search interval;
Step S104.2: finding the maximum point in the search interval [l_b, r_b], recording it as the harmonic peak position k_b, and calculating the energy e_b of this harmonic with formula (13), where round denotes rounding to the nearest integer;
Step S104.3: repeating step S104.1 and step S104.2 to obtain the integer harmonic energy sequence IE_a and the integer harmonic positions IK_a of one frame of the time-domain signal, where I is the maximum harmonic number;
S105, extracting the fractional harmonic energy structure from Y_a[k]; the specific steps are as follows:
Step S105.1: determining the fractional harmonic positions; the position of the c-th fractional harmonic of the b-th harmonic is calculated by formula (15);
$\gamma = [0.25, 0.5, 0.75]$ formula (16)
where the fractional harmonic position is computed from the position of the b-th integer harmonic, and γ_c is the relative position of the c-th fractional harmonic within the interval between two adjacent integer harmonics; that is, one fractional harmonic is placed at each of 0.25, 0.5 and 0.75 of that interval;
Step S105.2: calculating the fractional harmonic energy; the energy of the c-th fractional harmonic of the b-th harmonic is calculated by formula (17);
Step S105.3: repeating step S105.1 and step S105.2 to obtain the fractional harmonic energy sequence FE_a and the fractional harmonic positions FK_a of one frame of the time-domain signal, where I is the maximum harmonic number;
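As a non-limiting illustration, the placement of fractional harmonics in step S105.1 may be sketched as follows. Formula (15) is not reproduced in this text; the sketch assumes that each fractional position is obtained by interpolating between two adjacent integer harmonic positions with the coefficients γ and rounding to the nearest bin. The function name is hypothetical.

```python
import math

GAMMA = (0.25, 0.5, 0.75)

def fractional_positions(k_prev, k_cur, gamma=GAMMA):
    """Bins at 1/4, 1/2 and 3/4 of the gap between two adjacent harmonic peaks (assumed rule)."""
    return [int(math.floor(k_prev + g * (k_cur - k_prev) + 0.5)) for g in gamma]

# Positions below the first harmonic, assuming a starting position of 1
# and the first-harmonic position 17 reported in the embodiment.
print(fractional_positions(1, 17))   # [5, 9, 13]
```

Under the further assumption that the starting position is 1, the rule reproduces the positions 5, 9 and 13 that the embodiment reports between 0 Hz and the first harmonic.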
S106, combining the integer harmonic energy structure and the fractional harmonic energy structure of each frame of the tone signal to obtain the time-varying harmonic energy structure of the violin tone; the specific steps are as follows:
Step S106.1: for the spectrum of each frame in the spectrum sequence YS extracted in step S103, repeating step S104 and step S105 to obtain the integer harmonic energy structure sequence IES = (IE_1, IE_2, ..., IE_a, ..., IE_F) and the fractional harmonic energy structure sequence FES = (FE_1, FE_2, ..., FE_a, ..., FE_F);
Step S106.2: combining the corresponding elements of IES and FES, that is, combining IE_a and FE_a into E_a as shown in formula (20), to obtain the new sequence ES = (E_1, E_2, ..., E_a, ..., E_F), which is the time-varying harmonic energy structure of the violin tone;
S107, repeating steps S101 to S106 for the violin tone samples with MIDI pitches 55 to 103 to extract the time-varying harmonic energy structure of each violin tone sample, and likewise extracting the time-varying harmonic energy structures of the viola tone samples with MIDI pitches 48 to 96, the cello tone samples with MIDI pitches 36 to 84, and the double bass tone samples with MIDI pitches 24 to 66.
Further, the step S2 includes the steps of:
S201, normalizing the extracted time-varying harmonic energy structure of the tone signal according to formula (21), where E_a(b, c) denotes the harmonic energy at the c-th position of the b-th harmonic in the a-th frame;
S202, aligning multiple tone samples of the same pitch from the same instrument by linear interpolation; if the time-varying harmonic energy structure ES of a tone sample has size F×I×4 and the number of frames to align to is F_t, the target energy structure ES' has size F_t×I×4. The specific steps are as follows:
Step S202.1: determining the interpolation positions; the positions of the known data points are given by formula (22) and the positions to be interpolated by formula (23), where pt_o(i) is the position of the i-th known data point and pt_t(j) is the position of the j-th interpolation point:
$pt_o(i) = (i - 1) \times (F_t - 1), \quad 1 \le i \le F$ formula (22)
$pt_t(j) = (j - 1) \times (F - 1), \quad 1 \le j \le F_t$ formula (23)
Step S202.2: performing the linear interpolation; for the harmonic energy at the c-th position of the b-th harmonic, the calculation is given by formula (24), where i_l and i_r are the indices of the known points nearest to the j-th interpolation point on the left and right, respectively; formula (24) is evaluated for every value of b and c, which aligns the number of frames; after the alignment, ES is set to ES';
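As a non-limiting illustration, the frame alignment of step S202 may be sketched as follows. The position grids follow formulas (22) and (23); formula (24) is not reproduced in this text, so ordinary piecewise-linear interpolation (here `numpy.interp`) is assumed in its place. The function name `align_frames` is hypothetical, and both F and F_t are assumed to be at least 2.

```python
import numpy as np

def align_frames(ES, F_t):
    """Resample an F x I x 4 energy structure to F_t frames by linear interpolation."""
    ES = np.asarray(ES, dtype=float)
    F = ES.shape[0]
    pt_o = np.arange(F) * (F_t - 1)      # formula (22): positions of the known frames
    pt_t = np.arange(F_t) * (F - 1)      # formula (23): positions of the target frames
    flat = ES.reshape(F, -1)             # one column per (harmonic, position) pair
    out = np.empty((F_t, flat.shape[1]))
    for col in range(flat.shape[1]):
        out[:, col] = np.interp(pt_t, pt_o, flat[:, col])
    return out.reshape((F_t,) + ES.shape[1:])
```

Scaling the two grids by (F_t - 1) and (F - 1) places both on the common integer range from 0 to (F - 1)(F_t - 1), so the first and last frames are preserved exactly. For the embodiment's sizes, a 61×55×4 structure is mapped to 20×55×4.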
S203, extracting the energy centroid SEC of multiple same-pitch tone samples of the source instrument and the energy centroid TEC of multiple same-pitch tone samples of the target instrument; the specific steps are as follows:
Step S203.1: calculating the spatial distance between a tone sample and the other samples according to formula (25);
where I is the maximum harmonic number, S is the number of tone samples, ES_i is the time-varying harmonic energy structure of the i-th tone sample, ES_j is that of the j-th tone sample, and d_ai is the spatial distance between the i-th sample and the other samples in the a-th frame;
Step S203.2: calculating the weights of the weighted average according to formula (26), where α_ai is the weight of the i-th tone sample in the a-th frame;
Step S203.3: calculating the energy centroid SEC of the source instrument and the energy centroid TEC of the target instrument according to formula (27) and formula (28), respectively, where ES_si is the time-varying harmonic energy structure of the i-th tone sample of the source instrument, ES_ti is that of the i-th tone sample of the target instrument, S_s is the number of tone samples of the source instrument, and S_t is the number of tone samples of the target instrument;
S204, calculating the time-varying harmonic energy structure mapping relation ESM between the source instrument and the target instrument, where the harmonic energy mapping coefficient at the c-th position of the b-th harmonic in the a-th frame is given by formula (29);
S205, taking the violin as the source instrument and the viola, cello and double bass in turn as target instruments, and repeating steps S201 to S203 to obtain the time-varying harmonic energy structure mapping relation ESM_pqo between the violin tone of pitch p and the tone of pitch q of violin-family instrument o, where p ranges from 55 to 103, q ranges from 24 to 96, and o takes the values 1 to 3, denoting the viola, cello and double bass, respectively.
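As a non-limiting illustration, a distance-weighted centroid and a mapping between two centroids may be sketched as follows. Formulas (25) to (29) are not reproduced in this text, so the sketch is built entirely on assumptions and is not the claimed computation: the distance d_ai is taken as the sum of Euclidean distances from sample i to all samples in frame a, the weight α_ai is taken as the normalized inverse of that distance, and the mapping coefficient is taken as the elementwise ratio of target to source centroid. The function names and the constant `eps` are hypothetical.

```python
import numpy as np

def energy_centroid(samples, eps=1e-12):
    """Per-frame weighted average that down-weights samples far from the others (assumed weights)."""
    ES = np.asarray(samples, dtype=float)            # shape (S, F, I, 4)
    S, F = ES.shape[:2]
    flat = ES.reshape(S, F, -1)
    diff = flat[:, None, :, :] - flat[None, :, :, :]  # diff[i, j, a] = sample i minus sample j
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distance per frame, (S, S, F)
    d = dist.sum(axis=1)                              # total distance of sample i in frame a
    w = 1.0 / (d + eps)
    w /= w.sum(axis=0, keepdims=True)                 # weights sum to 1 in every frame
    centroid = (w[:, :, None] * flat).sum(axis=0)
    return centroid.reshape(ES.shape[1:])

def energy_mapping(SEC, TEC, eps=1e-12):
    """Assumed mapping coefficient: elementwise ratio of target to source centroid."""
    return np.asarray(TEC, dtype=float) / (np.asarray(SEC, dtype=float) + eps)
```

With two samples equal to 1 and one outlier equal to 4, the assumed weights are 0.4, 0.4 and 0.2, so the centroid is 1.6 instead of the plain mean 2.0; an outlying recording therefore pulls the centroid less than it would in an unweighted average.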
Further, the step S3 includes the steps of:
S301, adjusting the pitch of a Gaohu tone of pitch p to pitch q; the pitch adjustment is implemented by resampling with cubic spline interpolation, and the resampling rate is given by formula (30);
where f_s_new is the new sampling rate, f_s is the original sampling rate, f_base is the fundamental frequency of the tone before pitch adjustment, and f_bt is the target fundamental frequency after pitch adjustment; after resampling, the sampling rate is set back to the original rate f_s, which produces the pitch change;
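As a non-limiting illustration, pitch adjustment by resampling may be sketched as follows. Formula (30) is not reproduced in this text; the sketch assumes f_s_new = f_s × f_base / f_bt, which is the ratio that lowers the pitch from f_base to f_bt once the resampled signal is played back at f_s. To stay within NumPy, linear interpolation (`numpy.interp`) is used here as a stand-in for the cubic spline interpolation named in step S301. The function names are hypothetical, and the MIDI-to-frequency conversion is the standard equal-temperament formula with A4 = 440 Hz.

```python
import numpy as np

def midi_to_hz(m):
    """Equal-temperament frequency of MIDI pitch m (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

def shift_pitch(x, f_base, f_bt):
    """Resample so that, played at the original rate, the pitch moves from f_base to f_bt."""
    x = np.asarray(x, dtype=float)
    ratio = f_base / f_bt                  # assumed f_s_new / f_s
    n_new = int(round(len(x) * ratio))
    t_new = np.arange(n_new) / ratio       # new sample instants on the old sample grid
    # Linear interpolation stands in for the cubic spline of step S301.
    return np.interp(t_new, np.arange(len(x)), x)
```

Shifting a tone from MIDI pitch 67 (G4) to MIDI pitch 43 (G2) gives a ratio of 4, so the resampled tone is four times as long as the original, which is why a duration adjustment follows the pitch adjustment.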
S302, adjusting the duration of the pitch-adjusted Gaohu tone; the duration adjustment is implemented by deleting pitch units; the specific steps are as follows:
Step S302.1: dividing the signal into pitch units according to the single-period fundamental frequency sequence points NP = [n_1, n_2, ..., n_M] extracted as in step S102; the i-th pitch unit x_i[m] is given by formula (31) and its center point m_i by formula (32);
$m_i = n_i$ formula (32)
where M is the number of pitch units, equal to the number of single-period fundamental frequency sequence points, and L_X is the time-domain length of the tone signal; after division, a Hanning window is applied to each pitch unit;
Step S302.2: deleting pitch units; one pitch unit is deleted out of every len pitch units, where the deletion interval len is given by formula (33);
Step S302.3: restoring the tone by overlap-add. The pitch unit sequence S after deletion is given by formula (34) and its center point sequence SP by formula (35);
where M_D is the number of pitch units remaining after deletion; the overlap-add of the i-th pitch unit is then given by formula (36);
$x[n + m_{i-1}] = x[n + m_{i-1}] + x_i[n], \quad m_{i-1} \le n \le m_{i+1}$ formula (36)
S303, separating the pure harmonic components of the Gaohu tone by band-pass filtering: a band-pass filter bank is first set up according to the harmonic positions to separate each pure harmonic component over the whole duration of the Gaohu tone, and each pure harmonic component is then divided into frames as in step S103.1. The specific steps are as follows:
Step S303.1: extracting the harmonic positions of the Gaohu tone from the Gaohu tone signal using the procedure of step S104 and step S105; the resulting harmonic frequencies are given by formula (37);
where f_s is the sampling rate of the Gaohu tone and L_P is the number of DFT points of the Gaohu tone;
Step S303.2: setting up the band-pass filter bank; the parameters of the band-pass filter for the harmonic component at the c-th position of the b-th harmonic are given by formula (38) and formula (39);
$f_{stop}(b, c) = [f_{bc} - 0.25 f_{bt},\ f_{bc} + 0.25 f_{bt}]$ formula (38)
$f_{pass}(b, c) = [f_{bc} - 0.05 f_{bt},\ f_{bc} + 0.05 f_{bt}]$ formula (39)
where f_stop(b, c) gives the stopband edge frequencies of the band-pass filter and f_pass(b, c) gives the passband edge frequencies; the band-pass filter is a digital FIR filter designed by the window method with a Hanning window;
Step S303.3: filtering the Gaohu tone with the band-pass filter bank to obtain I×4 pure harmonic time-domain components in total, each with the same time-domain length as the original tone signal;
Step S303.4: dividing the I×4 pure harmonic time-domain components into frames using the method of step S103.1 to obtain F×I×4 harmonic components in total, where the pure harmonic component at the c-th position of the b-th harmonic in the a-th frame is denoted p_har(a, b, c);
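As a non-limiting illustration, one filter of the band-pass filter bank of step S303.2 may be sketched as a Hanning-windowed FIR design. The passband and stopband edges follow formulas (38) and (39); placing the ideal cutoffs midway between them, at f_bc ± 0.15 f_bt, and the filter length `numtaps` are assumptions of the sketch, as is the function name.

```python
import numpy as np

def hann_bandpass(f_center, f_bt, fs, numtaps=1601):
    """Windowed-sinc band-pass FIR around f_center (cutoffs assumed at +/- 0.15 * f_bt)."""
    f_lo = f_center - 0.15 * f_bt
    f_hi = f_center + 0.15 * f_bt
    n = np.arange(numtaps) - (numtaps - 1) / 2.0   # tap index centered on zero

    def ideal_lowpass(fc):
        # Impulse response of an ideal low-pass filter with cutoff fc in Hz.
        return 2.0 * fc / fs * np.sinc(2.0 * fc / fs * n)

    # Band-pass = difference of two low-pass responses, shaped by a Hanning window.
    return (ideal_lowpass(f_hi) - ideal_lowpass(f_lo)) * np.hanning(numtaps)

# Usage: isolate the second harmonic (196 Hz) of a 98 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(8000) / fs
x = np.sin(2 * np.pi * 98 * t) + np.sin(2 * np.pi * 196 * t)
y = np.convolve(x, hann_bandpass(196.0, 98.0, fs), mode="same")
```

The filter length has to grow as the target pitch falls, because the transition band of 0.2 f_bt between passband and stopband edges narrows with f_bt while the transition width of a Hanning-window design is roughly inversely proportional to the number of taps.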
S304, adjusting the harmonic energy. To synthesize a Huqin tone of pitch q from a Gaohu tone of pitch p, the energy of p_har is adjusted using the ESM_pqo extracted in step S2; the adjustment is given by formula (40);
S305, restoring the time-domain signal by overlap-add; the overlap-add procedure is the same as in step S302.3. To smooth the time-domain waveform and give the amplitudes of adjacent frames a good transition, each frame is windowed with a Hanning window so that both ends of the frame are zero, and the overlapping parts of adjacent frames are added directly when the frames are spliced, finally yielding the synthesized Huqin tone of pitch q;
S306, repeating steps S301 to S305 for the Gaohu tone with MIDI pitch 67, with ESM_pq1 selected as the mapping relation in step S304, to synthesize alto Huqin tones with MIDI pitches 48 to 66; repeating steps S301 to S305 for the Gaohu tone with MIDI pitch 67, with ESM_pq2 selected in step S304, to synthesize bass Huqin tones with MIDI pitches 36 to 66; and repeating steps S301 to S305 for the Gaohu tone with MIDI pitch 67, with ESM_pq3 selected in step S304, to synthesize contrabass Huqin tones with MIDI pitches 24 to 66.
Compared with the prior art, the invention has the following advantages and effects:
(1) For extracting the accurate fundamental frequency of a musical tone, the invention provides a narrowband spectral energy frequency estimation algorithm combined with YIN, which avoids frequency-doubling and frequency-halving errors as far as possible and reduces the fundamental frequency error caused by the limited DFT spectral resolution.
(2) The invention uses the time-varying harmonic energy structure extracted by a short-time Fourier transform with adaptive spectral resolution as the timbre representation of a tone. The method takes the pitch of the tone into account and, following the pitch marking of the pitch-synchronous overlap-add algorithm, frames the tone signal with an adaptive frame length so that the DFT spectral resolution meets the subsequent requirements.
(3) The invention provides Gaohu range expansion based on the time-varying harmonic energy structure; by reference to the timbre relationships within the violin family, bass Gaohu tones are synthesized while the Huqin timbre is preserved.
(4) The range expansion method provided by the invention is based on conventional Fourier analysis and the pitch-synchronous overlap-add algorithm, and achieves a good range expansion effect without a large amount of data for model training.
Drawings
FIG. 1 is a flow chart of the Gaohu musical tone range expansion method based on a time-varying harmonic energy structure in an embodiment of the invention.
FIG. 2 is a flow chart of extracting the time-varying harmonic energy structure in an embodiment of the invention.
FIG. 3 is a flow chart of extracting the time-varying harmonic energy structure mapping relation in an embodiment of the invention.
FIG. 4 is a flow chart of synthesizing a bass Gaohu tone of pitch G2 in an embodiment of the invention.
FIG. 5 shows the time-domain and frequency-domain plots of the synthesized bass Gaohu tone of pitch G2 in an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
This embodiment discloses a Gaohu range expansion method based on a time-varying harmonic energy structure; as shown in FIG. 1, the specific steps are as follows:
S1, obtaining the spectrum of each frame of a violin tone sample using the short-time Fourier transform with adaptive spectral resolution, and extracting the time-varying harmonic energy structure of the violin tone sample using the adaptive-window search method; the flow chart is shown in FIG. 2;
S101, estimating an accurate fundamental frequency of the tone. Fundamental frequency estimation is performed on a violin tone sample of pitch G4 with sampling rate f_s = 44100 Hz: the YIN algorithm first gives a coarse fundamental frequency estimate f_0 = 386.84 Hz, and the narrowband spectral energy estimation method then gives a fine estimate f_base = 388.81 Hz;
S102, marking the tone signal with a single-period fundamental frequency sequence, which provides the basis for subsequent framing. The violin G4 tone sample has a length of 56264 points and its maximum lies at position 13527; with a fundamental period of N = 113 points, the search proceeds to both sides and yields the single-period fundamental frequency sequence NP = [82, 195, 308, 421, ..., 56037, 56150, 56263], 500 sequence points in total;
S103, obtaining the spectrum of each frame of the tone using the short-time Fourier transform with adaptive spectral resolution. The violin tone sample is first framed according to the single-period fundamental frequency sequence points, giving 61 frames of about 1798 points each; the discrete Fourier transform is then applied to the 61 frames to obtain 61 spectra, which form the spectrum sequence YS;
S104, extracting the integer harmonic energy structure using the adaptive-window search method. For the 10th frame of the violin G4 tone, when the energy of the first harmonic is extracted, the left and right boundaries of the search interval are 11 and 21, the maximum of the spectrum within this range is found at position 17, and the harmonic energy is calculated as 0.0075; the remaining harmonics are found in the same way, finally giving IE = [0.0075, 0.0097, 0.0016, ..., 6.43×10^-11] and IK = [17, 33, 49, 65, ..., 878]; both sequences have length 55, that is, the maximum harmonic number is 55;
S105, extracting the fractional harmonic energy structure. For the 10th frame of the violin G4 tone, the fractional harmonics between 0 Hz and the first harmonic lie at positions 5, 9 and 13, with harmonic energies of 5.43×10^-11, 7.21×10^-7 and 3.95×10^-6, respectively; the other fractional harmonics are calculated in the same way, giving the fractional harmonic energy sequence FE and the fractional harmonic position sequence FK;
S106, combining the integer harmonic energy structure and the fractional harmonic energy structure of each frame of the tone signal to obtain the time-varying harmonic energy structure of the violin tone; step S104 and step S105 are repeated over the spectrum sequence YS extracted in step S103 to obtain the time-varying harmonic energy structure of the violin G4 tone sample, which is a three-dimensional tensor of size 61×55×4.
S2, normalizing the time-varying harmonic energy structures of the tone samples, aligning the time-varying harmonic energy structures of multiple tone samples by linear interpolation, extracting the energy centroid of the source tone samples and the energy centroid of the target tone samples using a weighted average based on vector distances, and calculating the time-varying harmonic energy structure mapping relation between the source violin-family tone and the target violin-family tone; the flow chart is shown in FIG. 3;
S201, normalizing the time-varying harmonic energy structure of the violin G4 tone signal extracted in step S1;
S202, aligning multiple tone samples of the same pitch from the same instrument by linear interpolation; the time-varying harmonic energy structure of the violin G4 tone sample has size 61×55×4 before frame-number alignment and 20×55×4 after alignment;
S203, extracting the energy centroid SEC of the same-pitch tone samples of the source instrument and the energy centroid TEC of the same-pitch tone samples of the target instrument; specifically, this embodiment repeats steps S1 to S202 for all tone samples of violin pitch G4 to obtain the time-varying harmonic energy structure of each sample and then extracts their energy centroid SEC, and similarly extracts the energy centroid TEC of the tone samples of cello pitch G2; both centroids have size 20×55×4;
S204, calculating the time-varying harmonic energy structure mapping relation ESM between the source instrument and the target instrument; specifically, this embodiment extracts ESM_{67,43,2}, the mapping relation between violin pitch G4 (MIDI 67) and cello pitch G2 (MIDI 43), whose size is 20×55×4.
S3, adjusting the pitch and length of the gaohu tone, extracting the harmonic components by band-pass filtering, and synthesizing the bass gaohu tone by applying the time-varying harmonic energy structure mapping relation between the violin and the cello; the flow chart is shown in FIG. 4;
S301, adjusting the pitch of a gaohu tone sample of pitch G4 to pitch G2; the resampling rate is f_s_new = 192000 Hz, and the tone has 80886 points before pitch adjustment and 323544 points after adjustment;
S302, adjusting the length of the pitch-adjusted gaohu tone; the length adjustment is achieved by deleting pitch units: there are 661 pitch units before deletion and 167 after deletion, and the tone has 81752 points after length adjustment;
S303, extracting the pure harmonic components of the gaohu tone by band-pass filtering; the gaohu tone is filtered through a band-pass filter bank to obtain 55×4 pure harmonic components, which are then framed to obtain 20×55×4 pure harmonic components;
S304, adjusting the harmonic energy; the pure harmonic components extracted in step S303 are adjusted using the ESM_{67,43,2} extracted in step S2;
S305, restoring the time-domain signal by the overlap-add method; the pure harmonic components whose energy was adjusted in step S304 are superposed by the overlap-add method, finally giving a synthesized huqin tone of pitch G2 and length 81752 points; FIG. 5 shows the time-domain and frequency-domain plots of the synthesized huqin tone, from which it can be seen that the time-domain envelope has no amplitude outliers and the frequency domain shows no spectral leakage, indicating that the synthesis works well.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and is included in the protection scope of the present invention.

Claims (4)

1. A gaohu musical tone range expansion method based on a time-varying harmonic energy structure, characterized by comprising the following steps:
S1, obtaining the spectrum of each frame of a violin-family tone sample using a short-time Fourier transform with adaptive spectral resolution; obtaining the time-varying harmonic energy structure of the violin-family tone using the adaptive-window harmonic energy structure extraction method;
S2, normalizing the time-varying harmonic energy structure of the violin-family tone samples; aligning the time-varying harmonic energy structures of multiple tone samples by linear interpolation; extracting the energy centroid of the source-instrument tone samples and the energy centroid of the target-instrument tone samples using a weighted average based on vector distances; calculating the time-varying harmonic energy structure mapping relation between the source tone and the target tone;
S3, adjusting the pitch and length of the gaohu tone; extracting the harmonic components by band-pass filtering; synthesizing the alto gaohu tone using the time-varying harmonic energy structure mapping relation between the violin and the viola, the bass gaohu tone using the mapping relation between the violin and the cello, and the contrabass gaohu tone using the mapping relation between the violin and the double bass.
2. The gaohu musical tone range expansion method based on a time-varying harmonic energy structure according to claim 1, wherein step S1 comprises the following steps:
S101, estimating an accurate fundamental frequency of the musical tone; a coarse estimate of the fundamental frequency is first computed with the YIN algorithm, and a fine estimate is then obtained with a narrowband spectral energy estimation method; the specific steps are as follows:
Step S101.1: calculating the difference function d_t[τ] of the musical tone signal with an integration length of 1024, and normalizing it to obtain the normalized difference function d_t'[τ];
Step S101.2: setting a threshold of 0.05 and taking the first local minimum of the normalized difference function below the threshold as the period point; if no such minimum exists, taking the global minimum; this gives the coarse period estimate N_0, from which the coarse fundamental frequency estimate f_0 is calculated;
Step S101.3: taking the first 8192 points of the musical tone signal, applying a Hamming window to them, and calculating the frequency-domain spectrum X[k] of the musical tone signal using the fast Fourier transform;
Step S101.4: calculating, with the narrowband spectral energy frequency estimation algorithm, the estimate f_es of the frequency at which the spectrum magnitude of the musical tone signal is largest;
Step S101.5: calculating the harmonic order t of the maximum-magnitude frequency, where t is the multiple relating that frequency to the fundamental frequency; calculating the fine estimate f_base of the fundamental frequency of the musical tone signal; and calculating the period N of the musical tone signal, where f_s is the sampling rate of the musical tone signal;
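The coarse-then-fine estimate of steps S101.1 to S101.5 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the difference function and its cumulative-mean normalization follow the standard YIN definitions, the narrowband spectral energy estimator is not reproduced (its result f_es is simply passed in), and the refinement relations t = round(f_es / f_0), f_base = f_es / t and N = f_s / f_base are assumptions, since the corresponding formulas do not appear in the text. The function names are hypothetical.

```python
import math


def yin_coarse_period(x, w=1024, threshold=0.05, tau_max=None):
    """Coarse period N_0 (in samples) from the normalized YIN difference function.

    Requires len(x) >= w + tau_max.
    """
    tau_max = tau_max or w
    # Difference function d_t[tau] over an integration window of w samples.
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(w))
    # Cumulative-mean normalization: d'[tau] = d[tau] * tau / sum(d[1..tau]).
    dn = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        dn[tau] = d[tau] * tau / running if running > 0 else 1.0
    # First local minimum below the threshold, otherwise the global minimum.
    for tau in range(2, tau_max):
        if dn[tau] < threshold and dn[tau] <= dn[tau - 1] and dn[tau] <= dn[tau + 1]:
            return tau
    return min(range(1, tau_max + 1), key=lambda t: dn[t])


def refine_fundamental(f0_coarse, f_es, fs):
    """Assumed refinement: divide the strongest-peak frequency by its harmonic order."""
    t = max(1, round(f_es / f0_coarse))   # harmonic order of the strongest peak
    f_base = f_es / t                     # fine fundamental estimate
    period = fs / f_base                  # period N in samples (may be fractional)
    return t, f_base, period
```

For example, a coarse estimate of 80 Hz combined with a strongest spectral peak estimated at 160.4 Hz yields t = 2 and f_base = 80.2 Hz under these assumptions.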
S102, marking the single-period fundamental-frequency sequence points of the musical tone signal to provide the basis for subsequent framing; the specific steps are as follows:
Step S102.1: taking the position of the maximum of the musical tone signal as the starting point of the search;
Step S102.2: searching for sequence points to the right; let n_end denote the rightmost point of the sequence extracted so far; the largest local maximum within a search interval is taken as the next sequence point, and the left and right boundaries l_e and r_e of the search interval are calculated as follows:
l_e = n_end + (1 - ρ) × N
r_e = n_end + (1 + ρ) × N
where ρ is the search range coefficient, taken as 0.25, and N is the period of the fundamental frequency in sample points; the search is iterated until the search interval exceeds the signal range;
Step S102.3: searching for sequence points to the left; let n_1 denote the leftmost point of the sequence extracted so far; the largest local maximum within a search interval is taken as the previous sequence point, and the left and right boundaries l_s and r_s of the search interval are calculated as follows:
l_s = n_1 - (1 + ρ) × N
r_s = n_1 - (1 - ρ) × N
The search is iterated until the search interval exceeds the signal range; finally, the single-period fundamental-frequency sequence points of the musical tone signal are obtained, expressed as NP = [n_1, n_2, ..., n_M], where M is the number of fundamental periods of the musical tone signal;
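The marking procedure of step S102 can be sketched as below. It is a simplified illustration under stated assumptions: the largest sample in each search interval stands in for the "largest local maximum", the interval boundaries are rounded to integers, and the function name `mark_period_points` is hypothetical.

```python
def mark_period_points(x, period, rho=0.25):
    """Return sorted sample indices NP, one peak per fundamental period."""
    start = max(range(len(x)), key=lambda n: x[n])   # global maximum as the seed
    points = [start]
    # Search to the right of the rightmost point found so far.
    while True:
        lo = points[-1] + round((1 - rho) * period)
        hi = points[-1] + round((1 + rho) * period)
        if hi >= len(x):          # interval leaves the signal: stop
            break
        points.append(max(range(lo, hi + 1), key=lambda n: x[n]))
    # Search to the left of the leftmost point found so far.
    while True:
        lo = points[0] - round((1 + rho) * period)
        hi = points[0] - round((1 - rho) * period)
        if lo < 0:                # interval leaves the signal: stop
            break
        points.insert(0, max(range(lo, hi + 1), key=lambda n: x[n]))
    return points
```

Because each interval is centred one period away from the previous point, the search tolerates a drift of up to ±25% per period with ρ = 0.25, so the marks can follow small pitch variations.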
S103, obtaining the spectrum of each frame of the musical tone using the short-time Fourier transform with adaptive spectral resolution; the specific steps are as follows:
Step S103.1: dividing the musical tone signal into frames according to the single-period fundamental-frequency sequence points, with a frame length of 16 pitch periods and a frame shift of 8 pitch periods; the time-domain signal y_a[m] of the a-th frame is then expressed by the following formula;
where x[n] is the musical tone signal, M is the number of fundamental periods of the musical tone signal, F is the number of frames that can be obtained, L_X is the time-domain length of the musical tone signal, and floor denotes rounding down;
Step S103.2: performing a discrete Fourier transform on each frame y_a[m] to obtain the corresponding frequency-domain spectrum Y_a[k], finally giving the frequency-domain spectrum sequence YS = (Y_1[k], Y_2[k], ..., Y_a[k], ..., Y_F[k]);
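The pitch-synchronous framing of step S103 can be sketched as follows. The exact frame formula is not reproduced in the text, so the sketch assumes that frame a runs from one marked period point to the point 16 periods later, with consecutive frames starting 8 periods apart; `frame_by_periods` and `dft` are hypothetical helper names, and the direct DFT is written for clarity rather than speed.

```python
import cmath
import math


def frame_by_periods(x, points, periods_per_frame=16, hop_periods=8):
    """Cut x into frames that each span a whole number of pitch periods."""
    frames = []
    a = 0
    while a + periods_per_frame < len(points):
        frames.append(x[points[a]:points[a + periods_per_frame]])
        a += hop_periods
    return frames


def dft(frame, n_bins=None):
    """Direct discrete Fourier transform of one frame (first n_bins bins)."""
    n = len(frame)
    n_bins = n if n_bins is None else n_bins
    return [
        sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n_bins)
    ]
```

Because every frame holds 16 whole periods, the b-th harmonic falls on bin 16b when bins are counted from zero, regardless of pitch. This is consistent with the embodiment's harmonic positions 17, 33, 49, 65 if those are counted from one.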
S104, extracting the integer harmonic energy structure from Y_a[k] using the adaptive-window harmonic energy structure extraction method; the specific steps are as follows:
Step S104.1: determining the search interval of the b-th integer harmonic, whose left and right boundaries are calculated by the following formula:
where l_b is the left boundary, r_b is the right boundary, L_Y is the number of points of Y_a[k], k_{b-1} is the position of the (b-1)-th integer harmonic peak in the DFT (an initial value is used when b = 1), and J_b is the update step of the b-th harmonic search interval;
Step S104.2: finding the maximum point in the search interval [l_b, r_b], recording it as the position k_b of the harmonic peak, and then calculating the energy e_b of this harmonic using formula (13), where round denotes rounding to the nearest integer;
Step S104.3: repeating steps S104.1 and S104.2 to obtain the integer harmonic energy sequence IE_a and the integer harmonic position sequence IK_a of one frame of the time-domain signal, where I is the maximum harmonic order;
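The adaptive-window search of step S104 can be illustrated with the sketch below. The boundary and energy formulas are not reproduced in the text, so everything specific here is an assumption: the window is centred one step beyond the previous peak, its half-width defaults to 30% of the expected harmonic spacing, the step is updated to the spacing just observed, and the energy is taken as the squared magnitude of the peak bin. Bins are counted from zero and the function name is hypothetical.

```python
def extract_integer_harmonics(mag, spacing, max_order, half_width=None):
    """Track harmonic peaks in a magnitude spectrum with a moving search window.

    mag      : magnitude spectrum, bins counted from zero
    spacing  : expected number of bins between adjacent harmonics
    """
    half_width = half_width or max(1, round(0.3 * spacing))
    positions, energies = [], []
    prev, step = 0, spacing          # bin 0 (0 Hz) acts as the "0th" peak
    for _ in range(max_order):
        centre = prev + step
        lo = max(0, centre - half_width)
        hi = min(len(mag) - 1, centre + half_width)
        if lo > hi:                  # window has left the spectrum
            break
        k = max(range(lo, hi + 1), key=lambda i: mag[i])
        positions.append(k)
        energies.append(mag[k] ** 2)  # assumed energy measure
        step = k - prev               # adapt the step to the observed spacing
        prev = k
    return positions, energies
```

With a spacing of 16 bins the default half-width gives a first window of [11, 21], which happens to equal the boundaries quoted in the embodiment, although that agreement may be coincidental. Updating the step after each peak lets the window follow slightly inharmonic partials instead of drifting away from them at high orders.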
S105, extracting the fractional harmonic energy structure from Y_a[k]; the specific steps are as follows:
Step S105.1: determining the fractional harmonic positions; the position of the fractional harmonic at the c-th position of the b-th order is calculated by the following formula:
γ = [0.25, 0.5, 0.75]
where the result is the position of the fractional harmonic at the c-th position of the b-th order, k_b is the position of the b-th integer harmonic, and γ_c is the relative position of the c-th fractional harmonic within the interval between two adjacent integer harmonics, i.e. one fractional harmonic is placed at each of 0.25, 0.5 and 0.75 of the interval;
Step S105.2: calculating the fractional harmonic energy; the energy of the fractional harmonic at the c-th position of the b-th order is calculated by the following formula:
Step S105.3: repeating steps S105.1 and S105.2 to obtain the fractional harmonic energy sequence FE_a and the fractional harmonic position sequence FK_a of one frame of the time-domain signal, where I is the maximum harmonic order;
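The placement of fractional harmonics in step S105.1 can be sketched as below. The position formula itself is not reproduced in the text; the sketch assumes that each fractional position is the rounded point a fraction γ_c of the way from the previous integer harmonic peak to the current one, with index 1 standing for 0 Hz before the first harmonic. The function name and the `origin` parameter are hypothetical.

```python
GAMMA = (0.25, 0.5, 0.75)


def fractional_positions(integer_positions, origin=1):
    """Fractional-harmonic positions in the gap below each integer harmonic peak.

    integer_positions : peak positions k_1, k_2, ... of the integer harmonics
    origin            : assumed index of the 0 Hz bin
    """
    out = []
    prev = origin
    for k in integer_positions:
        out.append([round(prev + g * (k - prev)) for g in GAMMA])
        prev = k
    return out
```

Under these assumptions the embodiment's first integer harmonic position 17 yields the fractional positions 5, 9 and 13, the same values reported for the 10th frame of the violin G4 tone.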
S106, combining the integer harmonic energy structure and the fractional harmonic energy structure of each frame of the musical tone signal to obtain the time-varying harmonic energy structure of the violin tone; the specific steps are as follows:
Step S106.1: repeating steps S104 and S105 for the spectrum of each frame in the frequency-domain spectrum sequence YS extracted in step S103 for the violin tone, to obtain the integer harmonic energy structure sequence IES = (IE_1, IE_2, ..., IE_a, ..., IE_F) and the fractional harmonic energy structure sequence FES = (FE_1, FE_2, ..., FE_a, ..., FE_F);
Step S106.2: combining the corresponding elements of IES and FES, i.e. combining IE_a and FE_a to obtain E_a, in the manner shown in the following formula; this gives a new sequence ES = (E_1, E_2, ..., E_a, ..., E_F), where ES is the time-varying harmonic energy structure of the violin tone;
S107, repeating steps S101 to S106 for the violin tone samples of MIDI pitches 55 to 103 to extract the time-varying harmonic energy structure of each violin tone sample, and similarly extracting the time-varying harmonic energy structures of the viola tone samples of MIDI pitches 48 to 96, the cello tone samples of MIDI pitches 36 to 84 and the double bass tone samples of MIDI pitches 24 to 66.
3. The gaohu musical tone range expansion method based on a time-varying harmonic energy structure according to claim 1, wherein step S2 comprises the following steps:
S201, normalizing the time-varying harmonic energy structure of the extracted musical tone signal in the manner shown in the following formula, where E_a(b, c) denotes the harmonic energy at the c-th position of the b-th order in the a-th frame;
S202, aligning multiple tone samples of the same pitch from the same instrument by linear interpolation; assuming that the time-varying harmonic energy structure ES of a tone sample has size F×I×4 and the number of frames to align to is F_t, the target energy structure ES' has size F_t×I×4; the specific steps are as follows:
Step S202.1: determining the interpolation positions, where pt_o(i) denotes the position of the i-th known data point and pt_t(j) denotes the position of the j-th interpolation point:
pt_o(i) = (i - 1) × (F_t - 1), 1 ≤ i ≤ F
pt_t(j) = (j - 1) × (F - 1), 1 ≤ j ≤ F_t
Step S202.2: linear interpolation; for the harmonic energy at the c-th position of the b-th order, the interpolated value is obtained by the following formula, where i_l and i_r denote the indices of the known points nearest to the j-th interpolation point on the left and right, respectively; traversing all values of b and c and repeatedly applying the formula aligns the frame numbers; after alignment by linear interpolation, let ES = ES';
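The frame alignment of step S202 can be sketched for a single (b, c) energy track as follows. The positions pt_o and pt_t are taken from the text; the interpolation formula itself is the ordinary linear one, which is assumed to be what the omitted formula expresses. The function name `align_frames` is hypothetical.

```python
def align_frames(values, f_t):
    """Resample a per-frame sequence of length F to f_t frames by linear interpolation."""
    f = len(values)
    if f == 1 or f_t == 1:
        return [values[0]] * f_t
    out = []
    for j in range(1, f_t + 1):
        pt_t = (j - 1) * (f - 1)              # position of the j-th interpolation point
        i_l = pt_t // (f_t - 1) + 1           # nearest known point on the left (1-based)
        i_r = min(i_l + 1, f)                 # nearest known point on the right
        pt_l = (i_l - 1) * (f_t - 1)          # pt_o(i_l)
        w = (pt_t - pt_l) / (f_t - 1)         # fractional distance from left to right
        out.append((1 - w) * values[i_l - 1] + w * values[i_r - 1])
    return out
```

Scaling the known positions by (F_t - 1) and the target positions by (F - 1) places both on the same integer grid from 0 to (F - 1)(F_t - 1), so the first and last frames are always preserved. Applying the function to every (b, c) track turns a 61×55×4 structure into a 20×55×4 one, as in the embodiment.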
S203: extracting the energy centroid SEC of the same-pitch tone samples of the source instrument and the energy centroid TEC of the same-pitch tone samples of the target instrument; the specific steps are as follows:
Step S203.1: calculating the spatial distance between a given tone sample and the other samples as follows:
where I denotes the maximum harmonic order, S denotes the number of tone samples, ES_i is the time-varying harmonic energy structure of the i-th tone sample, ES_j is the time-varying harmonic energy structure of the j-th tone sample, and d_ai denotes the spatial distance between the i-th sample and the other samples in the a-th frame;
Step S203.2: calculating the weights of the weighted average as follows, where α_ai denotes the weight of the i-th tone sample in the a-th frame;
Step S203.3: calculating the energy centroid SEC of the source instrument and the energy centroid TEC of the target instrument by the following formulas, respectively; where ES_si denotes the time-varying harmonic energy structure of the i-th tone sample of the source instrument, ES_ti denotes the time-varying harmonic energy structure of the i-th tone sample of the target instrument, S_s denotes the number of tone samples of the source instrument, and S_t denotes the number of tone samples of the target instrument;
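One way to realise a distance-weighted centroid in the spirit of step S203 is sketched below. The distance and weight formulas are not reproduced in the text, so the choices here are assumptions: d_ai is the sum of Euclidean distances from sample i to every sample in frame a, and the weight α_ai is proportional to 1 / d_ai, so that samples far from the rest contribute less. Each frame is flattened to a plain list of energies, and `energy_centroid` is a hypothetical name.

```python
import math


def energy_centroid(samples, eps=1e-12):
    """Distance-weighted average of S energy structures, frame by frame.

    samples : list of S structures; each is a list of F frames;
              each frame is a flat list of harmonic energies.
    """
    n_frames = len(samples[0])
    centroid = []
    for a in range(n_frames):
        frames = [s[a] for s in samples]
        # d_ai: total distance from sample i to all samples in frame a.
        dist = [sum(math.dist(fi, fj) for fj in frames) for fi in frames]
        # Assumed weighting: inverse distance, normalized to sum to one.
        inv = [1.0 / (d + eps) for d in dist]
        total = sum(inv)
        weights = [v / total for v in inv]
        centroid.append([
            sum(w * f[k] for w, f in zip(weights, frames))
            for k in range(len(frames[0]))
        ])
    return centroid
```

Compared with a plain mean, this weighting pulls the centroid toward the cluster of mutually similar recordings, which reduces the influence of an atypical sample.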
S204: calculating the mapping relation; the harmonic energy mapping coefficient at the c-th position of the b-th order in the a-th frame is calculated as follows:
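Since the coefficient formula is not reproduced in the text, the sketch below assumes the simplest plausible form, an element-wise ratio of target to source centroid energy, and shows how such a coefficient could be applied to a time-domain harmonic component. Scaling a waveform by a gain g scales its energy by g², so an energy ratio corresponds to an amplitude gain equal to its square root. Both function names are hypothetical.

```python
import math


def mapping_relation(sec, tec, eps=1e-12):
    """Assumed mapping: ESM[a][b][c] = TEC[a][b][c] / SEC[a][b][c]."""
    return [
        [[t / (s + eps) for s, t in zip(s_row, t_row)]
         for s_row, t_row in zip(s_frame, t_frame)]
        for s_frame, t_frame in zip(sec, tec)
    ]


def apply_energy_coefficient(component, coeff):
    """Scale a time-domain component so that its energy is multiplied by coeff."""
    gain = math.sqrt(coeff)
    return [gain * v for v in component]
```

The small constant `eps` only guards against division by zero for harmonics whose source energy vanishes; it is an illustrative safeguard rather than part of the described method.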
S205: taking the violin as the source instrument and the viola, the cello and the double bass respectively as target instruments, the steps S201 to S203 are applied repeatedly to obtain the time-varying harmonic energy structure mapping relation ESM_pqo between the violin tone of pitch p and the tone of pitch q of another violin-family instrument o, where p ranges from 55 to 103, q ranges from 24 to 96, and o ranges from 1 to 3, denoting the viola, the cello and the double bass, respectively.
4. The gaohu musical tone range expansion method based on a time-varying harmonic energy structure according to claim 1, wherein step S3 comprises the following steps:
S301, adjusting the pitch of a gaohu tone of pitch p to pitch q; pitch adjustment is achieved by resampling with cubic spline interpolation, and the resampling rate is calculated as follows:
where f_s_new is the new sampling rate, f_s is the original sampling rate, f_base is the fundamental frequency of the tone before pitch adjustment, and f_bt is the target fundamental frequency of the tone after pitch adjustment; after resampling, the sampling rate is set back to the original sampling rate f_s, which produces the pitch adjustment;
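The arithmetic of step S301 can be checked with a short sketch. The rate formula is not reproduced in the text; the sketch assumes f_s_new = f_s × f_base / f_bt, which agrees with the embodiment's figures (G4 to G2 is two octaves, a ratio of 4; 80886 points become 323544; 192000 Hz results if the original rate is 48000 Hz, itself an inferred value). Equal-tempered MIDI pitches are used for the ratio, and linear interpolation stands in for the cubic spline named in the claim. Function names are hypothetical.

```python
def resample_rate(fs, midi_from, midi_to):
    """Assumed rule f_s_new = f_s * f_base / f_bt, with equal-tempered pitches."""
    ratio = 2.0 ** ((midi_from - midi_to) / 12.0)   # f_base / f_bt
    return fs * ratio


def stretch_linear(x, factor):
    """Stretch x to round(len(x) * factor) samples by linear interpolation.

    A simple stand-in for cubic-spline resampling; needs len(x) >= 2.
    """
    n_out = round(len(x) * factor)
    out = []
    for i in range(n_out):
        pos = min(i / factor, len(x) - 1)   # clamp to the last input sample
        j = min(int(pos), len(x) - 2)
        frac = pos - j
        out.append((1 - frac) * x[j] + frac * x[j + 1])
    return out
```

Playing the stretched signal back at the original rate makes every period four times longer, which lowers the pitch by two octaves and also lengthens the tone by the same factor; the length adjustment of step S302 then compensates for that.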
S302, adjusting the length of the pitch-adjusted gaohu tone; the length adjustment is achieved by deleting pitch units; the specific steps are as follows:
Step S302.1: according to the single-period fundamental-frequency sequence points NP = [n_1, n_2, ..., n_M] extracted in step S102, the i-th pitch unit x_i[m] and its center point m_i are calculated as follows:
m_i = n_i
where M is the number of pitch units that can be divided, which is also the number of single-period fundamental-frequency sequence points, and L_X is the time-domain length of the musical tone signal; after the pitch units are divided, a Hanning window is applied to each pitch unit;
Step S302.2: deleting pitch units; one pitch unit is deleted every len pitch units, and the deletion interval len is calculated by the following formula:
Step S302.3: restoring the musical tone from the pitch units; let the modified pitch unit sequence S and center point sequence SP be as shown in the following formula:
where M_D is the number of pitch units remaining after some of the pitch units have been deleted; the superposition of the i-th pitch unit can then be expressed by the following formula;
x[n+mi-1]=x[n+mi-1]+xi[n],mi-1≤n≤mi+1
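The idea behind step S302, thinning out pitch units and overlap-adding the remainder, can be sketched as below. The deletion-interval formula is not reproduced in the text, so an evenly spread selection of the units to keep is used in its place; the sketch also assumes units of equal length 2 × hop with centres one hop apart, whereas the claim derives the units from the marked period points. Both function names are hypothetical.

```python
def keep_indices(m, m_d):
    """Indices of the m_d pitch units kept out of m, spread evenly over the tone."""
    if m_d >= m:
        return list(range(m))
    if m_d == 1:
        return [0]
    return [round(i * (m - 1) / (m_d - 1)) for i in range(m_d)]


def overlap_add(units, hop):
    """Overlap-add windowed units of length 2 * hop whose centres are hop apart."""
    out = [0.0] * (hop * (len(units) + 1))
    for i, unit in enumerate(units):
        for n, v in enumerate(unit):
            out[i * hop + n] += v
    return out
```

Reducing 661 units to 167, as in the embodiment, keeps roughly one unit in four, which brings the tone back to about its original duration after the fourfold stretch of the pitch adjustment. Because Hanning-windowed units that overlap by half sum to a constant, the kept units join without amplitude steps.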
S303, extracting the pure harmonic components of the gaohu tone by band-pass filtering: a band-pass filter bank is first set up according to the harmonic positions to extract each pure harmonic component over the whole time domain of the gaohu tone, and each pure harmonic component is then framed according to step S103.1; the specific steps are as follows:
Step S303.1: extracting the harmonic positions of the gaohu tone signal using the procedure of steps S104 and S105; the harmonic frequencies obtained are expressed by the following formula:
where f_s is the sampling rate of the gaohu tone and L_P is the number of DFT points of the gaohu tone;
Step S303.2: setting up the band-pass filter bank; the parameters of the band-pass filter that extracts the harmonic component at the c-th position of the b-th order are given by:
f_stop(b, c) = [f_bc - 0.25 f_bt, f_bc + 0.25 f_bt]
f_pass(b, c) = [f_bc - 0.05 f_bt, f_bc + 0.05 f_bt]
where f_stop(b, c) is the stop-band frequency range and f_pass(b, c) is the pass-band frequency range of the band-pass filter; the band-pass filter is a digital FIR filter designed by the window method with a Hanning window;
Step S303.3: filtering the gaohu tone with the band-pass filter bank to obtain I×4 pure harmonic time-domain components in total, each with the same time-domain length as the original tone signal;
Step S303.4: framing the I×4 pure harmonic time-domain components by the method of step S103.1 to obtain F×I×4 harmonic components in total, where the pure harmonic component at the c-th position of the b-th order in the a-th frame is denoted p_har(a, b, c);
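A single filter of the bank in step S303.2 can be sketched as a Hanning-windowed sinc band-pass. The band edges come from the text; the remaining design choices are assumptions: the cutoffs are placed midway between the pass-band and stop-band edges, and the tap count uses the common textbook approximation that a Hanning-window design has a transition width of about 3.1 × f_s / N. Function names are hypothetical.

```python
import cmath
import math


def band_edges(f_bc, f_bt):
    """Stop-band and pass-band edges for the harmonic at frequency f_bc."""
    stop = (f_bc - 0.25 * f_bt, f_bc + 0.25 * f_bt)
    passband = (f_bc - 0.05 * f_bt, f_bc + 0.05 * f_bt)
    return stop, passband


def hann_bandpass(f_bc, f_bt, fs):
    """FIR band-pass taps by the window method with a Hanning window."""
    stop, passband = band_edges(f_bc, f_bt)
    f1 = 0.5 * (stop[0] + passband[0])        # lower cutoff, mid-transition
    f2 = 0.5 * (stop[1] + passband[1])        # upper cutoff, mid-transition
    width = passband[0] - stop[0]             # transition width in Hz
    n_taps = int(math.ceil(3.1 * fs / width)) | 1   # force an odd length
    mid = (n_taps - 1) // 2
    taps = []
    for n in range(n_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (f2 - f1) / fs
        else:
            ideal = (math.sin(2 * math.pi * f2 * k / fs)
                     - math.sin(2 * math.pi * f1 * k / fs)) / (math.pi * k)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, f, fs):
    """Magnitude response of the filter at frequency f."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, h in enumerate(taps)))
```

The transition band is only 0.2 × f_bt wide, so low target pitches need long filters: under the approximation above, a target fundamental near 98 Hz at a 48000 Hz sampling rate requires several thousand taps per filter.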
S304, adjusting the harmonic energy; to synthesize a huqin tone of pitch q from a gaohu tone of pitch p, the energy of p_har is adjusted using the ESM_pqo extracted in step S2, as shown in the following formula;
S305, restoring the time-domain signal by the overlap-add method; the superposition step is the same as step S302.3; so that the time-domain waveform is smooth and the amplitudes of different frames transition well, each frame is windowed with a Hanning window so that both ends of each frame are zero, and when frames are spliced the overlapping parts are added directly, finally giving the synthesized huqin tone of pitch q;
S306, repeating steps S301 to S305 for the gaohu tone of MIDI pitch 67, with ESM_pq1 selected as the time-varying harmonic energy structure mapping relation in step S304, to synthesize alto gaohu tones with MIDI pitches from 48 to 66; repeating steps S301 to S305 for the gaohu tone of MIDI pitch 67, with ESM_pq2 selected in step S304, to synthesize bass gaohu tones with MIDI pitches from 36 to 66; and repeating steps S301 to S305 for the gaohu tone of MIDI pitch 67, with ESM_pq3 selected in step S304, to synthesize contrabass gaohu tones with MIDI pitches from 24 to 66.

Application and Publication Data

Application number: CN202410656298.5A
Publication number: CN118506753A (en)
Title: High Hu Leyin range expansion method based on time-varying harmonic energy structure
Priority date: 2024-05-24
Filing date: 2024-05-24
Publication date: 2024-08-16
Status: Pending
Family ID: 92240368
Country: CN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination