CN104183245A

CN104183245A - Method and device for recommending music stars with tones similar to those of singers

Info

Publication number: CN104183245A
Application number: CN201410448290.6A
Authority: CN
Inventors: 王子亮; 刘旺; 邹应双; 蔡智力
Original assignee: Fujian Star Net eVideo Information Systems Co Ltd
Current assignee: Fujian Kaimi Network Science & Technology Co Ltd
Priority date: 2014-09-04
Filing date: 2014-09-04
Publication date: 2014-12-03

Abstract

The invention provides a method for recommending music stars whose timbre is similar to a singer's. The method comprises: obtaining pure vocal audio, preprocessing it, extracting a voice feature coefficient set from each pure vocal recording, and training a corresponding music star model with a voice model algorithm; preprocessing a given user voice sample and extracting its feature coefficient set; and matching the feature coefficient set of the user voice sample against all music star models to find the music star whose timbre is most similar to the singer's. The invention further provides a corresponding device. The method and device can be applied in KTV settings to recommend music stars whose timbre is similar to the user's, which can make singing more enjoyable and improve the user's ability to imitate a music star's timbre.

Description

Method and device for recommending music stars whose timbre is similar to a singer's

[Technical Field]

The present invention relates to the technical field of intelligent speech, and in particular to a method and device for recommending music stars whose timbre is similar to a singer's.

[Background]

With the spread of intelligent terminals, people increasingly expect intelligent services in daily life, and intelligent voice services have become an urgent need.

Existing singing evaluation technology includes methods for assessing whether a singer is out of tune, such as pitch-accuracy scoring, but rarely judges how closely a performance resembles the original or which music star the singer sounds like. An intelligent karaoke system urgently needs a technique that matches a user's voice to the music star whose timbre is closest and then recommends that star's songs to the user, thereby making singing more enjoyable and improving the user's ability to imitate the star's timbre.

[Summary of the Invention]

One technical problem to be solved by the present invention is to provide a method for recommending music stars whose timbre is similar to a singer's, that is, to find for a singer the music stars whose timbre resembles the singer's own.

The present invention solves this technical problem with the following technical solution:

A method for recommending music stars whose timbre is similar to a singer's comprises the following steps:

Audio library processing: obtain pure vocal audio of all music stars, preprocess the pure vocal audio, and then extract a voice feature coefficient set from each pure vocal recording;

Music star model training: from the feature coefficient set corresponding to each music star, train the corresponding music star model with a voice model algorithm;

Timbre matching: preprocess a given user voice sample and extract its feature coefficient set; then match the feature coefficient set of the user voice sample against all music star models to find the music star with the most similar timbre.

Further, the pure vocal audio of the music stars may be obtained by removing the accompaniment from songs.

Further, the music star model training step comprises: first pooling all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM); then, from the feature coefficient set corresponding to each music star, using the UBM to adaptively train the models of all music stars in the audio library.

Further, in the timbre matching step, the operation of "matching the feature coefficients of the user voice sample against all music star models to find the music star with the most similar timbre" comprises: computing the log-likelihood ratio of the user voice sample's feature coefficient set under each music star model relative to the UBM, and taking the music star with the maximum log-likelihood ratio as the recommended music star.

Further, the voice feature coefficient is one of MFCC, LPCC, LSP and PLP.

Further, the preprocessing in both the audio library processing step and the timbre matching step comprises, in order: framing, windowing and silence removal;

The silence removal comprises the following steps:

Compute the short-time energy of each frame by the formula:

E_n = \sum_{m=0}^{N-1} [w(m) x(n+m)]^2

where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;

When the short-time energy of a frame is below a given threshold, the frame is regarded as silent and is removed.

Further, the adaptive training of the models of all music stars in the audio library uses Bayesian adaptation, which specifically comprises:

For the i-th mixture component of the UBM, compute the posterior probability of component i:

P(i | x_t) = \frac{w_i p_i(x_t)}{\sum_{j=1}^{M} w_j p_j(x_t)}

where x denotes a feature coefficient vector and w a weight coefficient;

Then compute the statistics for the weight, mean and variance:

n_i = \sum_{t=1}^{T} P(i | x_t), \quad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t, \quad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t^2

Then update the parameters w_i, μ_i and δ_i^2 of each Gaussian in the original UBM.

Updated weight:

\hat{w}_i = [α_i^w n_i / T + (1 - α_i^w) w_i] γ;

Updated mean:

\hat{μ}_i = α_i^m E_i(x) + (1 - α_i^m) μ_i;

Updated variance:

\hat{δ}_i^2 = α_i^v E_i(x^2) + (1 - α_i^v)(δ_i^2 + μ_i^2) - \hat{μ}_i^2;

where γ is a normalization factor that ensures the updated weights sum to 1, and α_i^w, α_i^m and α_i^v are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, respectively:

α_i^ρ = \frac{n_i}{n_i + r^ρ}, ρ ∈ {w, m, v}, where r^ρ is a constant that constrains the scale of variation of the adaptation coefficients.

Further, the log-likelihood ratio of the user voice sample's feature coefficient set under a music star model relative to the UBM is computed by the formula:

S(X) = \frac{1}{T} \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})],

where x denotes a feature coefficient vector, T is the number of frames, λ_{star} and λ_{ubm} denote the music star model and the UBM, respectively, and p is the likelihood of the feature vector sequence under the music star model or the UBM.

The present invention also provides a device for recommending music stars whose timbre is similar to a singer's, comprising an audio library processing module, a music star model training module and a timbre matching module:

Audio library processing module: obtains pure vocal audio of all music stars, preprocesses the pure vocal audio, and then extracts a voice feature coefficient set from each pure vocal recording;

Music star model training module: from the feature coefficient set corresponding to each music star, trains the corresponding music star model with a voice model algorithm;

Timbre matching module: preprocesses a given user voice sample and extracts its feature coefficient set; then matches the feature coefficient set of the user voice sample against all music star models to find the music star with the most similar timbre.

Further, the music star model training module pools all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM);

and then, from the feature coefficient set corresponding to each music star, uses the UBM to adaptively train the models of all music stars in the audio library.

Further, in the timbre matching module, the operation of "matching the feature coefficients of the user voice sample against all music star models to find the music star with the most similar timbre" comprises: computing the log-likelihood ratio of the user voice sample's feature coefficient set under each music star model relative to the UBM, and taking the music star with the maximum log-likelihood ratio as the recommended music star.

Further, the preprocessing in both the audio library processing module and the timbre matching module comprises, in order: framing, windowing and silence removal;

The silence removal comprises the following steps:

Compute the short-time energy of each frame by the formula:

E_n = \sum_{m=0}^{N-1} [w(m) x(n+m)]^2

The adaptive training uses Bayesian adaptation. For the i-th mixture component of the UBM, compute the posterior probability of component i:

P(i | x_t) = \frac{w_i p_i(x_t)}{\sum_{j=1}^{M} w_j p_j(x_t)}

where x denotes a feature coefficient vector and w a weight coefficient;

Then compute the statistics for the weight, mean and variance:

n_i = \sum_{t=1}^{T} P(i | x_t), \quad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t, \quad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t^2

Then update the parameters w_i, μ_i and δ_i^2 of each Gaussian in the original UBM.

Updated weight:

\hat{w}_i = [α_i^w n_i / T + (1 - α_i^w) w_i] γ;

Updated mean:

\hat{μ}_i = α_i^m E_i(x) + (1 - α_i^m) μ_i;

Updated variance:

\hat{δ}_i^2 = α_i^v E_i(x^2) + (1 - α_i^v)(δ_i^2 + μ_i^2) - \hat{μ}_i^2;

The log-likelihood ratio is computed by the formula:

S(X) = \frac{1}{T} \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})],

The advantages of the present invention are as follows: the invention proposes a method and device for recommending music stars whose timbre is similar to a singer's, which find music stars with a similar timbre for the singer to use as a reference and make singing more enjoyable. Applied in KTV settings, the invention can attract a large number of users, stimulate consumption, and improve users' ability to imitate a music star's timbre.

[Brief Description of the Drawings]

The invention is further described below with reference to the accompanying drawings and embodiments.

Fig. 1 is a flowchart of the audio library processing and music star model training in the method of the present invention.

Fig. 2 is a flowchart of training a single music star model in the method of the present invention.

Fig. 3 is a flowchart of the timbre matching process in the method of the present invention.

Fig. 4 is a flowchart of the likelihood ratio calculation in the timbre matching process of the method of the present invention.

Fig. 5 is a schematic structural diagram of the device of the present invention.

[Detailed Description]

First embodiment:

This embodiment is described in detail below.

S1: audio library processing (as shown in Fig. 1):

S11: prepare the audio library by collecting several songs from each of a number of music stars, for example the stereo audio of 5 songs from each of 300 music stars;

S12: remove the accompaniment from all songs in the audio library to obtain pure vocal audio. For the method, refer to the Chinese invention patent entitled "Method and device for processing stereo audio", application number 201410263446.3. That method mainly exploits the differences between accompaniment and vocals across the left and right stereo channels to suppress and filter out the accompaniment, thereby extracting the vocals. The purpose of removing the accompaniment is to reduce the influence of the accompaniment components in the songs on the training of the music star timbre models.

Removing the accompaniment from all songs in the audio library to obtain pure vocal audio specifically comprises:

Transform the left and right channel signals of the stereo audio to the frequency domain;

Compute the amplitude ratio of each pair of corresponding frequency bins in the left-channel and right-channel frequency-domain signals, and classify the bins whose amplitude ratio falls in a preset range as bins to be attenuated; also compute the phase difference of each pair of corresponding frequency bins in the left-channel and right-channel frequency-domain signals, and likewise classify the bins whose phase difference falls in a preset range as bins to be attenuated. The amplitude ratio is computed as:

k_n(i) = abs(fft_frameR_n(i)) / abs(fft_frameL_n(i)) * (2/π),

where n = 0, 1, 2, ..., N-1 is the frame index, i = 0, 1, 2, ..., FN/2 is the frequency bin index, and FN is the number of points of the Fourier transform. The phase difference is computed as:

p_n(i) = angle(fft_frameL_n(i)) - angle(fft_frameR_n(i)),

n = 0, 1, 2, ..., N-1; i = 0, 1, 2, ..., FN/2;

Then select the bins to be attenuated, namely the bins whose amplitude ratio falls in a certain range, where bin i satisfies

k_n(i) < α or k_n(i) > β, with 0 < α < 0.5 and 0.5 < β < 1; α is set to 0.4 and β to 0.6,

or the bins whose phase difference falls in a certain range, where bin i satisfies

p_n(i) < φ or p_n(i) greater than an upper bound, where φ is set to -0.1 and the upper bound to 0.1; these bins are classified as bins to be attenuated;

The accompaniment components at the bins to be attenuated are then attenuated by the formula:

fft_frameR_n(i) = 0 or fft_frameL_n(i) = 0, where i is a bin to be attenuated;

The attenuated frequency-domain signal is inverse-transformed to the time domain, yielding the song audio with the accompaniment removed.
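The bin selection described above can be sketched in Python as follows. This is an illustrative sketch, not the referenced patent's implementation. It assumes the spectra of one frame are already available as lists of complex numbers, and it assumes the amplitude ratio is passed through an arctangent before the 2/π scaling, so that bins with equal amplitude in both channels score 0.5 (the midpoint between α and β); the formula as printed shows only the 2/π factor. The function names, the `eps` guard and the choice to zero both channels are likewise assumptions.

```python
import cmath
import math

def bins_to_attenuate(spec_left, spec_right, alpha=0.4, beta=0.6,
                      phi_low=-0.1, phi_high=0.1, eps=1e-12):
    """Return the indices of the bins in one frame that are marked for attenuation."""
    marked = []
    for i, (left_bin, right_bin) in enumerate(zip(spec_left, spec_right)):
        # ASSUMPTION: arctan maps the amplitude ratio into [0, 1), so that
        # equal amplitudes in both channels give k = 0.5.
        k = math.atan(abs(right_bin) / (abs(left_bin) + eps)) * (2.0 / math.pi)
        # Phase difference p_n(i) between the left and right channel.
        p = cmath.phase(left_bin) - cmath.phase(right_bin)
        if k < alpha or k > beta or p < phi_low or p > phi_high:
            marked.append(i)
    return marked

def attenuate(spec_left, spec_right, marked):
    """Zero the marked bins (here in both channels) and return new spectra."""
    out_left, out_right = list(spec_left), list(spec_right)
    for i in marked:
        out_left[i] = 0j
        out_right[i] = 0j
    return out_left, out_right

# Toy frame: bin 0 is identical in both channels (centred vocal),
# bin 1 exists only in the left channel, bin 2 is phase-shifted by 90 degrees.
left = [1 + 1j, 2 + 0j, 1 + 0j]
right = [1 + 1j, 0j, 0 + 1j]
marked = bins_to_attenuate(left, right)
clean_left, clean_right = attenuate(left, right, marked)
```

In the toy frame only bin 0 survives, because it has equal amplitude and zero phase difference in both channels. A complete pipeline would apply this per frame between a forward and an inverse Fourier transform.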

In other embodiments, pure vocal audio may also be obtained by other methods; the invention is not limited to the above algorithm.

In other embodiments, if pure vocal audio of all music stars has already been collected in step S11, step S12 is skipped.

S13: preprocess the songs after accompaniment removal, comprising: framing, windowing and silence removal;

Framing means dividing the audio signal into frames, where each frame contains a predetermined number of voice samples and adjacent frames overlap by a predetermined number of samples;

Windowing applies a Hanning window; other window functions may also be used.

Silence removal comprises:

Compute the short-time energy of each frame by the formula:

E_n = \sum_{m=0}^{N-1} [w(m) x(n+m)]^2

When the short-time energy of a frame is below a given threshold, the frame is regarded as silent and is removed. Silence contains no useful voice features and therefore needs to be removed.
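The preprocessing chain (framing, Hanning windowing, energy-based silence removal) can be sketched as below. This is a minimal pure-Python illustration; the frame length, frame shift and threshold values, and all function names, are assumptions chosen for the example and are not taken from the patent.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, advancing by hop."""
    return [x[n:n + frame_len] for n in range(0, len(x) - frame_len + 1, hop)]

def hann_window(frame_len):
    """Hanning window w(m) for m = 0 .. frame_len - 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * m / (frame_len - 1))
            for m in range(frame_len)]

def short_time_energy(frame, window):
    """E_n = sum over m of [w(m) * x(n + m)]^2 for one frame."""
    return sum((w * s) ** 2 for w, s in zip(window, frame))

def remove_silence(x, frame_len=8, hop=4, threshold=1e-3):
    """Return the windowed frames whose short-time energy reaches the threshold."""
    window = hann_window(frame_len)
    kept = []
    for frame in frame_signal(x, frame_len, hop):
        if short_time_energy(frame, window) >= threshold:
            kept.append([w * s for w, s in zip(window, frame)])
    return kept

# 16 silent samples followed by 16 samples of a sine tone.
signal = [0.0] * 16 + [math.sin(2.0 * math.pi * 0.25 * n) for n in range(16)]
voiced_frames = remove_silence(signal)
```

With a frame length of 8 and a frame shift of 4, the 32-sample signal yields 7 frames; the three frames that lie entirely in the silent part are dropped. In practice the threshold would be tuned to the recording level.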

S14: extract voice feature coefficients from the preprocessed audio. The voice feature coefficient may be one of MFCC, LPCC, LSP and PLP, where MFCC denotes Mel-frequency cepstral coefficients, LPCC denotes linear prediction cepstral coefficients, LSP denotes line spectral pair coefficients, and PLP denotes perceptual linear prediction coefficients. These coefficients characterize the timbre of a voice well, and any one of them may be chosen. The present invention preferably extracts MFCC or LPCC voice feature coefficients.

S2: music star model training, as shown in Figs. 1 and 2.

The extracted voice feature coefficients are pooled to train a universal background model (UBM), and then, from the voice feature coefficient set corresponding to each music star, the UBM is used to adaptively train the models of all music stars in the audio library. The UBM is in essence a Gaussian mixture model with a high number of mixtures; its training is similar to that of a GMM and uses the EM iterative algorithm, which is not described in detail here.
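Since the text leaves EM training undescribed, the following sketch shows one common form of it for a one-dimensional Gaussian mixture. It is only an illustration under simplifying assumptions: scalar features instead of multi-dimensional coefficient vectors, two mixtures instead of a high mixture count, hand-picked initial means, and a variance floor. All names and values are hypothetical.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a 1-D Gaussian mixture."""
    M, T = len(weights), len(data)
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [weights[i] * gauss(x, means[i], variances[i]) for i in range(M)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate weight, mean and variance of each component.
    new_w, new_mu, new_var = [], [], []
    for i in range(M):
        n_i = sum(r[i] for r in resp)
        mu = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var = sum(r[i] * (x - mu) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / T)
        new_mu.append(mu)
        new_var.append(max(var, var_floor))  # floor avoids collapsing variances
    return new_w, new_mu, new_var

def train_ubm(data, initial_means, n_iter=20):
    """Run a fixed number of EM iterations from uniform weights and unit variances."""
    M = len(initial_means)
    weights, means, variances = [1.0 / M] * M, list(initial_means), [1.0] * M
    for _ in range(n_iter):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances

# Pooled toy "features" from all stars: two clusters around -2 and +2.
pooled = [-2.1, -1.9, -2.0, -2.2, -1.8, 1.9, 2.1, 2.0, 2.2, 1.8]
ubm_weights, ubm_means, ubm_variances = train_ubm(pooled, initial_means=[-1.0, 1.0])
```

On this toy data the means converge to about -2 and +2 with equal weights. A real UBM would use diagonal-covariance Gaussians over feature vectors and a data-driven initialization such as k-means.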

The adaptive training of a music star model, as shown in Fig. 2, uses Bayesian adaptation, specifically as follows:

For the i-th mixture component of the UBM, compute the posterior probability of component i:

P(i | x_t) = \frac{w_i p_i(x_t)}{\sum_{j=1}^{M} w_j p_j(x_t)}

where x denotes a feature coefficient vector and w a weight coefficient;

Then compute the statistics for the weight, mean and variance:

n_i = \sum_{t=1}^{T} P(i | x_t), \quad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t, \quad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t^2

Then update the parameters w_i, μ_i and δ_i^2 of each Gaussian in the original UBM.

Updated weight:

\hat{w}_i = [α_i^w n_i / T + (1 - α_i^w) w_i] γ;

Updated mean:

\hat{μ}_i = α_i^m E_i(x) + (1 - α_i^m) μ_i;

Updated variance:

\hat{δ}_i^2 = α_i^v E_i(x^2) + (1 - α_i^v)(δ_i^2 + μ_i^2) - \hat{μ}_i^2;

In the adaptation coefficients α_i^ρ = n_i / (n_i + r^ρ), ρ ∈ {w, m, v}, r^ρ is a constant that constrains the scale of variation of the adaptation coefficients and is generally set to 16.

This step yields one general UBM and the timbre models of all music stars.
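The adaptation formulas of step S2 can be exercised with the short sketch below. It is an illustration under stated assumptions rather than the patent's implementation: features are scalars, a single constant r = 16 is used for the weight, mean and variance coefficients, the coefficient is taken as n_i / (n_i + r), and γ is computed as the reciprocal of the sum of the unnormalized weights. The function names and the toy UBM are hypothetical.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_adapt(features, weights, means, variances, r=16.0):
    """Adapt 1-D UBM parameters to one star's features; returns (w, mu, var)."""
    M, T = len(weights), len(features)
    n = [0.0] * M    # n_i
    s1 = [0.0] * M   # sum over t of P(i | x_t) * x_t
    s2 = [0.0] * M   # sum over t of P(i | x_t) * x_t^2
    for x in features:
        joint = [weights[i] * gauss(x, means[i], variances[i]) for i in range(M)]
        total = sum(joint)
        for i in range(M):
            post = joint[i] / total          # posterior P(i | x_t)
            n[i] += post
            s1[i] += post * x
            s2[i] += post * x * x
    new_w, new_mu, new_var = [], [], []
    for i in range(M):
        alpha = n[i] / (n[i] + r)            # ASSUMPTION: same alpha for w, m and v
        prior_m2 = variances[i] + means[i] ** 2
        e_x = s1[i] / n[i] if n[i] > 0.0 else means[i]
        e_x2 = s2[i] / n[i] if n[i] > 0.0 else prior_m2
        mu = alpha * e_x + (1.0 - alpha) * means[i]
        new_w.append(alpha * n[i] / T + (1.0 - alpha) * weights[i])
        new_mu.append(mu)
        new_var.append(alpha * e_x2 + (1.0 - alpha) * prior_m2 - mu ** 2)
    gamma = 1.0 / sum(new_w)                 # normalization so the weights sum to 1
    return [w * gamma for w in new_w], new_mu, new_var

# Toy UBM with two components, adapted to 16 identical observations at x = 3.
ubm = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
star_w, star_mu, star_var = map_adapt([3.0] * 16, *ubm)
```

With 16 observations and r = 16, the component near the data gets a coefficient of about 0.5, so its mean moves halfway from 2.0 to 3.0, while the distant component stays essentially at its UBM values. This is the intended behavior of the constant: components with little supporting data remain close to the background model.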

S3: timbre matching (as shown in Figs. 3 and 4):

S31: user voice sample processing: for the voice sample of a given user or singer, perform the same preprocessing and extract the voice feature coefficients;

S32: then compute the log-likelihood ratio of the extracted voice feature coefficients under each music star model relative to the UBM (as shown in Fig. 4), and take the music star with the maximum log-likelihood ratio as the recommended music star.

The log-likelihood ratio is computed as:

S(X) = \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})],

where x denotes a feature coefficient vector, λ_{star} and λ_{ubm} denote the music star model and the UBM, respectively, and p is the likelihood of the feature vector sequence under the music star model or the UBM;

The time-normalized log-likelihood ratio is used here:

S(X) = \frac{1}{T} \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})].

This step finds the music star whose timbre is close to the user's and recommends that star, thereby making singing more enjoyable for the user.
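The scoring and selection in step S32 can be sketched as follows. The example assumes one-dimensional Gaussian mixtures stored as (weights, means, variances) tuples; the star names `star_a` and `star_b` and all parameter values are hypothetical placeholders, not data from the patent.

```python
import math

def gmm_log_density(x, model):
    """log p(x | model) for a 1-D Gaussian mixture (weights, means, variances)."""
    weights, means, variances = model
    density = sum(
        w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in zip(weights, means, variances)
    )
    return math.log(density)

def llr_score(features, star_model, ubm):
    """Time-normalized log-likelihood ratio S(X) of a star model against the UBM."""
    total = sum(gmm_log_density(x, star_model) - gmm_log_density(x, ubm)
                for x in features)
    return total / len(features)

def recommend(features, star_models, ubm):
    """Return the star with the maximum score, plus all scores."""
    scores = {name: llr_score(features, model, ubm)
              for name, model in star_models.items()}
    return max(scores, key=scores.get), scores

ubm = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
star_models = {
    "star_a": ([0.5, 0.5], [-2.5, 1.5], [1.0, 1.0]),
    "star_b": ([0.5, 0.5], [-1.5, 2.5], [1.0, 1.0]),
}
user_features = [2.4, 2.5, 2.6]
best, scores = recommend(user_features, star_models, ubm)
```

Here the user's features lie closest to a component of `star_b`, so its score is positive while the score of `star_a` is negative. Dividing by T keeps scores comparable between voice samples of different lengths. For high-dimensional features, an implementation would typically evaluate the mixture in the log domain to avoid numerical underflow.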

In other embodiments, voice models such as GMM or HMM may also be used for music star model training and timbre matching.

Second embodiment:

A device for recommending music stars whose timbre is similar to a singer's comprises an audio library processing module, a music star model training module and a timbre matching module.

This embodiment is described in detail below.

The device for recommending music stars whose timbre is similar to a singer's, as shown in Fig. 5, comprises:

An audio library processing module, which removes the accompaniment from all songs in the audio library to obtain pure vocal audio, preprocesses the pure vocal audio, and then extracts voice feature coefficients from the preprocessed audio;

A music star model training module, which pools the extracted voice feature coefficients to train a universal background model (UBM) and, from the voice feature coefficient set corresponding to each music star, uses the UBM to adaptively train the models of all music stars in the audio library;

A timbre matching module, which preprocesses a given user voice sample and extracts its voice feature coefficients, then computes the log-likelihood ratio of the extracted voice feature coefficients under each music star model relative to the UBM, and takes the music star with the maximum log-likelihood ratio as the recommended music star.

For the method of removing the accompaniment from all songs in the audio library to obtain pure vocal audio, refer to the Chinese invention patent entitled "Method and device for processing stereo audio", application number 201410263446.3. That method mainly exploits the differences between accompaniment and vocals across the left and right stereo channels to suppress and filter out the accompaniment, thereby extracting the vocals.

It specifically comprises:

k_n(i) = abs(fft_frameR_n(i)) / abs(fft_frameL_n(i)) * (2/π),

p_n(i) = angle(fft_frameL_n(i)) - angle(fft_frameR_n(i)),

n = 0, 1, 2, ..., N-1; i = 0, 1, 2, ..., FN/2;

p_n(i) < φ or p_n(i) greater than an upper bound, where φ is set to -0.1 and the upper bound to 0.1; these bins are classified as bins to be attenuated;

The voice feature coefficient is one of MFCC, LPCC, LSP and PLP.

The preprocessing in the audio library processing module and the timbre matching module comprises: framing, windowing and silence removal;

Framing means dividing the audio signal into frames, where each frame contains a predetermined number of voice samples and adjacent frames overlap by a predetermined number of samples;

Windowing means applying a Hanning window.

The silence removal in the preprocessing comprises:

Compute the short-time energy of each frame by the formula:

E_n = \sum_{m=0}^{N-1} [w(m) x(n+m)]^2

The adaptive training of a music star model in the music star model training module uses Bayesian adaptation, which specifically comprises:

For the i-th mixture component of the UBM, compute the posterior probability of component i:

P(i | x_t) = \frac{w_i p_i(x_t)}{\sum_{j=1}^{M} w_j p_j(x_t)}

where x denotes a feature coefficient vector and w a weight coefficient;

Then compute the statistics for the weight, mean and variance:

n_i = \sum_{t=1}^{T} P(i | x_t), \quad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t, \quad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t^2

Then update the parameters w_i, μ_i and δ_i^2 of each Gaussian in the original UBM.

Updated weight:

\hat{w}_i = [α_i^w n_i / T + (1 - α_i^w) w_i] γ;

Updated mean:

\hat{μ}_i = α_i^m E_i(x) + (1 - α_i^m) μ_i;

Updated variance:

\hat{δ}_i^2 = α_i^v E_i(x^2) + (1 - α_i^v)(δ_i^2 + μ_i^2) - \hat{μ}_i^2;

The log-likelihood ratio in the timbre matching module is computed as:

S(X) = \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})],

The time-normalized log-likelihood ratio is used here:

S(X) = \frac{1}{T} \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})].

The present invention proposes a method and device for recommending music stars whose timbre is similar to a singer's, which find music stars with a similar timbre for the singer to use as a reference and can make performing more enjoyable. Applied in KTV settings, the invention can attract a large number of users, stimulate consumption, and improve users' ability to imitate a music star's timbre.

The foregoing is merely a preferred embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for recommending music stars whose timbre is similar to a singer's, characterized by comprising the following steps:

2. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 1, characterized in that: the pure vocal audio of the music stars is obtained by removing the accompaniment from songs.

3. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 1, characterized in that: the music star model training step comprises: first pooling all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM); then, from the feature coefficient set corresponding to each music star, using the UBM to adaptively train the models of all music stars in the audio library.

4. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 1, characterized in that: in the timbre matching step, the operation of "matching the feature coefficients of the user voice sample against all music star models to find the music star with the most similar timbre" comprises: computing the log-likelihood ratio of the user voice sample's feature coefficient set under each music star model relative to the UBM, and taking the music star with the maximum log-likelihood ratio as the recommended music star.

5. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 1, characterized in that: the voice feature coefficient is one of MFCC, LPCC, LSP and PLP.

6. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 1, characterized in that: the preprocessing in both the audio library processing step and the timbre matching step comprises, in order: framing, windowing and silence removal;

The silence removal comprises the following steps:

Compute the short-time energy of each frame by the formula:

E_n = \sum_{m=0}^{N-1} [w(m) x(n+m)]^2

7. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 3, characterized in that: the adaptive training of the models of all music stars in the audio library uses Bayesian adaptation, which specifically comprises:

For the i-th mixture component of the UBM, compute the posterior probability of component i:

P(i | x_t) = \frac{w_i p_i(x_t)}{\sum_{j=1}^{M} w_j p_j(x_t)}

where x denotes a feature coefficient vector and w a weight coefficient;

Then compute the statistics for the weight, mean and variance:

n_i = \sum_{t=1}^{T} P(i | x_t), \quad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t, \quad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t^2

Then update the parameters w_i, μ_i and δ_i^2 of each Gaussian in the original UBM.

Updated weight:

\hat{w}_i = [α_i^w n_i / T + (1 - α_i^w) w_i] γ;

Updated mean:

\hat{μ}_i = α_i^m E_i(x) + (1 - α_i^m) μ_i;

Updated variance:

\hat{δ}_i^2 = α_i^v E_i(x^2) + (1 - α_i^v)(δ_i^2 + μ_i^2) - \hat{μ}_i^2;

8. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 4, characterized in that: the log-likelihood ratio is computed by the formula:

S(X) = \frac{1}{T} \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})],

9. A device for recommending music stars whose timbre is similar to a singer's, characterized by comprising: an audio library processing module, a music star model training module and a timbre matching module,

10. The device for recommending music stars whose timbre is similar to a singer's as claimed in claim 9, characterized in that: the pure vocal audio of the music stars is obtained by removing the accompaniment from songs.

11. The device for recommending music stars whose timbre is similar to a singer's as claimed in claim 9, characterized in that: the music star model training module pools all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM);

12. The device for recommending music stars whose timbre is similar to a singer's as claimed in claim 9, characterized in that: in the timbre matching module, the operation of "matching the feature coefficients of the user voice sample against all music star models to find the music star with the most similar timbre" comprises: computing the log-likelihood ratio of the user voice sample's feature coefficient set under each music star model relative to the UBM, and taking the music star with the maximum log-likelihood ratio as the recommended music star.

13. The method for recommending music stars whose timbre is similar to a singer's as claimed in claim 9, characterized in that: the preprocessing in both the audio library processing module and the timbre matching module comprises, in order: framing, windowing and silence removal;

The silence removal comprises the following steps:

Compute the short-time energy of each frame by the formula:

E_n = \sum_{m=0}^{N-1} [w(m) x(n+m)]^2

14. The device for recommending music stars whose timbre is similar to a singer's as claimed in claim 11, characterized in that: the adaptive training of the models of all music stars in the audio library uses Bayesian adaptation, which specifically comprises:

For the i-th mixture component of the UBM, compute the posterior probability of component i:

P(i | x_t) = \frac{w_i p_i(x_t)}{\sum_{j=1}^{M} w_j p_j(x_t)}

where x denotes a feature coefficient vector and w a weight coefficient;

Then compute the statistics for the weight, mean and variance:

n_i = \sum_{t=1}^{T} P(i | x_t), \quad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t, \quad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i | x_t) x_t^2

Then update the parameters w_i, μ_i and δ_i^2 of each Gaussian in the original UBM.

Updated weight:

\hat{w}_i = [α_i^w n_i / T + (1 - α_i^w) w_i] γ;

Updated mean:

\hat{μ}_i = α_i^m E_i(x) + (1 - α_i^m) μ_i;

Updated variance:

\hat{δ}_i^2 = α_i^v E_i(x^2) + (1 - α_i^v)(δ_i^2 + μ_i^2) - \hat{μ}_i^2;

15. The device for recommending music stars whose timbre is similar to a singer's as claimed in claim 12, characterized in that: the log-likelihood ratio is computed by the formula:

S(X) = \frac{1}{T} \sum_{t=1}^{T} [\log p(x_t | λ_{star}) - \log p(x_t | λ_{ubm})],