CN104183245A - Method and device for recommending music stars with tones similar to those of singers - Google Patents

Publication number
CN104183245A
Authority
CN
China
Prior art keywords
singer
model
tone color
ubm
similar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410448290.6A
Other languages
Chinese (zh)
Inventor
王子亮
刘旺
邹应双
蔡智力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Kaimi Network Science & Technology Co Ltd
Original Assignee
Fujian Star Net eVideo Information Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Star Net eVideo Information Systems Co Ltd
Priority to CN201410448290.6A
Publication of CN104183245A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for recommending star singers whose timbre is similar to that of a performer. The method comprises: obtaining clean vocal audio, preprocessing it, extracting a set of voice feature coefficients from each piece of clean vocal audio, and training a corresponding star singer model with a voice-modelling algorithm; preprocessing a given user voice sample and extracting its feature coefficient set; and matching the feature coefficient set of the user voice sample against all star singer models to find the star singer whose timbre is most similar to the performer's. The invention further provides a corresponding device. The method and device can be applied in KTV settings to recommend star singers whose timbre resembles the user's, which makes singing more enjoyable and helps users get better at imitating a star singer's timbre.

Description

A method and device for recommending star singers whose timbre is similar to the performer's
[Technical field]
The present invention relates to the field of intelligent speech technology, and in particular to a method and device for recommending star singers whose timbre is similar to the performer's.
[Background]
As smart terminals have become widespread, people expect ever more from intelligent services in daily life, and intelligent voice services have become a pressing need.
Existing singing-evaluation technology can assess whether a performer sings out of tune, for example through pitch-accuracy scoring, but it rarely judges how closely a performance resembles a particular star singer or whom the performer sounds like. An intelligent karaoke system therefore needs a technique that matches the user's voice to the star singer with the closest timbre and then recommends that star singer's songs to the user, which makes singing more enjoyable and helps the user get better at imitating the star singer's timbre.
[Summary of the invention]
One technical problem to be solved by the present invention is to provide a method for recommending star singers whose timbre is similar to the performer's, so that a performer can find the star singers whose timbre resembles their own.
The present invention solves this technical problem through the following technical solution:
A method for recommending star singers whose timbre is similar to the performer's comprises the following steps:
Audio library processing: obtain clean vocal audio for all star singers, preprocess the clean vocal audio, and then extract a set of voice feature coefficients from each piece of clean vocal audio;
Star singer model training: train a model for each star singer from that star singer's feature coefficient set using a voice-modelling algorithm;
Timbre matching: preprocess a given user voice sample and extract its feature coefficient set; then match the feature coefficient set of the user voice sample against all star singer models to find the star singer with the most similar timbre.
Further, the clean vocal audio of the star singers may be obtained by removing the accompaniment from songs.
Further, the star singer model training step comprises: first pooling all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM); then, from each star singer's feature coefficient set, adapting the UBM to obtain a model for every star singer in the audio library.
Further, in the timbre matching step, the operation of "matching the feature coefficients of the user voice sample against all star singer models to find the star singer with the most similar timbre" comprises: computing the log-likelihood ratio of the user voice sample's feature coefficient set between each star singer model and the UBM, and recommending the star singer with the largest log-likelihood ratio.
Further, the voice feature coefficients are one of MFCC, LPCC, LSP and PLP.
Further, the preprocessing in both the audio library processing step and the timbre matching step comprises, in order: framing, windowing and silence removal;
The silence removal comprises the following steps:
Compute the short-time energy of each frame:
$$E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$$
where $w$ is the window function, $x$ is the audio signal, $n = 0, L, 2L, \ldots$ is the start of each frame, $N$ is the frame length and $L$ is the frame shift;
A frame whose short-time energy is below a given threshold is treated as a silent frame and removed.
Further, the adaptive training of the models of all star singers in the audio library uses Bayesian adaptation, which comprises:
For the $i$-th mixture component of the UBM, compute the posterior probability of component $i$:
$$P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$$
where $x_t$ is the feature coefficient vector of frame $t$ and $w_i$ is the weight of component $i$;
Then compute the statistics for the weight, mean and variance:
$$n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$$
Then update the parameters $w_i$, $\mu_i$, $\delta_i^2$ of each Gaussian in the original UBM:
New weight: $$\hat{w}_i = \left[ \alpha_i^{w}\, n_i / T + (1 - \alpha_i^{w})\, w_i \right] \gamma$$
New mean: $$\hat{\mu}_i = \alpha_i^{m} E_i(x) + (1 - \alpha_i^{m})\, \mu_i$$
New variance: $$\hat{\delta}_i^2 = \alpha_i^{v} E_i(x^2) + (1 - \alpha_i^{v}) (\delta_i^2 + \mu_i^2) - \hat{\mu}_i^2$$
where $\gamma$ is a normalisation factor that ensures the new weights $\hat{w}_i$ sum to 1, and $\alpha_i^{w}$, $\alpha_i^{m}$, $\alpha_i^{v}$ are the adaptation coefficients for the weight, mean and variance of the $i$-th Gaussian,
$$\alpha_i^{\rho} = \frac{n_i}{n_i + r^{\rho}}, \qquad \rho \in \{w, m, v\}$$
where $r^{\rho}$ is a constant that limits how far the adaptation coefficients can vary.
Further, the log-likelihood ratio of the user voice sample's feature coefficient set between a star singer model and the UBM is computed as:
$$S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$$
where $x_t$ is the feature coefficient vector of frame $t$, $T$ is the number of frames, $\lambda_{star}$ and $\lambda_{ubm}$ are the star singer model and the UBM, and $p$ is the likelihood that the star singer model or the UBM assigns to the feature vector sequence.
The present invention also provides a device for recommending star singers whose timbre is similar to the performer's, comprising an audio library processing module, a star singer model training module and a timbre matching module:
Audio library processing module: obtains clean vocal audio for all star singers, preprocesses the clean vocal audio, and then extracts a set of voice feature coefficients from each piece of clean vocal audio;
Star singer model training module: trains a model for each star singer from that star singer's feature coefficient set using a voice-modelling algorithm;
Timbre matching module: preprocesses a given user voice sample and extracts its feature coefficient set; then matches the feature coefficient set of the user voice sample against all star singer models to find the star singer with the most similar timbre.
Further, the clean vocal audio of the star singers may be obtained by removing the accompaniment from songs.
Further, the star singer model training module first pools all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM);
then, from each star singer's feature coefficient set, it adapts the UBM to obtain a model for every star singer in the audio library.
Further, in the timbre matching module, the operation of "matching the feature coefficients of the user voice sample against all star singer models to find the star singer with the most similar timbre" comprises: computing the log-likelihood ratio of the user voice sample's feature coefficient set between each star singer model and the UBM, and recommending the star singer with the largest log-likelihood ratio.
Further, the voice feature coefficients are one of MFCC, LPCC, LSP and PLP.
Further, the preprocessing in both the audio library processing module and the timbre matching module comprises, in order: framing, windowing and silence removal;
The silence removal comprises the following steps:
Compute the short-time energy of each frame:
$$E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$$
where $w$ is the window function, $x$ is the audio signal, $n = 0, L, 2L, \ldots$ is the start of each frame, $N$ is the frame length and $L$ is the frame shift;
A frame whose short-time energy is below a given threshold is treated as a silent frame and removed.
Further, the adaptive training of the models of all star singers in the audio library uses Bayesian adaptation, which comprises:
For the $i$-th mixture component of the UBM, compute the posterior probability of component $i$:
$$P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$$
where $x_t$ is the feature coefficient vector of frame $t$ and $w_i$ is the weight of component $i$;
Then compute the statistics for the weight, mean and variance:
$$n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$$
Then update the parameters $w_i$, $\mu_i$, $\delta_i^2$ of each Gaussian in the original UBM:
New weight: $$\hat{w}_i = \left[ \alpha_i^{w}\, n_i / T + (1 - \alpha_i^{w})\, w_i \right] \gamma$$
New mean: $$\hat{\mu}_i = \alpha_i^{m} E_i(x) + (1 - \alpha_i^{m})\, \mu_i$$
New variance: $$\hat{\delta}_i^2 = \alpha_i^{v} E_i(x^2) + (1 - \alpha_i^{v}) (\delta_i^2 + \mu_i^2) - \hat{\mu}_i^2$$
where $\gamma$ is a normalisation factor that ensures the new weights $\hat{w}_i$ sum to 1, and $\alpha_i^{w}$, $\alpha_i^{m}$, $\alpha_i^{v}$ are the adaptation coefficients for the weight, mean and variance of the $i$-th Gaussian,
$$\alpha_i^{\rho} = \frac{n_i}{n_i + r^{\rho}}, \qquad \rho \in \{w, m, v\}$$
where $r^{\rho}$ is a constant that limits how far the adaptation coefficients can vary.
Further, the log-likelihood ratio of the user voice sample's feature coefficient set between a star singer model and the UBM is computed as:
$$S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$$
where $x_t$ is the feature coefficient vector of frame $t$, $T$ is the number of frames, $\lambda_{star}$ and $\lambda_{ubm}$ are the star singer model and the UBM, and $p$ is the likelihood that the star singer model or the UBM assigns to the feature vector sequence.
The advantages of the present invention are as follows: it provides a method and device for recommending star singers whose timbre is similar to the performer's, giving the performer a reference star singer with a similar timbre and making singing more enjoyable. Applied in KTV settings, it can attract a large number of users, boost consumption, and help users get better at imitating a star singer's timbre.
[Brief description of the drawings]
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of the audio library processing and star singer model training in the method of the present invention.
Fig. 2 is a flowchart of training a single star singer's model in the method of the present invention.
Fig. 3 is a flowchart of the timbre matching process in the method of the present invention.
Fig. 4 is a flowchart of the likelihood-ratio computation in the timbre matching process of the method of the present invention.
Fig. 5 is a schematic structural diagram of the device of the present invention.
[Detailed description of the embodiments]
First embodiment:
A method for recommending star singers whose timbre is similar to the performer's comprises the following steps:
Audio library processing: obtain clean vocal audio for all star singers, preprocess the clean vocal audio, and then extract a set of voice feature coefficients from each piece of clean vocal audio;
Star singer model training: train a model for each star singer from that star singer's feature coefficient set using a voice-modelling algorithm;
Timbre matching: preprocess a given user voice sample and extract its feature coefficient set; then match the feature coefficient set of the user voice sample against all star singer models to find the star singer with the most similar timbre.
This embodiment is described in detail below.
A method for recommending star singers whose timbre is similar to the performer's comprises the following steps:
S1: Audio library processing (as shown in Fig. 1):
S11: Prepare the audio library by collecting a number of songs from a number of star singers, for example stereo audio of 5 songs for each of 300 star singers;
S12: Remove the accompaniment from all songs in the audio library to obtain clean vocal audio. One suitable method is described in the Chinese invention patent application titled "A stereo audio processing method and device", application number 201410263446.3. That method exploits the differences between accompaniment and vocals across the left and right stereo channels to suppress and filter out the accompaniment, thereby extracting the vocals. The accompaniment is removed to reduce the influence of accompaniment components on the training of the star singer timbre models.
Removing the accompaniment from all songs in the audio library to obtain clean vocal audio comprises:
Transform the left-channel and right-channel signals of the stereo audio to the frequency domain;
Compute the amplitude ratio between each pair of corresponding frequency bins of the left-channel and right-channel frequency-domain signals, and mark the bins whose amplitude ratio falls within a preset range as bins to be attenuated; likewise compute the phase difference between each pair of corresponding frequency bins of the two channels, and mark the bins whose phase difference falls within a preset range as bins to be attenuated. The amplitude ratio is computed as:
$$k_n(i) = \frac{\lvert \mathrm{fft\_frameR}_n(i) \rvert}{\lvert \mathrm{fft\_frameL}_n(i) \rvert} \cdot \frac{2}{\pi}$$
where $n = 0, 1, 2, \ldots, N-1$ is the frame index, $i = 0, 1, 2, \ldots, FN/2$ is the frequency-bin index and $FN$ is the number of points of the Fourier transform. The phase difference is computed as:
$$p_n(i) = \mathrm{angle}(\mathrm{fft\_frameL}_n(i)) - \mathrm{angle}(\mathrm{fft\_frameR}_n(i)),$$
$$n = 0, 1, 2, \ldots, N-1; \quad i = 0, 1, 2, \ldots, FN/2;$$
Next, select the bins to be attenuated. These are the bins whose amplitude ratio falls in a given range, that is, bins $i$ satisfying
$k_n(i) < \alpha$ or $k_n(i) > \beta$, with $0 < \alpha < 0.5$ and $0.5 < \beta < 1$; here $\alpha$ is taken as 0.4 and $\beta$ as 0.6,
or the bins whose phase difference falls in a given range, that is, bins $i$ satisfying
$p_n(i) < \varphi$ or $p_n(i)$ greater than a second threshold, where $\varphi$ is taken as -0.1 and the second threshold as 0.1;
The bins to be attenuated, which carry the accompaniment components, are attenuated according to:
$\mathrm{fft\_frameR}_n(i) = 0$ or $\mathrm{fft\_frameL}_n(i) = 0$, where $i$ is a bin to be attenuated;
Transform the attenuated frequency-domain signal back to the time domain to obtain the song audio with the accompaniment removed.
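The bin-selection procedure above can be sketched for a single stereo frame as follows. This is a minimal illustration under stated assumptions, not the referenced patent's implementation: the function names and the `eps` guard are hypothetical, a naive DFT is used only to avoid dependencies, and both channels are zeroed at each selected bin. Note in particular that the amplitude ratio here uses an arctangent, which the formula as printed does not show; it is assumed because it maps equal left and right magnitudes to 0.5, midway between the stated thresholds 0.4 and 0.6. The phase difference is also wrapped to (-pi, pi], which is a choice of this sketch.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, kept dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def suppress_accompaniment(left, right, alpha=0.4, beta=0.6,
                           phi_low=-0.1, phi_high=0.1, eps=1e-9):
    """Zero the bins of one stereo frame that look like accompaniment."""
    spec_l, spec_r = dft(left), dft(right)
    for i in range(len(spec_l)):
        mag_l, mag_r = abs(spec_l[i]), abs(spec_r[i])
        if mag_l < eps and mag_r < eps:
            spec_l[i] = spec_r[i] = 0.0  # empty bin, nothing to classify
            continue
        # ASSUMPTION: arctangent of the ratio, so equal magnitudes give k = 0.5.
        k = math.atan2(mag_r, mag_l) * 2 / math.pi
        # Phase difference angle(L) - angle(R), wrapped to (-pi, pi].
        p = cmath.phase(spec_l[i] * spec_r[i].conjugate())
        if k < alpha or k > beta or p < phi_low or p > phi_high:
            spec_l[i] = spec_r[i] = 0.0  # bin to attenuate (both channels zeroed)
    return idft(spec_l), idft(spec_r)
```

A full pipeline would apply this frame by frame with overlapping windows and overlap-add reconstruction, which is omitted here.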
In other embodiments, the clean vocal audio may be obtained by other methods; the algorithm above is not the only option.
In other embodiments, if clean vocal audio for all star singers has already been collected in step S11, step S12 is skipped.
S13: Preprocess the songs after accompaniment removal, including framing, windowing and silence removal;
Framing divides the audio signal into frames, each containing a preset number of audio samples, with a preset number of overlapping samples between adjacent frames;
Windowing applies a Hanning window; other window types may also be used.
Silence removal comprises:
Compute the short-time energy of each frame:
$$E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$$
where $w$ is the window function, $x$ is the audio signal, $n = 0, L, 2L, \ldots$ is the start of each frame, $N$ is the frame length and $L$ is the frame shift;
A frame whose short-time energy is below a given threshold is treated as a silent frame and removed. Silence carries no useful voice features and therefore needs to be removed.
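The three preprocessing operations of step S13 can be sketched as below. The function names, frame length, frame shift and threshold are illustrative assumptions; the text only states that a threshold is used, not its value.

```python
import math


def hann(frame_len):
    """Symmetric Hanning window of the given length (frame_len > 1)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / (frame_len - 1))
            for m in range(frame_len)]


def split_frames(x, frame_len, hop):
    """Framing: overlapping frames starting at n = 0, L, 2L, ... (L = hop)."""
    return [x[n:n + frame_len] for n in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame, window):
    """E_n = sum over m of [w(m) * x(n + m)]^2 for one frame."""
    return sum((w * s) ** 2 for w, s in zip(window, frame))


def remove_silence(x, frame_len, hop, threshold):
    """Return the windowed frames whose short-time energy reaches the threshold."""
    window = hann(frame_len)
    kept = []
    for frame in split_frames(x, frame_len, hop):
        if short_time_energy(frame, window) >= threshold:
            kept.append([w * s for w, s in zip(window, frame)])
    return kept
```

For example, `remove_silence(samples, 32, 16, 1.0)` keeps only the frames that overlap audible material; in practice the threshold would be tuned to the recording level.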
S14: Extract voice feature coefficients from the preprocessed audio. The voice feature coefficients may be one of MFCC, LPCC, LSP and PLP, where MFCC denotes Mel-frequency cepstral coefficients, LPCC linear prediction cepstral coefficients, LSP line spectral pair coefficients and PLP perceptual linear prediction coefficients. Any of these characterises the timbre of a voice well, so any one may be chosen. The present invention preferably extracts MFCC or LPCC voice feature coefficients.
S2: Star singer model training, as shown in Figs. 1 and 2.
The extracted voice feature coefficients are pooled to train a universal background model (UBM), and the UBM is then adapted with each star singer's voice feature coefficient set to obtain a model for every star singer in the audio library. The UBM is in essence a Gaussian mixture model with a large number of mixture components; it is trained in the same way as a GMM, using the iterative EM algorithm, which is not described in detail here.
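Since the EM training is not detailed in the text, the following is a minimal sketch of standard EM for a Gaussian mixture, under simplifying assumptions: features are scalars rather than vectors, the mixture has only a handful of components, and the names `train_gmm_em` and `var_floor` are hypothetical. A real UBM would use multi-dimensional features and many more components.

```python
import math


def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def train_gmm_em(data, init_means, n_iter=50, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    m = len(init_means)
    weights = [1.0 / m] * m
    means = list(init_means)
    variances = [1.0] * m
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [weights[i] * gauss(x, means[i], variances[i]) for i in range(m)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([j / total for j in joint])
        # M-step: re-estimate weight, mean and variance of each component.
        for i in range(m):
            n_i = sum(r[i] for r in resp)
            if n_i < 1e-8:
                continue  # component received no data; leave it unchanged
            weights[i] = n_i / len(data)
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var, var_floor)  # floor keeps the density finite
    return weights, means, variances
```

The variance floor is a common safeguard against a component collapsing onto a single sample; it is an addition of this sketch rather than something the text specifies.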
The adaptive training of a star singer model, shown in Fig. 2, uses Bayesian adaptation as follows:
For the $i$-th mixture component of the UBM, compute the posterior probability of component $i$:
$$P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$$
where $x_t$ is the feature coefficient vector of frame $t$ and $w_i$ is the weight of component $i$;
Then compute the statistics for the weight, mean and variance:
$$n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$$
Then update the parameters $w_i$, $\mu_i$, $\delta_i^2$ of each Gaussian in the original UBM:
New weight: $$\hat{w}_i = \left[ \alpha_i^{w}\, n_i / T + (1 - \alpha_i^{w})\, w_i \right] \gamma$$
New mean: $$\hat{\mu}_i = \alpha_i^{m} E_i(x) + (1 - \alpha_i^{m})\, \mu_i$$
New variance: $$\hat{\delta}_i^2 = \alpha_i^{v} E_i(x^2) + (1 - \alpha_i^{v}) (\delta_i^2 + \mu_i^2) - \hat{\mu}_i^2$$
where $\gamma$ is a normalisation factor that ensures the new weights $\hat{w}_i$ sum to 1, and $\alpha_i^{w}$, $\alpha_i^{m}$, $\alpha_i^{v}$ are the adaptation coefficients for the weight, mean and variance of the $i$-th Gaussian,
$$\alpha_i^{\rho} = \frac{n_i}{n_i + r^{\rho}}, \qquad \rho \in \{w, m, v\}$$
where $r^{\rho}$ is a constant that limits how far the adaptation coefficients can vary; it is generally taken as 16.
This step yields a general UBM and the timbre models of all star singers.
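The Bayesian adaptation above can be sketched as follows, assuming scalar features, a single constant of 16 shared by the weight, mean and variance coefficients, and the adaptation coefficient in the usual form n_i / (n_i + r). The function name `map_adapt` and the fallback for components that receive no data are assumptions of this sketch.

```python
import math


def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def map_adapt(data, weights, means, variances, relevance=16.0):
    """Adapt a 1-D UBM to one singer's data; returns (weights, means, variances)."""
    m, t_total = len(weights), len(data)
    # Posterior P(i | x_t) of every component for every frame.
    post = []
    for x in data:
        joint = [weights[i] * gauss(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint) or 1e-300  # guard against underflow to zero
        post.append([j / total for j in joint])
    new_w, new_mu, new_var = [], [], []
    for i in range(m):
        n_i = sum(p[i] for p in post)
        if n_i > 0.0:
            e_x = sum(p[i] * x for p, x in zip(post, data)) / n_i
            e_x2 = sum(p[i] * x * x for p, x in zip(post, data)) / n_i
        else:  # component saw no data: fall back to the UBM moments
            e_x, e_x2 = means[i], variances[i] + means[i] ** 2
        alpha = n_i / (n_i + relevance)  # same constant for weight, mean, variance
        mu = alpha * e_x + (1 - alpha) * means[i]
        var = alpha * e_x2 + (1 - alpha) * (variances[i] + means[i] ** 2) - mu ** 2
        new_w.append(alpha * n_i / t_total + (1 - alpha) * weights[i])
        new_mu.append(mu)
        new_var.append(var)
    gamma = 1.0 / sum(new_w)  # normalisation so the adapted weights sum to 1
    return [w * gamma for w in new_w], new_mu, new_var
```

Components that match a star singer's data closely move towards that data, while components with little supporting data stay near the UBM, which is what lets a model be trained from only a few songs per star singer.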
S3: Timbre matching (as shown in Figs. 3 and 4):
S31: User voice sample processing: a given voice sample of the user (the performer) is preprocessed in the same way, and its voice feature coefficients are extracted;
S32: Compute the log-likelihood ratio of the extracted voice feature coefficients between each star singer model and the UBM (as shown in Fig. 4), and recommend the star singer with the largest log-likelihood ratio.
The log-likelihood ratio is computed as:
$$S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$$
where $x_t$ is the feature coefficient vector of frame $t$, $\lambda_{star}$ and $\lambda_{ubm}$ are the star singer model and the UBM, and $p$ is the likelihood that the star singer model or the UBM assigns to the feature vector sequence;
The time-normalised log-likelihood ratio is used here:
$$S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$$
This step finds the star singer whose timbre is closest to the user's as the recommendation, which makes singing more enjoyable for the user.
In other embodiments, voice models such as GMM or HMM may also be used for star singer model training and timbre matching.
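The scoring of step S32 can be sketched as below for the GMM-UBM case, again with scalar features for brevity. Each model is assumed to be a `(weights, means, variances)` tuple, and the names `llr_score`, `recommend` and the example star labels are hypothetical.

```python
import math


def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(x, model):
    """log p(x | model) for a 1-D Gaussian mixture (weights, means, variances)."""
    weights, means, variances = model
    density = sum(w * gauss(x, mu, var) for w, mu, var in zip(weights, means, variances))
    return math.log(max(density, 1e-300))  # floor avoids log(0) on underflow


def llr_score(features, star_model, ubm):
    """Time-normalised log-likelihood ratio S(X) over T frames."""
    total = sum(log_likelihood(x, star_model) - log_likelihood(x, ubm)
                for x in features)
    return total / len(features)


def recommend(features, star_models, ubm):
    """Return the best-scoring star singer and all scores."""
    scores = {name: llr_score(features, model, ubm)
              for name, model in star_models.items()}
    return max(scores, key=scores.get), scores
```

Dividing by the number of frames makes scores comparable across user samples of different lengths; it does not change which star singer ranks first for a given sample.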
Second embodiment:
A device for recommending star singers whose timbre is similar to the performer's comprises an audio library processing module, a star singer model training module and a timbre matching module:
Audio library processing module: obtains clean vocal audio for all star singers, preprocesses the clean vocal audio, and then extracts a set of voice feature coefficients from each piece of clean vocal audio;
Star singer model training module: trains a model for each star singer from that star singer's feature coefficient set using a voice-modelling algorithm;
Timbre matching module: preprocesses a given user voice sample and extracts its feature coefficient set; then matches the feature coefficient set of the user voice sample against all star singer models to find the star singer with the most similar timbre.
This embodiment is described in detail below.
A device for recommending star singers whose timbre is similar to the performer's, as shown in Fig. 5, comprises:
An audio library processing module, which removes the accompaniment from all songs in the audio library to obtain clean vocal audio, preprocesses the clean vocal audio, and then extracts voice feature coefficients from the preprocessed audio;
A star singer model training module, which pools the extracted voice feature coefficients to train a universal background model (UBM) and then adapts the UBM with each star singer's voice feature coefficient set to obtain a model for every star singer in the audio library;
A timbre matching module, which preprocesses a given user voice sample and extracts its voice feature coefficients, then computes the log-likelihood ratio of the extracted voice feature coefficients between each star singer model and the UBM, and recommends the star singer with the largest log-likelihood ratio.
For removing the accompaniment from all songs in the audio library to obtain clean vocal audio, see the Chinese invention patent application titled "A stereo audio processing method and device", application number 201410263446.3. That method exploits the differences between accompaniment and vocals across the left and right stereo channels to suppress and filter out the accompaniment, thereby extracting the vocals.
It comprises:
Transform the left-channel and right-channel signals of the stereo audio to the frequency domain;
Compute the amplitude ratio between each pair of corresponding frequency bins of the left-channel and right-channel frequency-domain signals, and mark the bins whose amplitude ratio falls within a preset range as bins to be attenuated; likewise compute the phase difference between each pair of corresponding frequency bins of the two channels, and mark the bins whose phase difference falls within a preset range as bins to be attenuated. The amplitude ratio is computed as:
$$k_n(i) = \frac{\lvert \mathrm{fft\_frameR}_n(i) \rvert}{\lvert \mathrm{fft\_frameL}_n(i) \rvert} \cdot \frac{2}{\pi}$$
where $n = 0, 1, 2, \ldots, N-1$ is the frame index, $i = 0, 1, 2, \ldots, FN/2$ is the frequency-bin index and $FN$ is the number of points of the Fourier transform. The phase difference is computed as:
$$p_n(i) = \mathrm{angle}(\mathrm{fft\_frameL}_n(i)) - \mathrm{angle}(\mathrm{fft\_frameR}_n(i)),$$
$$n = 0, 1, 2, \ldots, N-1; \quad i = 0, 1, 2, \ldots, FN/2;$$
Next, select the bins to be attenuated. These are the bins whose amplitude ratio falls in a given range, that is, bins $i$ satisfying
$k_n(i) < \alpha$ or $k_n(i) > \beta$, with $0 < \alpha < 0.5$ and $0.5 < \beta < 1$; here $\alpha$ is taken as 0.4 and $\beta$ as 0.6,
or the bins whose phase difference falls in a given range, that is, bins $i$ satisfying
$p_n(i) < \varphi$ or $p_n(i)$ greater than a second threshold, where $\varphi$ is taken as -0.1 and the second threshold as 0.1;
The bins to be attenuated, which carry the accompaniment components, are attenuated according to:
$\mathrm{fft\_frameR}_n(i) = 0$ or $\mathrm{fft\_frameL}_n(i) = 0$, where $i$ is a bin to be attenuated;
Transform the attenuated frequency-domain signal back to the time domain to obtain the song audio with the accompaniment removed.
The voice feature coefficients are one of MFCC, LPCC, LSP and PLP.
The preprocessing in the audio library processing module and the timbre matching module comprises framing, windowing and silence removal;
Framing divides the audio signal into frames, each containing a preset number of audio samples, with a preset number of overlapping samples between adjacent frames;
Windowing applies a Hanning window.
The silence removal in the preprocessing comprises:
Compute the short-time energy of each frame:
$$E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$$
where $w$ is the window function, $x$ is the audio signal, $n = 0, L, 2L, \ldots$ is the start of each frame, $N$ is the frame length and $L$ is the frame shift;
A frame whose short-time energy is below a given threshold is treated as a silent frame and removed.
The adaptive training of star singer models in the star singer model training module uses Bayesian adaptation, which comprises:
For the $i$-th mixture component of the UBM, compute the posterior probability of component $i$:
$$P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$$
where $x_t$ is the feature coefficient vector of frame $t$ and $w_i$ is the weight of component $i$;
Then compute the statistics for the weight, mean and variance:
$$n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$$
Then update the parameters $w_i$, $\mu_i$, $\delta_i^2$ of each Gaussian in the original UBM:
New weight: $$\hat{w}_i = \left[ \alpha_i^{w}\, n_i / T + (1 - \alpha_i^{w})\, w_i \right] \gamma$$
New mean: $$\hat{\mu}_i = \alpha_i^{m} E_i(x) + (1 - \alpha_i^{m})\, \mu_i$$
New variance: $$\hat{\delta}_i^2 = \alpha_i^{v} E_i(x^2) + (1 - \alpha_i^{v}) (\delta_i^2 + \mu_i^2) - \hat{\mu}_i^2$$
where $\gamma$ is a normalisation factor that ensures the new weights $\hat{w}_i$ sum to 1, and $\alpha_i^{w}$, $\alpha_i^{m}$, $\alpha_i^{v}$ are the adaptation coefficients for the weight, mean and variance of the $i$-th Gaussian,
$$\alpha_i^{\rho} = \frac{n_i}{n_i + r^{\rho}}, \qquad \rho \in \{w, m, v\}$$
where $r^{\rho}$ is a constant that limits how far the adaptation coefficients can vary.
The log-likelihood ratio in the timbre matching module is computed as:
$$S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$$
where $x_t$ is the feature coefficient vector of frame $t$, $\lambda_{star}$ and $\lambda_{ubm}$ are the star singer model and the UBM, and $p$ is the likelihood that the star singer model or the UBM assigns to the feature vector sequence;
The time-normalised log-likelihood ratio is used here:
$$S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$$
The present invention provides a method and device for recommending star singers whose timbre is similar to the performer's, giving the performer a reference star singer with a similar timbre and making singing more enjoyable. Applied in KTV settings, it can attract a large number of users, boost consumption, and help users get better at imitating a star singer's timbre.
The above is only a preferred embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (15)

1. the similar singer's recommend method of singer's tone color, is characterized in that: comprise the steps:
Audio repository is processed: obtain pure people's sound audio of all singers, purer people's sound audio is carried out to pre-service, then extract respectively the sound characteristic coefficient set of each pure people's sound audio;
Singer's model training: according to the corresponding characteristic coefficient collection of each singer, go out corresponding singer's model with sound model Algorithm for Training;
Tone color coupling: for given user's sample sound, carry out pre-service, and extract characteristic coefficient collection; Then the characteristic coefficient collection of user voice sample is mated with all singer's models, find out the most similar singer of tone color.
2. the similar singer's recommend method of a kind of singer's tone color as claimed in claim 1, is characterized in that: pure people's sound audio acquisition pattern of described singer comprises: by song, go accompaniment mode to obtain.
3. the similar singer's recommend method of a kind of singer's tone color as claimed in claim 1, is characterized in that: described singer's model training step comprises: first all sound characteristic coefficient set of extracting in audio repository are concentrated in together and train universal background model UBM; Then according to the corresponding characteristic coefficient collection of each singer, utilize universal background model UBM adaptive training to go out the model of all singers in audio repository.
4. the similar singer's recommend method of a kind of singer's tone color as claimed in claim 1, it is characterized in that: in described tone color coupling step, the operation of " characteristic coefficient of user voice sample being mated with all singer's models; find out the most similar singer of tone color " comprising: the characteristic coefficient collection that calculates user voice sample and singer's model and with the log-likelihood ratio of universal model UBM, using the corresponding singer of log-likelihood ratio maximal value as recommendation singer.
5. The method for recommending singers with similar timbre as claimed in claim 1, characterized in that: the voice feature coefficient is one of MFCC, LPCC, LSP, and PLP.
6. The method for recommending singers with similar timbre as claimed in claim 1, characterized in that: the preprocessing in both the audio library processing step and the timbre matching step comprises, in order: framing, windowing, and silence removal;
The silence removal comprises the following steps:
Compute the short-time energy of every frame by the formula:
$$ E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2 $$
In the above formula, w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;
When the short-time energy of a frame is below a given threshold, the frame is treated as a silent frame and removed.
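The framing, windowing, and energy test described in claim 6 can be sketched with NumPy. This is an illustration only: the Hamming window, the frame length of 400 samples, the frame shift of 160 samples, and the absolute threshold value are assumptions, since the claim leaves the window function and threshold open.

```python
import numpy as np


def short_time_energy(frames):
    """E_n for each windowed frame: sum over m of [w(m) x(n+m)]^2."""
    return np.sum(frames ** 2, axis=1)


def remove_silence(x, frame_len=400, frame_shift=160, threshold=1e-3):
    """Frame, window, and drop frames whose short-time energy is below threshold.

    frame_len is N and frame_shift is L in the formula. The Hamming window
    and the default numbers are assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    window = np.hamming(frame_len)                               # w(m)
    starts = np.arange(0, len(x) - frame_len + 1, frame_shift)   # n = 0, L, 2L, ...
    frames = np.stack([window * x[n:n + frame_len] for n in starts])
    return frames[short_time_energy(frames) >= threshold]
```

The function returns the windowed frames that survive, one per row, ready for feature extraction. In practice the threshold would likely be tuned to the recording level or made relative to the loudest frame.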
7. The method for recommending singers with similar timbre as claimed in claim 3, characterized in that: the adaptive training of the models of all singers in the audio library uses Bayesian adaptation, which specifically comprises:
For each mixture component i of the UBM, compute the posterior probability of component i:
$$ P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)} $$
where x_t denotes the feature coefficients of frame t and w denotes the weight coefficient;
Then compute the statistics for the weight, mean, and variance:
$$ n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2 $$
Then revise the parameters w_i, μ_i, δ_i^2 of each Gaussian distribution in the old UBM:
Revised new weight: $$ \hat{w}_i = \left[ \alpha_i^{w}\, n_i / T + (1 - \alpha_i^{w})\, w_i \right] \gamma ; $$
Revised new mean: $$ \hat{\mu}_i = \alpha_i^{m}\, E_i(x) + (1 - \alpha_i^{m})\, \mu_i ; $$
Revised new variance: $$ \hat{\delta}_i^2 = \alpha_i^{v}\, E_i(x^2) + (1 - \alpha_i^{v}) (\delta_i^2 + \mu_i^2) - \hat{\mu}_i^2 ; $$
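where γ is a normalization factor that ensures the revised weights \hat{w}_i sum to 1, and α_i^w, α_i^m, α_i^v are the correction factors for the weight, mean, and variance of the i-th Gaussian, respectively:
$$ \alpha_i^{\rho} = \frac{n_i}{n_i + r^{\rho}}, \qquad \rho \in \{w, m, v\} $$
In this formula, r^ρ is a constant that constrains the scale of change of the correction factor.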
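The adaptation in claim 7 can be sketched as follows for a diagonal-covariance Gaussian mixture. This is an illustration under stated assumptions: the correction factor is taken as n_i / (n_i + r), the common relevance-factor form, with a single constant r shared by weight, mean, and variance. The value r = 16, the variance floor, and the function name `map_adapt` are hypothetical choices, not taken from the claims.

```python
import numpy as np


def map_adapt(X, weights, means, variances, r=16.0):
    """Adapt a diagonal-covariance UBM to one singer's feature frames X (T x D)."""
    T = X.shape[0]
    # Posterior P(i | x_t) of every component for every frame, computed in logs
    diff = X[:, None, :] - means[None, :, :]
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    # Sufficient statistics n_i, E_i(x), E_i(x^2)
    n = post.sum(axis=0)
    safe_n = np.maximum(n, 1e-10)                 # guards components with no data
    ex = (post.T @ X) / safe_n[:, None]
    ex2 = (post.T @ X ** 2) / safe_n[:, None]

    # Correction factor (assumed shared by weight, mean, and variance)
    alpha = n / (n + r)

    new_w = alpha * n / T + (1 - alpha) * weights
    new_w /= new_w.sum()                          # plays the role of gamma
    new_means = alpha[:, None] * ex + (1 - alpha)[:, None] * means
    new_vars = (alpha[:, None] * ex2
                + (1 - alpha)[:, None] * (variances + means ** 2)
                - new_means ** 2)
    return new_w, new_means, np.maximum(new_vars, 1e-6)
```

Components that see many of the singer's frames move close to the singer's statistics, while components that see almost none stay near the UBM values. That is why a singer model can be trained from relatively little audio.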
8. The method for recommending singers with similar timbre as claimed in claim 4, characterized in that: the log-likelihood ratio is computed by the formula:
$$ S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right], $$
where X denotes the feature coefficients, T denotes the number of frames, λ_star and λ_ubm denote the singer model and the UBM respectively, and p denotes the likelihood of the feature vector sequence under the singer model or the UBM.
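The score S(X) and the selection of the maximum in claim 4 can be sketched as follows, assuming each model is a diagonal-covariance Gaussian mixture stored as a `(weights, means, variances)` triple. The function names and the dictionary of singer models are hypothetical and exist only for this illustration.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, variances):
    """Per-frame log p(x_t | model) for a diagonal-covariance GMM, shape (T,)."""
    diff = X[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    peak = log_comp.max(axis=1)
    # log-sum-exp over components, stabilized by subtracting the row maximum
    return peak + np.log(np.sum(np.exp(log_comp - peak[:, None]), axis=1))


def llr_score(X, singer_model, ubm):
    """S(X): frame-averaged log-likelihood ratio of singer model versus UBM."""
    return float(np.mean(gmm_log_likelihood(X, *singer_model)
                         - gmm_log_likelihood(X, *ubm)))


def recommend(X, singer_models, ubm):
    """Return the singer with the largest score, plus all scores."""
    scores = {name: llr_score(X, model, ubm) for name, model in singer_models.items()}
    return max(scores, key=scores.get), scores
```

Averaging over T keeps scores comparable between user samples of different lengths. Subtracting the UBM term removes the part of the likelihood that any voice would share.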
9. A device for recommending singers with similar timbre, characterized in that it comprises an audio library processing module, a singer model training module, and a timbre matching module:
Audio library processing module: used to obtain the pure vocal audio of all singers, preprocess that audio, and then extract a voice feature coefficient set from each pure vocal recording;
Singer model training module: used to train a corresponding singer model with a voice model algorithm from the feature coefficient set of each singer;
Timbre matching module: used to preprocess a given user voice sample and extract its feature coefficient set, and then match that feature coefficient set against all singer models to find the singer whose timbre is most similar.
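The division into three modules in claim 9 can be sketched as a small composition of pluggable parts. This is only a structural illustration: the class name `SingerRecommender` and the three callables `extract`, `train`, and `score` are hypothetical stand-ins for the audio library processing, singer model training, and timbre matching modules, and no particular feature or model type is implied.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Sequence


@dataclass
class SingerRecommender:
    # Preprocessing plus feature extraction, shared by library and user audio
    extract: Callable[[Sequence[float]], Any]
    # Voice model algorithm: feature set -> singer model
    train: Callable[[Any], Any]
    # Similarity of a feature set to one singer model (higher is more similar)
    score: Callable[[Any, Any], float]
    models: Dict[str, Any] = field(default_factory=dict)

    def build_library(self, vocals: Dict[str, Sequence[float]]) -> None:
        """Audio library processing followed by singer model training."""
        for name, audio in vocals.items():
            self.models[name] = self.train(self.extract(audio))

    def recommend(self, user_audio: Sequence[float]) -> str:
        """Timbre matching: return the name of the best-scoring singer."""
        feats = self.extract(user_audio)
        return max(self.models, key=lambda name: self.score(feats, self.models[name]))
```

Keeping `extract` shared between the library and the user sample reflects the requirement that both sides go through the same preprocessing and feature extraction. The model and scoring functions can then be swapped without touching the matching logic.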
10. The device for recommending singers with similar timbre as claimed in claim 9, characterized in that: the ways of obtaining the pure vocal audio of the singers include removing the accompaniment from songs.
11. The device for recommending singers with similar timbre as claimed in claim 9, characterized in that: the singer model training module first pools all voice feature coefficient sets extracted from the audio library and trains a universal background model (UBM) on them;
Then, from the feature coefficient set of each singer, it adaptively trains the models of all singers in the audio library starting from the UBM.
12. The device for recommending singers with similar timbre as claimed in claim 9, characterized in that: in the timbre matching module, the operation of "matching the feature coefficient set of the user voice sample against all singer models to find the singer whose timbre is most similar" comprises: computing, for each singer model, the log-likelihood ratio of the user voice sample's feature coefficient set under that singer model versus under the universal background model UBM, and taking the singer with the maximum log-likelihood ratio as the recommended singer.
13. The device for recommending singers with similar timbre as claimed in claim 9, characterized in that: the preprocessing in both the audio library processing module and the timbre matching module comprises, in order: framing, windowing, and silence removal;
The silence removal comprises the following steps:
Compute the short-time energy of every frame by the formula:
$$ E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2 $$
In the above formula, w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;
When the short-time energy of a frame is below a given threshold, the frame is treated as a silent frame and removed.
14. The device for recommending singers with similar timbre as claimed in claim 11, characterized in that: the adaptive training of the models of all singers in the audio library uses Bayesian adaptation, which specifically comprises:
For each mixture component i of the UBM, compute the posterior probability of component i:
$$ P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)} $$
where x_t denotes the feature coefficients of frame t and w denotes the weight coefficient;
Then compute the statistics for the weight, mean, and variance:
$$ n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2 $$
Then revise the parameters w_i, μ_i, δ_i^2 of each Gaussian distribution in the old UBM:
Revised new weight: $$ \hat{w}_i = \left[ \alpha_i^{w}\, n_i / T + (1 - \alpha_i^{w})\, w_i \right] \gamma ; $$
Revised new mean: $$ \hat{\mu}_i = \alpha_i^{m}\, E_i(x) + (1 - \alpha_i^{m})\, \mu_i ; $$
Revised new variance: $$ \hat{\delta}_i^2 = \alpha_i^{v}\, E_i(x^2) + (1 - \alpha_i^{v}) (\delta_i^2 + \mu_i^2) - \hat{\mu}_i^2 ; $$
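where γ is a normalization factor that ensures the revised weights \hat{w}_i sum to 1, and α_i^w, α_i^m, α_i^v are the correction factors for the weight, mean, and variance of the i-th Gaussian, respectively:
$$ \alpha_i^{\rho} = \frac{n_i}{n_i + r^{\rho}}, \qquad \rho \in \{w, m, v\} $$
In this formula, r^ρ is a constant that constrains the scale of change of the correction factor.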
15. The device for recommending singers with similar timbre as claimed in claim 12, characterized in that: the log-likelihood ratio is computed by the formula:
$$ S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right], $$
where X denotes the feature coefficients, T denotes the number of frames, λ_star and λ_ubm denote the singer model and the UBM respectively, and p denotes the likelihood of the feature vector sequence under the singer model or the UBM.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410448290.6A CN104183245A (en) 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410448290.6A CN104183245A (en) 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers

Publications (1)

Publication Number Publication Date
CN104183245A true CN104183245A (en) 2014-12-03

Family

ID=51964235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410448290.6A Pending CN104183245A (en) 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers

Country Status (1)

Country Link
CN (1) CN104183245A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464725A (en) * 2014-12-30 2015-03-25 福建星网视易信息系统有限公司 Method and device for singing imitation
CN105554281A (en) * 2015-12-21 2016-05-04 联想(北京)有限公司 Information processing method and electronic device
CN105575393A (en) * 2015-12-02 2016-05-11 中国传媒大学 Personalized song recommendation method based on voice timbre
CN105679324A (en) * 2015-12-29 2016-06-15 福建星网视易信息系统有限公司 Voiceprint identification similarity scoring method and apparatus
CN106095925A (en) * 2016-06-12 2016-11-09 北京邮电大学 A kind of personalized song recommendations system based on vocal music feature
CN106250400A (en) * 2016-07-19 2016-12-21 腾讯科技(深圳)有限公司 A kind of audio data processing method, device and system
CN106910506A (en) * 2017-02-23 2017-06-30 广东小天才科技有限公司 A kind of method and device that identification character is imitated by sound
CN106997765A (en) * 2017-03-31 2017-08-01 福州大学 The quantitatively characterizing method of voice tone color
CN109031200A (en) * 2018-05-24 2018-12-18 华南理工大学 A kind of sound source dimensional orientation detection method based on deep learning
CN109300485A (en) * 2018-11-19 2019-02-01 北京达佳互联信息技术有限公司 Methods of marking, device, electronic equipment and the computer storage medium of audio signal
CN109308901A (en) * 2018-09-29 2019-02-05 百度在线网络技术(北京)有限公司 Chanteur's recognition methods and device
CN109754820A (en) * 2018-12-07 2019-05-14 百度在线网络技术(北京)有限公司 Target audio acquisition methods and device, storage medium and terminal
CN109903780A (en) * 2019-02-22 2019-06-18 宝宝树(北京)信息技术有限公司 Crying cause model method for building up, system and crying reason discriminating conduct
CN110083772A (en) * 2019-04-29 2019-08-02 北京小唱科技有限公司 Singer's recommended method and device based on singing skills
CN110364182A (en) * 2019-08-01 2019-10-22 腾讯音乐娱乐科技(深圳)有限公司 A kind of audio signal processing method and device
CN110489659A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Data matching method and device

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567431A (en) * 2003-07-10 2005-01-19 上海优浪信息科技有限公司 Method and system for identifying status of speaker
US20050027514A1 (en) * 2003-07-28 2005-02-03 Jian Zhang Method and apparatus for automatically recognizing audio data
CN1897109A (en) * 2006-06-01 2007-01-17 电子科技大学 Single audio-frequency signal discrimination based on MFCC
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
CN101351761A (en) * 2005-10-27 2009-01-21 高通股份有限公司 Method and apparatus for achieving flexible bandwidth using variable guard bands
CN101577117A (en) * 2009-03-12 2009-11-11 北京中星微电子有限公司 Extracting method of accompaniment music and device
CN101944359A (en) * 2010-07-23 2011-01-12 杭州网豆数字技术有限公司 Voice recognition method facing specific crowd
CN101980336A (en) * 2010-10-18 2011-02-23 福州星网视易信息系统有限公司 Hidden Markov model-based vehicle sound identification method
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN102543073A (en) * 2010-12-10 2012-07-04 上海上大海润信息系统有限公司 Shanghai dialect phonetic recognition information processing method
CN103065623A (en) * 2012-12-17 2013-04-24 深圳Tcl新技术有限公司 Timbre matching method and timbre matching device
CN103177722A (en) * 2013-03-08 2013-06-26 北京理工大学 Tone-similarity-based song retrieval method
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN103474065A (en) * 2013-09-24 2013-12-25 贵阳世纪恒通科技有限公司 Method for determining and recognizing voice intentions based on automatic classification technology
CN103730121A (en) * 2013-12-24 2014-04-16 中山大学 Method and device for recognizing disguised sounds
CN103871423A (en) * 2012-12-13 2014-06-18 上海八方视界网络科技有限公司 Audio frequency separation method based on NMF non-negative matrix factorization
CN103943113A (en) * 2014-04-15 2014-07-23 福建星网视易信息系统有限公司 Method and device for removing accompaniment from song

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
任雪妮: "《语音相似度评价算法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
刘杰: "《自动语种识别系统设计与实现》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
徐永华: "《基于GMM-UBM模型的语种识别》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
朱少雄: "《声纹识别系统与模式匹配算法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
李丽娟: "《基于统计模型的说话人识别研究与实现》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
颜凯: "《基于高斯混合模型的说话人识别算法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464725B (en) * 2014-12-30 2017-09-05 福建凯米网络科技有限公司 A kind of method and apparatus imitated of singing
CN104464725A (en) * 2014-12-30 2015-03-25 福建星网视易信息系统有限公司 Method and device for singing imitation
CN105575393A (en) * 2015-12-02 2016-05-11 中国传媒大学 Personalized song recommendation method based on voice timbre
CN105554281A (en) * 2015-12-21 2016-05-04 联想(北京)有限公司 Information processing method and electronic device
CN105679324B (en) * 2015-12-29 2019-03-22 福建星网视易信息系统有限公司 A kind of method and apparatus of Application on Voiceprint Recognition similarity score
CN105679324A (en) * 2015-12-29 2016-06-15 福建星网视易信息系统有限公司 Voiceprint identification similarity scoring method and apparatus
CN106095925A (en) * 2016-06-12 2016-11-09 北京邮电大学 A kind of personalized song recommendations system based on vocal music feature
CN106095925B (en) * 2016-06-12 2018-07-03 北京邮电大学 A kind of personalized song recommendations method based on vocal music feature
CN106250400A (en) * 2016-07-19 2016-12-21 腾讯科技(深圳)有限公司 A kind of audio data processing method, device and system
CN106910506A (en) * 2017-02-23 2017-06-30 广东小天才科技有限公司 A kind of method and device that identification character is imitated by sound
CN106997765A (en) * 2017-03-31 2017-08-01 福州大学 The quantitatively characterizing method of voice tone color
CN106997765B (en) * 2017-03-31 2020-09-01 福州大学 Quantitative characterization method for human voice timbre
CN109031200A (en) * 2018-05-24 2018-12-18 华南理工大学 A kind of sound source dimensional orientation detection method based on deep learning
CN109308901A (en) * 2018-09-29 2019-02-05 百度在线网络技术(北京)有限公司 Chanteur's recognition methods and device
CN109300485A (en) * 2018-11-19 2019-02-01 北京达佳互联信息技术有限公司 Methods of marking, device, electronic equipment and the computer storage medium of audio signal
CN109300485B (en) * 2018-11-19 2022-06-10 北京达佳互联信息技术有限公司 Scoring method and device for audio signal, electronic equipment and computer storage medium
CN109754820A (en) * 2018-12-07 2019-05-14 百度在线网络技术(北京)有限公司 Target audio acquisition methods and device, storage medium and terminal
CN109903780A (en) * 2019-02-22 2019-06-18 宝宝树(北京)信息技术有限公司 Crying cause model method for building up, system and crying reason discriminating conduct
CN110083772A (en) * 2019-04-29 2019-08-02 北京小唱科技有限公司 Singer's recommended method and device based on singing skills
CN110489659A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Data matching method and device
CN110364182A (en) * 2019-08-01 2019-10-22 腾讯音乐娱乐科技(深圳)有限公司 A kind of audio signal processing method and device
CN110364182B (en) * 2019-08-01 2022-06-14 腾讯音乐娱乐科技(深圳)有限公司 Sound signal processing method and device

Similar Documents

Publication Publication Date Title
CN104183245A (en) Method and device for recommending music stars with tones similar to those of singers
CN102054480B (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
Luo et al. Music source separation with band-split RNN
CN110019931B (en) Audio classification method and device, intelligent equipment and storage medium
CN104700843A (en) Method and device for identifying ages
CN103943104B (en) A kind of voice messaging knows method for distinguishing and terminal unit
CN102129456B (en) Method for monitoring and automatically classifying music factions based on decorrelation sparse mapping
CN106024010B (en) A kind of voice signal dynamic feature extraction method based on formant curve
CN108447495A (en) A kind of deep learning sound enhancement method based on comprehensive characteristics collection
CN105469807B (en) A kind of more fundamental frequency extracting methods and device
CN104123934A (en) Speech composition recognition method and system
CN102436809A (en) Network speech recognition method in English oral language machine examination system
CN103440872A (en) Transient state noise removing method
CN102610236A (en) Method for improving voice quality of throat microphone
CN108281150B (en) Voice tone-changing voice-changing method based on differential glottal wave model
FitzGerald et al. Single channel vocal separation using median filtering and factorisation techniques
CN111081249A (en) Mode selection method, device and computer readable storage medium
CN105976803B (en) A kind of note cutting method of combination music score
CN112116909A (en) Voice recognition method, device and system
Kamble et al. Teager energy subband filtered features for near and far-field automatic speech recognition
CN111091847A (en) Deep clustering voice separation method based on improvement
Allen et al. Warped magnitude and phase-based features for language identification
Sofianos et al. H-Semantics: A hybrid approach to singing voice separation
Pandey et al. Significance of glottal activity detection for speaker verification in degraded and limited data condition
Kumari et al. Audio signal classification based on optimal wavelet and support vector machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20141203

Assignee: FUJIAN KAIMI NETWORK SCIENCE & TECHNOLOGY CO., LTD.

Assignor: Fujian Starnet e-Video Information System Co., Ltd.

Contract record no.: 2015350000072

Denomination of invention: Method and device for recommending music stars with tones similar to those of singers

License type: Common License

Record date: 20150925

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20151027

Address after: 350018 Fujian city of Fuzhou province Nanjiang gate town of Cangshan District West Coast Road No. 198 Fuzhou Strait International Convention and Exhibition Center basement East Office Center No. A-029 (FTA test area)

Applicant after: FUJIAN KAIMI NETWORK SCIENCE & TECHNOLOGY CO., LTD.

Address before: Cangshan District of Fuzhou City, Fujian province 350000 to build a new town, Jinshan Road No. 618, juyuanzhou Industrial Park No. 19 building one or two layer

Applicant before: Fujian Starnet e-Video Information System Co., Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141203
