CN104183245A - Method and device for recommending music stars with tones similar to those of singers - Google Patents

Method and device for recommending music stars with tones similar to those of singers

Info

Publication number
CN104183245A
CN104183245A CN201410448290A
Authority
CN
China
Prior art keywords
singer
model
voice
audio
similar
Prior art date
Application number
CN 201410448290
Other languages
Chinese (zh)
Inventor
王子亮
刘旺
邹应双
蔡智力
Original Assignee
福建星网视易信息系统有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 福建星网视易信息系统有限公司
Priority to CN 201410448290 priority Critical patent/CN104183245A/en
Publication of CN104183245A publication Critical patent/CN104183245A/en

Links

Abstract

The invention provides a method for recommending singers whose timbre is similar to that of a performer. The method comprises: obtaining pure vocal audio for each singer, preprocessing the audio, extracting a voice feature coefficient set from each pure vocal track, and training a corresponding singer model with a voice-model algorithm; preprocessing a given user voice sample and extracting its feature coefficient set; and matching the feature coefficient set of the user voice sample against all singer models to find the singer whose timbre is most similar to the performer's. The invention further provides a corresponding device. Applied in KTV scenarios, the method and device recommend singers with timbre similar to the user's, which increases the enjoyment of singing and improves the user's ability to imitate a singer's timbre.

Description

Method and device for recommending singers with timbre similar to a performer's

TECHNICAL FIELD

[0001] The present invention relates to the field of intelligent speech technology, and in particular to a method and device for recommending singers whose timbre is similar to a performer's.

BACKGROUND

[0002] With the popularity of intelligent terminals, people expect increasingly intelligent services in daily life, and intelligent voice services have become an urgent need.

[0003] Existing singing-evaluation technology can judge whether a performer sings in tune, for example pitch-scoring techniques, but rarely evaluates whether a performer sings like, or sounds like, a particular singer. Intelligent karaoke (KTV) systems urgently need a technology that can match a user's voice to the singer whose timbre is closest and then recommend that singer's songs to the user, thereby increasing the enjoyment of singing and improving the user's ability to imitate a singer's timbre.

SUMMARY

[0004] One technical problem to be solved by the present invention is to provide a method for recommending singers with timbre similar to a performer's, i.e. a method that finds, for a performer, the singers whose timbre resembles his or her own.

[0005] The present invention solves the above technical problem with the following technical solution:

[0006] A method for recommending singers with timbre similar to a performer's comprises the following steps:

[0007] audio library processing: obtaining pure vocal audio for all singers, preprocessing the pure vocal audio, and extracting a voice feature coefficient set from each pure vocal track;

[0008] singer model training: according to the feature coefficient set corresponding to each singer, training a corresponding singer model with a voice-model algorithm;

[0009] timbre matching: preprocessing a given user voice sample and extracting its feature coefficient set; then matching the feature coefficient set of the user voice sample against all singer models to find the singer with the most similar timbre.

[0010] Further, the pure vocal audio of a singer can be obtained by removing the accompaniment from the singer's songs.

[0011] Further, the singer model training step comprises: first pooling all voice feature coefficient sets extracted from the audio library to train a universal background model (UBM); then, according to the feature coefficient set corresponding to each singer, adaptively training from the UBM a model for every singer in the audio library.

[0012] Further, in the timbre matching step, "matching the feature coefficients of the user voice sample against all singer models to find the singer with the most similar timbre" comprises: computing the log-likelihood ratio of the feature coefficient set of the user voice sample under each singer model and under the universal background model UBM, and recommending the singer whose log-likelihood ratio is largest.

[0013] Further, the voice feature coefficients are one of MFCC, LPCC, LSP and PLP.

[0014] Further, the preprocessing in the audio library processing step and in the timbre matching step comprises, in order: framing, windowing and silence removal;

[0015] the silence removal comprises the following steps:

[0016] computing the short-time energy of each frame:

[0017] $E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$

[0018] where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;

[0019] when the short-time energy of a frame is below a certain threshold, the frame is regarded as a silent frame and removed directly.

[0020] Further, the adaptive training of models for all singers in the audio library uses a Bayesian adaptation algorithm, which specifically comprises:

[0021] for the i-th mixture component of the UBM, computing the posterior probability of component i:

[0022] $P(i \mid x_t) = \dfrac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$

[0023] where x denotes the feature coefficients and w denotes the mixture weights;

[0024] then computing the weight, mean and variance statistics:

[0025] $n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$

[0026] then updating the parameters $w_i$, $\mu_i$, $\sigma_i^2$ of each Gaussian component of the old UBM:

[0027] new weight: $\hat{w}_i = \left[ \alpha_i^w\, n_i / T + (1 - \alpha_i^w)\, w_i \right] \gamma$;

[0028] new mean: $\hat{\mu}_i = \alpha_i^m\, E_i(x) + (1 - \alpha_i^m)\, \mu_i$;

[0029] new variance: $\hat{\sigma}_i^2 = \alpha_i^v\, E_i(x^2) + (1 - \alpha_i^v)(\sigma_i^2 + \mu_i^2) - \hat{\mu}_i^2$;

[0030] where $\gamma$ is a scale factor that ensures the updated weights sum to 1, and $\alpha_i^w$, $\alpha_i^m$, $\alpha_i^v$ are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, respectively,

[0031] $\alpha_i^\rho = \dfrac{n_i}{n_i + r^\rho}$, where $r^\rho$ is a constant that constrains the scale of the adaptation coefficients.

[0032] Further, the log-likelihood ratio of the feature coefficient set of the user voice sample under a singer model and under the universal background model UBM is computed as:

[0033] $S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$

[0034] where x denotes the feature coefficients, T the number of frames, $\lambda_{star}$ and $\lambda_{ubm}$ the singer model and the UBM, and p the likelihood of the feature vector sequence under the singer model or the UBM.

[0035] The present invention also provides a device for recommending singers with timbre similar to a performer's, comprising an audio library processing module, a singer model training module and a timbre matching module, wherein

[0036] the audio library processing module is configured to obtain pure vocal audio for all singers, preprocess the pure vocal audio, and extract a voice feature coefficient set from each pure vocal track;

[0037] the singer model training module is configured to train a corresponding singer model with a voice-model algorithm according to the feature coefficient set corresponding to each singer;

[0038] the timbre matching module is configured to preprocess a given user voice sample, extract its feature coefficient set, and then match the feature coefficient set of the user voice sample against all singer models to find the singer with the most similar timbre.

[0039] Further, the pure vocal audio of a singer can be obtained by removing the accompaniment from the singer's songs.

[0040] Further, the singer model training module pools all voice feature coefficient sets extracted from the audio library to train a universal background model UBM;

[0041] then, according to the feature coefficient set corresponding to each singer, adaptively trains from the UBM a model for every singer in the audio library.

[0042] Further, in the timbre matching module, "matching the feature coefficients of the user voice sample against all singer models to find the singer with the most similar timbre" comprises: computing the log-likelihood ratio of the feature coefficient set of the user voice sample under each singer model and under the universal background model UBM, and recommending the singer whose log-likelihood ratio is largest.

[0043] Further, the voice feature coefficients are one of MFCC, LPCC, LSP and PLP.

[0044] Further, the preprocessing in the audio library processing module and in the timbre matching module comprises, in order: framing, windowing and silence removal;

[0045] the silence removal comprises the following steps:

[0046] computing the short-time energy of each frame:

[0047] $E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$

[0048] where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;

[0049] when the short-time energy of a frame is below a certain threshold, the frame is regarded as a silent frame and removed directly.

[0050] Further, the adaptive training of models for all singers in the audio library uses a Bayesian adaptation algorithm, which specifically comprises:

[0051] for the i-th mixture component of the UBM, computing the posterior probability of component i:

[0052] $P(i \mid x_t) = \dfrac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$

[0053] where x denotes the feature coefficients and w denotes the mixture weights;

[0054] then computing the weight, mean and variance statistics:

[0055] $n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$

[0056] then updating the parameters $w_i$, $\mu_i$, $\sigma_i^2$ of each Gaussian component of the old UBM:

[0057] new weight: $\hat{w}_i = \left[ \alpha_i^w\, n_i / T + (1 - \alpha_i^w)\, w_i \right] \gamma$;

[0058] new mean: $\hat{\mu}_i = \alpha_i^m\, E_i(x) + (1 - \alpha_i^m)\, \mu_i$;

[0059] new variance: $\hat{\sigma}_i^2 = \alpha_i^v\, E_i(x^2) + (1 - \alpha_i^v)(\sigma_i^2 + \mu_i^2) - \hat{\mu}_i^2$;

[0060] where $\gamma$ is a scale factor that ensures the updated weights sum to 1, and $\alpha_i^w$, $\alpha_i^m$, $\alpha_i^v$ are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, respectively,

[0061] $\alpha_i^\rho = \dfrac{n_i}{n_i + r^\rho}$, where $r^\rho$ is a constant that constrains the scale of the adaptation coefficients.

[0062] Further, the log-likelihood ratio of the feature coefficient set of the user voice sample under a singer model and under the universal background model UBM is computed as:

[0063] $S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$

[0064] where x denotes the feature coefficients, T the number of frames, $\lambda_{star}$ and $\lambda_{ubm}$ the singer model and the UBM, and p the likelihood of the feature vector sequence under the singer model or the UBM.

[0065] An advantage of the present invention is that it provides a method and device for recommending singers with timbre similar to a performer's: singers with similar timbre are identified for the performer as a reference, which increases the enjoyment of singing. Applied in KTV scenarios, it can attract a large number of users, stimulate consumption, and improve users' ability to imitate singers' timbre.

BRIEF DESCRIPTION OF THE DRAWINGS

[0066] The present invention is further described below with reference to the accompanying drawings and embodiments.

[0067] FIG. 1 is a flowchart of the audio library processing and singer model training process of the method of the present invention.

[0068] FIG. 2 is a flowchart of training a single singer model in the method of the present invention.

[0069] FIG. 3 is a flowchart of the timbre matching process of the method of the present invention.

[0070] FIG. 4 is a flowchart of the likelihood-ratio computation in the timbre matching process of the method of the present invention.

[0071] FIG. 5 is a schematic structural diagram of the device of the present invention.

DETAILED DESCRIPTION

[0072] First embodiment:

[0073] A method for recommending singers with timbre similar to a performer's comprises the following steps:

[0074] audio library processing: obtaining pure vocal audio for all singers, preprocessing the pure vocal audio, and extracting a voice feature coefficient set from each pure vocal track;

[0075] singer model training: according to the feature coefficient set corresponding to each singer, training a corresponding singer model with a voice-model algorithm;

[0076] timbre matching: preprocessing a given user voice sample and extracting its feature coefficient set; then matching the feature coefficient set of the user voice sample against all singer models to find the singer with the most similar timbre.

[0077] This embodiment is described in detail below.

[0078] A method for recommending singers with timbre similar to a performer's comprises the following steps:

[0079] S1: audio library processing (as shown in FIG. 1):

[0080] S11: prepare the audio library by collecting several songs from a number of singers, for example stereo audio of 5 songs from each of 300 singers;

[0081] S12: remove the accompaniment from all songs in the audio library to obtain pure vocal audio. A suitable method is described in Chinese invention patent application No. 201410263446.3, entitled "Method and device for processing stereo audio" (《一种立体声音频的处理方法与装置》). That method exploits the differences between accompaniment and vocals across the left and right stereo channels and suppresses the accompaniment by filtering, thereby extracting the vocals. Removing the accompaniment from the songs reduces the influence of the accompaniment components on the training of the singer timbre models.

[0082] Removing the accompaniment from all songs in the audio library to obtain pure vocal audio specifically comprises:

[0083] transforming the left- and right-channel signals of the stereo audio into the frequency domain;

[0084] computing the amplitude ratio of corresponding frequency bins of the left-channel and right-channel frequency-domain signals and listing bins whose amplitude ratio falls within a preset range as bins to be attenuated; and computing the phase difference of corresponding bins of the left- and right-channel frequency-domain signals and also listing bins whose phase difference falls within a preset range as bins to be attenuated; the amplitude ratio is computed as:

[0085] $k_n(i) = \mathrm{abs}(\mathrm{fft\_frameR}_n(i)) / \mathrm{abs}(\mathrm{fft\_frameL}_n(i)) \times (2/\pi)$,

[0086] where n = 0, 1, 2, ..., N−1 is the frame index, i = 0, 1, 2, ..., FN/2, and FN is the number of points of the Fourier transform; the phase difference is computed as:

[0087] $p_n(i) = \mathrm{angle}(\mathrm{fft\_frameL}_n(i)) - \mathrm{angle}(\mathrm{fft\_frameR}_n(i))$,

[0088] n = 0, 1, 2, ..., N−1; i = 0, 1, 2, ..., FN/2;

[0089] next, the bins to be attenuated are selected, namely the bins whose amplitude ratio falls in a certain range, i.e. bins i satisfying

[0090] $k_n(i) < \alpha$ or $k_n(i) > \beta$, with $0 < \alpha < 0.5$ and $0.5 < \beta < 1$; here α is taken as 0.4 and β as 0.6,

[0091] or the bins whose phase difference falls in a certain range, i.e. bins i satisfying

[0092] $p_n(i) < \varphi$ or $p_n(i) > \psi$, with $-\pi < \varphi < 0$ and $0 < \psi < \pi$; here φ is taken as −0.1 and ψ as 0.1; these are listed as bins to be attenuated;

[0093] the bins to be attenuated, i.e. the accompaniment components, are attenuated as follows:

[0094] $\mathrm{fft\_frameR}_n(i) = 0$ or $\mathrm{fft\_frameL}_n(i) = 0$, where i is a bin to be attenuated;

[0095] the attenuated frequency-domain signal is inverse-transformed to the time domain, yielding the song audio with the accompaniment removed.
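For illustration only, the following Python sketch shows how the per-frame attenuation described in paragraphs [0083]-[0095] could be implemented with NumPy. The function name, the FFT framing, and the default thresholds (α = 0.4, β = 0.6, φ = −0.1, ψ = 0.1, taken from the text) are assumptions; the referenced patent application 201410263446.3 may differ in detail.

```python
import numpy as np

def attenuate_accompaniment_frame(frame_l, frame_r,
                                  alpha=0.4, beta=0.6, phi=-0.1, psi=0.1):
    """Hypothetical sketch: zero the accompaniment bins of one stereo frame.

    frame_l, frame_r are windowed time-domain frames of equal length.
    Bins whose L/R amplitude ratio or phase difference falls outside the
    ranges given in [0090] and [0092] are treated as accompaniment.
    """
    fft_l = np.fft.rfft(frame_l)
    fft_r = np.fft.rfft(frame_r)

    eps = 1e-12                                   # avoid division by zero
    # Amplitude ratio k_n(i) = |R(i)| / |L(i)| * (2/pi)
    k = np.abs(fft_r) / (np.abs(fft_l) + eps) * (2.0 / np.pi)
    # Phase difference p_n(i) = angle(L(i)) - angle(R(i))
    p = np.angle(fft_l) - np.angle(fft_r)

    # Bins to attenuate: ratio outside [alpha, beta] or phase outside [phi, psi]
    mask = (k < alpha) | (k > beta) | (p < phi) | (p > psi)
    fft_l[mask] = 0.0
    fft_r[mask] = 0.0

    # Back to the time domain; frames would then be overlap-added into a track
    return np.fft.irfft(fft_l, n=len(frame_l)), np.fft.irfft(fft_r, n=len(frame_r))
```

In practice the whole song would be processed frame by frame and the de-accompanied frames overlap-added back into a vocal track, as the surrounding paragraphs describe.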

[0096] In other embodiments, the pure vocal audio can also be obtained by other methods; the invention is not limited to the above algorithm.

[0097] In other embodiments, if pure vocal audio of all singers has already been collected in step S11, step S12 is skipped.

[0098] S13: preprocess the songs after accompaniment removal; the preprocessing comprises framing, windowing and silence removal;

[0099] framing means dividing the audio signal into frames, each frame containing a preset number of sound samples, with a preset number of overlapping samples between adjacent frames;

[0100] windowing uses Hanning-window filtering; other windowing schemes can also be used;

[0101] silence removal comprises:

[0102] computing the short-time energy of each frame:

[0103] $E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$

[0104] where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;

[0105] when the short-time energy of a frame is below a certain threshold, the frame is regarded as a silent frame and removed directly. Silent frames contain no useful voice features and therefore need to be removed.

[0106] S14: extract voice feature coefficients from the preprocessed audio. The voice feature coefficients may be one of MFCC, LPCC, LSP and PLP: MFCC denotes Mel-frequency cepstral coefficients, LPCC linear prediction cepstral coefficients, LSP line spectral pair coefficients, and PLP perceptual linear prediction coefficients. All of these characterize the timbre of a voice well, and any one of them may be chosen. The present invention preferably extracts MFCC or LPCC coefficients.
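As a non-authoritative illustration of steps S13-S14, the sketch below frames the signal, applies a Hanning window, removes silent frames by short-time energy, and extracts MFCCs. The frame length, hop length, energy threshold and the use of librosa for MFCC extraction are assumptions not specified in the patent.

```python
import numpy as np
import librosa  # assumed; any MFCC implementation would serve

def extract_features(audio, sr, frame_len=1024, hop_len=512, energy_thresh=1e-4):
    """Hypothetical sketch of S13 (framing, windowing, silence removal)
    and S14 (MFCC extraction). Returns one feature vector per voiced frame."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop_len
    # Short-time energy E_n = sum_m [w(m) x(n+m)]^2 of each windowed frame
    energy = np.array([np.sum((win * audio[t * hop_len: t * hop_len + frame_len]) ** 2)
                       for t in range(n_frames)])
    # MFCCs computed with the same framing (13 coefficients is an assumed choice)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13,
                                n_fft=frame_len, hop_length=hop_len, center=False)
    n = min(mfcc.shape[1], n_frames)
    # Silence removal: keep only frames whose energy exceeds the threshold
    return mfcc[:, :n][:, energy[:n] > energy_thresh].T   # (n_voiced_frames, 13)
```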

[0107] S2: singer model training process, as shown in FIG. 1 and FIG. 2.

[0108] Pool the extracted voice feature coefficients to train a universal background model (UBM), and, according to the voice feature coefficient set corresponding to each singer, adaptively train from the UBM a model for every singer in the audio library. The UBM is in fact a Gaussian mixture model with a high number of components; its training is similar to that of a GMM and uses the iterative EM algorithm, which is not detailed here.
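A minimal sketch of the UBM training step, assuming scikit-learn's GaussianMixture as the EM implementation; the number of mixture components (512) and the diagonal covariance type are assumptions, since the patent only states that the UBM is a high-order Gaussian mixture trained with EM.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(singer_feature_sets, n_components=512):
    """Pool every singer's feature coefficient sets and fit a GMM as the UBM."""
    pooled = np.vstack(singer_feature_sets)       # shape: (total_frames, n_coeffs)
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type='diag',
                          max_iter=100).fit(pooled)   # EM training
    return ubm
```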

[0109] The process of adaptively training a singer model, shown in FIG. 2, uses a Bayesian adaptation algorithm, as follows:

[0110] for the i-th mixture component of the UBM, compute the posterior probability of component i:

[0111] $P(i \mid x_t) = \dfrac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$

[0112] where x denotes the feature coefficients and w denotes the mixture weights;

[0113] then compute the weight, mean and variance statistics:

[0114] $n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$

[0115] then update the parameters $w_i$, $\mu_i$, $\sigma_i^2$ of each Gaussian component of the old UBM:

[0116] new weight: $\hat{w}_i = \left[ \alpha_i^w\, n_i / T + (1 - \alpha_i^w)\, w_i \right] \gamma$;

[0117] new mean: $\hat{\mu}_i = \alpha_i^m\, E_i(x) + (1 - \alpha_i^m)\, \mu_i$;

[0118] new variance: $\hat{\sigma}_i^2 = \alpha_i^v\, E_i(x^2) + (1 - \alpha_i^v)(\sigma_i^2 + \mu_i^2) - \hat{\mu}_i^2$;

[0119] where $\gamma$ is a scale factor that ensures the updated weights sum to 1, and $\alpha_i^w$, $\alpha_i^m$, $\alpha_i^v$ are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, respectively,

[0120] $\alpha_i^\rho = \dfrac{n_i}{n_i + r^\rho}$, $\rho \in \{w, m, v\}$, where $r^\rho$ is a constant that constrains the scale of the adaptation coefficients and is generally set to 16.

[0121] This step yields a universal UBM and a timbre model for every singer.
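The adaptation equations of paragraphs [0110]-[0120] could be implemented as follows. This is a sketch under the assumptions that the UBM is a fitted scikit-learn GaussianMixture with diagonal covariances, that a single adaptation coefficient is shared by weights, means and variances, and that the relevance constant r is 16 as suggested in the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt(ubm, X, relevance=16.0):
    """Bayesian (MAP) adaptation of the UBM to one singer's features X (T, D)."""
    T = X.shape[0]
    post = ubm.predict_proba(X)                      # P(i | x_t), shape (T, M)
    n = post.sum(axis=0) + 1e-10                     # n_i
    Ex = (post.T @ X) / n[:, None]                   # E_i(x)
    Ex2 = (post.T @ (X ** 2)) / n[:, None]           # E_i(x^2)
    a = n / (n + relevance)                          # alpha_i = n_i / (n_i + r)

    new_w = a * n / T + (1.0 - a) * ubm.weights_
    new_w /= new_w.sum()                             # gamma: renormalise the weights
    new_mu = a[:, None] * Ex + (1.0 - a)[:, None] * ubm.means_
    new_var = (a[:, None] * Ex2
               + (1.0 - a)[:, None] * (ubm.covariances_ + ubm.means_ ** 2)
               - new_mu ** 2)

    # Package the adapted parameters as a new mixture (hypothetical convenience)
    singer = GaussianMixture(n_components=ubm.n_components, covariance_type='diag')
    singer.weights_, singer.means_, singer.covariances_ = new_w, new_mu, new_var
    singer.precisions_cholesky_ = 1.0 / np.sqrt(new_var)
    return singer
```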

[0122] S3: timbre matching process (as shown in FIG. 3 and FIG. 4):

[0123] S31: user voice sample processing: the voice sample of a given user, i.e. the performer, is preprocessed in the same way, and its voice feature coefficients are extracted;

[0124] S32: the log-likelihood ratio of the extracted voice feature coefficients under each singer model and under the universal background model UBM is then computed (as shown in FIG. 4), and the singer with the largest log-likelihood ratio is recommended.

[0125] The log-likelihood ratio is computed as:

[0126] $S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$

[0127] where x denotes the feature coefficients, $\lambda_{star}$ and $\lambda_{ubm}$ the singer model and the UBM, and p the likelihood of the feature vector sequence under the singer model or the UBM;

[0128] here the time-normalized log-likelihood ratio is used:

[0129] $S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$.

[0130] This step finds the singers whose timbre is closest to the user's and recommends them, thereby increasing the user's enjoyment of singing.
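A sketch of the matching step under the same assumptions as above: every model exposes score_samples() returning per-frame log-likelihoods (as scikit-learn's GaussianMixture does), so the time-normalised log-likelihood ratio of [0129] reduces to a mean over frames.

```python
import numpy as np

def recommend_singer(user_features, singer_models, ubm):
    """Return the singer whose model gives the largest time-normalised
    log-likelihood ratio against the UBM for the user's feature set."""
    ubm_ll = ubm.score_samples(user_features)        # log p(x_t | lambda_ubm)
    scores = {name: float(np.mean(model.score_samples(user_features) - ubm_ll))
              for name, model in singer_models.items()}
    best = max(scores, key=scores.get)               # largest ratio wins
    return best, scores
```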

[0131] In other embodiments, other acoustic models such as GMM or HMM can also be used for singer model training and timbre matching.
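Tying the pieces together, a hypothetical end-to-end run of the first embodiment might look as follows; vocal_library, user_audio and user_sr are assumed inputs, and the helper functions are the sketches given above rather than the patent's own implementation.

```python
# vocal_library: dict mapping singer name -> (pure vocal audio, sample rate)
features = {name: extract_features(audio, sr)
            for name, (audio, sr) in vocal_library.items()}      # S1
ubm = train_ubm(list(features.values()))                         # S2: UBM via EM
singer_models = {name: map_adapt(ubm, feats)                     # S2: MAP adaptation
                 for name, feats in features.items()}
user_feats = extract_features(user_audio, user_sr)               # S31
best, scores = recommend_singer(user_feats, singer_models, ubm)  # S32
print("Recommended singer:", best)
```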

[0132] Second embodiment:

[0133] A device for recommending singers with timbre similar to a performer's comprises an audio library processing module, a singer model training module and a timbre matching module, wherein

[0134] the audio library processing module is configured to obtain pure vocal audio for all singers, preprocess the pure vocal audio, and extract a voice feature coefficient set from each pure vocal track;

[0135] the singer model training module is configured to train a corresponding singer model with a voice-model algorithm according to the feature coefficient set corresponding to each singer;

[0136] the timbre matching module is configured to preprocess a given user voice sample, extract its feature coefficient set, and then match the feature coefficient set of the user voice sample against all singer models to find the singer with the most similar timbre.

[0137] This embodiment is described in detail below.

[0138] A device for recommending singers with timbre similar to a performer's, as shown in FIG. 5, comprises:

[0139] an audio library processing module, configured to remove the accompaniment from all songs in the audio library to obtain pure vocal audio, preprocess the pure vocal audio, and extract voice feature coefficients from the preprocessed audio;

[0140] a singer model training module, configured to pool the extracted voice feature coefficients to train a universal background model UBM and, according to the voice feature coefficient set corresponding to each singer, adaptively train from the UBM a model for every singer in the audio library;

[0141] a timbre matching module, configured to preprocess a given user's voice sample, extract its voice feature coefficients, then compute the log-likelihood ratio of the extracted voice feature coefficients under each singer model and under the universal background model UBM, and recommend the singer with the largest log-likelihood ratio.

[0142] The accompaniment can be removed from all songs in the audio library to obtain pure vocal audio by the method of Chinese invention patent application No. 201410263446.3, entitled "Method and device for processing stereo audio" (《一种立体声音频的处理方法与装置》). That method exploits the differences between accompaniment and vocals across the left and right stereo channels and suppresses the accompaniment by filtering, thereby extracting the vocals.

[0143] It specifically comprises:

[0144] transforming the left- and right-channel signals of the stereo audio into the frequency domain;

[0145] computing the amplitude ratio of corresponding frequency bins of the left-channel and right-channel frequency-domain signals and listing bins whose amplitude ratio falls within a preset range as bins to be attenuated; and computing the phase difference of corresponding bins of the left- and right-channel frequency-domain signals and also listing bins whose phase difference falls within a preset range as bins to be attenuated; the amplitude ratio is computed as:

[0146] $k_n(i) = \mathrm{abs}(\mathrm{fft\_frameR}_n(i)) / \mathrm{abs}(\mathrm{fft\_frameL}_n(i)) \times (2/\pi)$,

[0147] where n = 0, 1, 2, ..., N−1 is the frame index, i = 0, 1, 2, ..., FN/2, and FN is the number of points of the Fourier transform; the phase difference is computed as:

[0148] $p_n(i) = \mathrm{angle}(\mathrm{fft\_frameL}_n(i)) - \mathrm{angle}(\mathrm{fft\_frameR}_n(i))$,

[0149] n = 0, 1, 2, ..., N−1; i = 0, 1, 2, ..., FN/2;

[0150] next, the bins to be attenuated are selected, namely the bins whose amplitude ratio falls in a certain range, i.e. bins i satisfying

[0151] $k_n(i) < \alpha$ or $k_n(i) > \beta$, with $0 < \alpha < 0.5$ and $0.5 < \beta < 1$; here α is taken as 0.4 and β as 0.6,

[0152] or the bins whose phase difference falls in a certain range, i.e. bins i satisfying

[0153] $p_n(i) < \varphi$ or $p_n(i) > \psi$, with $-\pi < \varphi < 0$ and $0 < \psi < \pi$; here φ is taken as −0.1 and ψ as 0.1; these are listed as bins to be attenuated;

[0154] the bins to be attenuated, i.e. the accompaniment components, are attenuated as follows:

[0155] $\mathrm{fft\_frameR}_n(i) = 0$ or $\mathrm{fft\_frameL}_n(i) = 0$, where i is a bin to be attenuated;

[0156] the attenuated frequency-domain signal is inverse-transformed to the time domain, yielding the song audio with the accompaniment removed.

[0157] The voice feature coefficients are one of MFCC, LPCC, LSP and PLP.

[0158] The preprocessing in the audio library processing module and in the timbre matching module comprises: framing, windowing and silence removal;

[0159] framing means dividing the audio signal into frames, each frame containing a preset number of sound samples, with a preset number of overlapping samples between adjacent frames;

[0160] windowing means Hanning-window filtering.

[0161] The silence removal operation in the preprocessing comprises:

[0162] computing the short-time energy of each frame:

[0163] $E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$

[0164] where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift;

[0165] when the short-time energy of a frame is below a certain threshold, the frame is regarded as a silent frame and removed directly.

[0166] The process in the singer model training module of adaptively training a singer model uses a Bayesian adaptation algorithm, which specifically comprises:

[0167] for the i-th mixture component of the UBM, computing the posterior probability of component i:

[0168] $P(i \mid x_t) = \dfrac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$

[0169] where x denotes the feature coefficients and w denotes the mixture weights;

[0170] then computing the weight, mean and variance statistics:

[0171] $n_i = \sum_{t=1}^{T} P(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$

[0172] then updating the parameters $w_i$, $\mu_i$, $\sigma_i^2$ of each Gaussian component of the old UBM:

[0173] new weight: $\hat{w}_i = \left[ \alpha_i^w\, n_i / T + (1 - \alpha_i^w)\, w_i \right] \gamma$;

[0174] new mean: $\hat{\mu}_i = \alpha_i^m\, E_i(x) + (1 - \alpha_i^m)\, \mu_i$;

[0175] new variance: $\hat{\sigma}_i^2 = \alpha_i^v\, E_i(x^2) + (1 - \alpha_i^v)(\sigma_i^2 + \mu_i^2) - \hat{\mu}_i^2$;

[0176] where $\gamma$ is a scale factor that ensures the updated weights sum to 1, and $\alpha_i^w$, $\alpha_i^m$, $\alpha_i^v$ are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, respectively,

[0177] $\alpha_i^\rho = \dfrac{n_i}{n_i + r^\rho}$, where $r^\rho$ is a constant that constrains the scale of the adaptation coefficients.

[0178] The log-likelihood ratio used in the timbre matching module is computed as:

[0179] $S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$

[0180] where x denotes the feature coefficients, $\lambda_{star}$ and $\lambda_{ubm}$ the singer model and the UBM, and p the likelihood of the feature vector sequence under the singer model or the UBM;

[0181] here the time-normalized log-likelihood ratio is used:

[0182] $S(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$.

[0183] The present invention provides a method and device for recommending singers with timbre similar to a performer's: singers with similar timbre are identified for the performer as a reference, which can increase the enjoyment of singing. Applied in KTV scenarios, it can attract a large number of users, stimulate consumption, and improve users' ability to imitate singers' timbre.

[0184] The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (15)

1. A method for recommending singers with timbre similar to a performer's, characterized by comprising the following steps: audio library processing: obtaining pure vocal audio for all singers, preprocessing the pure vocal audio, and extracting a voice feature coefficient set from each pure vocal track; singer model training: according to the feature coefficient set corresponding to each singer, training a corresponding singer model with a voice-model algorithm; timbre matching: preprocessing a given user voice sample and extracting its feature coefficient set, then matching the feature coefficient set of the user voice sample against all singer models to find the singer with the most similar timbre.
2. The method for recommending singers with similar timbre according to claim 1, characterized in that the pure vocal audio of a singer is obtained by removing the accompaniment from the singer's songs.
3. The method for recommending singers with similar timbre according to claim 1, characterized in that the singer model training step comprises: first pooling all voice feature coefficient sets extracted from the audio library to train a universal background model UBM; then, according to the feature coefficient set corresponding to each singer, adaptively training from the UBM a model for every singer in the audio library.
4. The method for recommending singers with similar timbre according to claim 1, characterized in that, in the timbre matching step, "matching the feature coefficients of the user voice sample against all singer models to find the singer with the most similar timbre" comprises: computing the log-likelihood ratio of the feature coefficient set of the user voice sample under each singer model and under the universal background model UBM, and recommending the singer whose log-likelihood ratio is largest.
5. The method for recommending singers with similar timbre according to claim 1, characterized in that the voice feature coefficients are one of MFCC, LPCC, LSP and PLP.
6. The method for recommending singers with similar timbre according to claim 1, characterized in that the preprocessing in the audio library processing step and in the timbre matching step comprises, in order: framing, windowing and silence removal; the silence removal comprises the following steps: computing the short-time energy of each frame, $E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$, where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift; and, when the short-time energy of a frame is below a certain threshold, regarding it as a silent frame and removing it directly.
7. The method for recommending singers with similar timbre according to claim 3, characterized in that the adaptive training of models for all singers in the audio library uses a Bayesian adaptation algorithm, which specifically comprises: for the i-th mixture component of the UBM, computing the posterior probability of component i, $P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$, where x denotes the feature coefficients and w the mixture weights; then computing the statistics $n_i = \sum_{t=1}^{T} P(i \mid x_t)$, $E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t$ and $E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$; then updating the parameters $w_i$, $\mu_i$, $\sigma_i^2$ of each Gaussian component of the old UBM: new weight $\hat{w}_i = \left[ \alpha_i^w\, n_i / T + (1 - \alpha_i^w)\, w_i \right] \gamma$; new mean $\hat{\mu}_i = \alpha_i^m\, E_i(x) + (1 - \alpha_i^m)\, \mu_i$; new variance $\hat{\sigma}_i^2 = \alpha_i^v\, E_i(x^2) + (1 - \alpha_i^v)(\sigma_i^2 + \mu_i^2) - \hat{\mu}_i^2$; where $\gamma$ is a scale factor that ensures the updated weights sum to 1, $\alpha_i^w$, $\alpha_i^m$, $\alpha_i^v$ are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, and $\alpha_i^\rho = \frac{n_i}{n_i + r^\rho}$, where $r^\rho$ is a constant that constrains the scale of the adaptation coefficients.
8. The method for recommending singers with similar timbre according to claim 4, characterized in that the log-likelihood ratio is computed as $S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$, where x denotes the feature coefficients, T the number of frames, $\lambda_{star}$ and $\lambda_{ubm}$ the singer model and the UBM, and p the likelihood of the feature vector sequence under the singer model or the UBM.
9. A device for recommending singers with timbre similar to a performer's, characterized by comprising: an audio library processing module, a singer model training module and a timbre matching module, wherein the audio library processing module is configured to obtain pure vocal audio for all singers, preprocess the pure vocal audio, and extract a voice feature coefficient set from each pure vocal track; the singer model training module is configured to train a corresponding singer model with a voice-model algorithm according to the feature coefficient set corresponding to each singer; and the timbre matching module is configured to preprocess a given user voice sample, extract its feature coefficient set, and then match the feature coefficient set of the user voice sample against all singer models to find the singer with the most similar timbre.
10. The device for recommending singers with similar timbre according to claim 9, characterized in that the pure vocal audio of a singer is obtained by removing the accompaniment from the singer's songs.
11. The device for recommending singers with similar timbre according to claim 9, characterized in that the singer model training module pools all voice feature coefficient sets extracted from the audio library to train a universal background model UBM, and then, according to the feature coefficient set corresponding to each singer, adaptively trains from the UBM a model for every singer in the audio library.
12. The device for recommending singers with similar timbre according to claim 9, characterized in that, in the timbre matching module, "matching the feature coefficients of the user voice sample against all singer models to find the singer with the most similar timbre" comprises: computing the log-likelihood ratio of the feature coefficient set of the user voice sample under each singer model and under the universal background model UBM, and recommending the singer whose log-likelihood ratio is largest.
13. The device for recommending singers with similar timbre according to claim 9, characterized in that the preprocessing in the audio library processing module and in the timbre matching module comprises, in order: framing, windowing and silence removal; the silence removal comprises the following steps: computing the short-time energy of each frame, $E_n = \sum_{m=0}^{N-1} \left[ w(m)\, x(n+m) \right]^2$, where w is the window function, x is the voice signal, n = 0, 1L, 2L, ..., N is the frame length, and L is the frame shift; and, when the short-time energy of a frame is below a certain threshold, regarding it as a silent frame and removing it directly.
14. The device for recommending singers with similar timbre according to claim 11, characterized in that the adaptive training of models for all singers in the audio library uses a Bayesian adaptation algorithm, which specifically comprises: for the i-th mixture component of the UBM, computing the posterior probability of component i, $P(i \mid x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{M} w_j\, p_j(x_t)}$, where x denotes the feature coefficients and w the mixture weights; then computing the statistics $n_i = \sum_{t=1}^{T} P(i \mid x_t)$, $E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t$ and $E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t^2$; then updating the parameters $w_i$, $\mu_i$, $\sigma_i^2$ of each Gaussian component of the old UBM: new weight $\hat{w}_i = \left[ \alpha_i^w\, n_i / T + (1 - \alpha_i^w)\, w_i \right] \gamma$; new mean $\hat{\mu}_i = \alpha_i^m\, E_i(x) + (1 - \alpha_i^m)\, \mu_i$; new variance $\hat{\sigma}_i^2 = \alpha_i^v\, E_i(x^2) + (1 - \alpha_i^v)(\sigma_i^2 + \mu_i^2) - \hat{\mu}_i^2$; where $\gamma$ is a scale factor that ensures the updated weights sum to 1, $\alpha_i^w$, $\alpha_i^m$, $\alpha_i^v$ are the adaptation coefficients for the weight, mean and variance of the i-th Gaussian, and $\alpha_i^\rho = \frac{n_i}{n_i + r^\rho}$, where $r^\rho$ is a constant that constrains the scale of the adaptation coefficients.
15. The device for recommending singers with similar timbre according to claim 12, characterized in that the log-likelihood ratio is computed as $S(X) = \sum_{t=1}^{T} \left[ \log p(x_t \mid \lambda_{star}) - \log p(x_t \mid \lambda_{ubm}) \right]$, where x denotes the feature coefficients, T the number of frames, $\lambda_{star}$ and $\lambda_{ubm}$ the singer model and the UBM, and p the likelihood of the feature vector sequence under the singer model or the UBM.
CN 201410448290 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers CN104183245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201410448290 CN104183245A (en) 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201410448290 CN104183245A (en) 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers

Publications (1)

Publication Number Publication Date
CN104183245A true CN104183245A (en) 2014-12-03

Family

ID=51964235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201410448290 CN104183245A (en) 2014-09-04 2014-09-04 Method and device for recommending music stars with tones similar to those of singers

Country Status (1)

Country Link
CN (1) CN104183245A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464725A (en) * 2014-12-30 2015-03-25 福建星网视易信息系统有限公司 Method and device for singing imitation
CN105554281A (en) * 2015-12-21 2016-05-04 联想(北京)有限公司 Information processing method and electronic device
CN105575393A (en) * 2015-12-02 2016-05-11 中国传媒大学 Personalized song recommendation method based on voice timbre
CN105679324A (en) * 2015-12-29 2016-06-15 福建星网视易信息系统有限公司 Voiceprint identification similarity scoring method and apparatus
CN106095925A (en) * 2016-06-12 2016-11-09 北京邮电大学 Individualized song recommending system based on vocal music characteristics

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567431A (en) * 2003-07-10 2005-01-19 上海优浪信息科技有限公司 Method and system for identifying status of speaker
US20050027514A1 (en) * 2003-07-28 2005-02-03 Jian Zhang Method and apparatus for automatically recognizing audio data
CN1897109A (en) * 2006-06-01 2007-01-17 电子科技大学 Single audio-frequency signal discrimination based on MFCC
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
CN101351761A (en) * 2005-10-27 2009-01-21 高通股份有限公司 Method and apparatus for achieving flexible bandwidth using variable guard bands
CN101577117A (en) * 2009-03-12 2009-11-11 北京中星微电子有限公司 Extracting method of accompaniment music and device
CN101944359A (en) * 2010-07-23 2011-01-12 杭州网豆数字技术有限公司 Voice recognition method facing specific crowd
CN101980336A (en) * 2010-10-18 2011-02-23 福州星网视易信息系统有限公司 Hidden Markov model-based vehicle sound identification method
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN102543073A (en) * 2010-12-10 2012-07-04 上海上大海润信息系统有限公司 Shanghai dialect phonetic recognition information processing method
CN103065623A (en) * 2012-12-17 2013-04-24 深圳Tcl新技术有限公司 Timbre matching method and timbre matching device
CN103177722A (en) * 2013-03-08 2013-06-26 北京理工大学 Tone-similarity-based song retrieval method
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN103474065A (en) * 2013-09-24 2013-12-25 贵阳世纪恒通科技有限公司 Method for determining and recognizing voice intentions based on automatic classification technology
CN103730121A (en) * 2013-12-24 2014-04-16 中山大学 Method and device for recognizing disguised sounds
CN103871423A (en) * 2012-12-13 2014-06-18 上海八方视界网络科技有限公司 Audio frequency separation method based on NMF non-negative matrix factorization
CN103943113A (en) * 2014-04-15 2014-07-23 福建星网视易信息系统有限公司 Method and device for removing accompaniment from song

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567431A (en) * 2003-07-10 2005-01-19 上海优浪信息科技有限公司 Method and system for identifying status of speaker
US20050027514A1 (en) * 2003-07-28 2005-02-03 Jian Zhang Method and apparatus for automatically recognizing audio data
CN101351761A (en) * 2005-10-27 2009-01-21 高通股份有限公司 Method and apparatus for achieving flexible bandwidth using variable guard bands
CN1897109A (en) * 2006-06-01 2007-01-17 电子科技大学 Single audio-frequency signal discrimination based on MFCC
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
CN101577117A (en) * 2009-03-12 2009-11-11 北京中星微电子有限公司 Extracting method of accompaniment music and device
CN101944359A (en) * 2010-07-23 2011-01-12 杭州网豆数字技术有限公司 Voice recognition method facing specific crowd
CN101980336A (en) * 2010-10-18 2011-02-23 福州星网视易信息系统有限公司 Hidden Markov model-based vehicle sound identification method
CN102543073A (en) * 2010-12-10 2012-07-04 上海上大海润信息系统有限公司 Shanghai dialect phonetic recognition information processing method
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN103871423A (en) * 2012-12-13 2014-06-18 上海八方视界网络科技有限公司 Audio frequency separation method based on NMF non-negative matrix factorization
CN103065623A (en) * 2012-12-17 2013-04-24 深圳Tcl新技术有限公司 Timbre matching method and timbre matching device
CN103177722A (en) * 2013-03-08 2013-06-26 北京理工大学 Tone-similarity-based song retrieval method
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN103474065A (en) * 2013-09-24 2013-12-25 贵阳世纪恒通科技有限公司 Method for determining and recognizing voice intentions based on automatic classification technology
CN103730121A (en) * 2013-12-24 2014-04-16 中山大学 Method and device for recognizing disguised sounds
CN103943113A (en) * 2014-04-15 2014-07-23 福建星网视易信息系统有限公司 Method and device for removing accompaniment from song

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Ren Xueni, "Research on Speech Similarity Evaluation Algorithms" (《语音相似度评价算法研究》), China Master's Theses Full-text Database, Information Science and Technology Series *
Liu Jie, "Design and Implementation of an Automatic Language Identification System" (《自动语种识别系统设计与实现》), China Master's Theses Full-text Database, Information Science and Technology Series *
Xu Yonghua, "Language Identification Based on the GMM-UBM Model" (《基于GMM-UBM模型的语种识别》), China Master's Theses Full-text Database, Information Science and Technology Series *
Zhu Shaoxiong, "Research on Voiceprint Recognition Systems and Pattern Matching Algorithms" (《声纹识别系统与模式匹配算法研究》), China Master's Theses Full-text Database, Information Science and Technology Series *
Li Lijuan, "Research and Implementation of Speaker Recognition Based on Statistical Models" (《基于统计模型的说话人识别研究与实现》), China Master's Theses Full-text Database, Information Science and Technology Series *
Yan Kai, "Research on Speaker Recognition Algorithms Based on Gaussian Mixture Models" (《基于高斯混合模型的说话人识别算法研究》), China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464725A (en) * 2014-12-30 2015-03-25 福建星网视易信息系统有限公司 Method and device for singing imitation
CN104464725B * 2014-12-30 2017-09-05 福建凯米网络科技有限公司 Method and device for singing imitation
CN105575393A (en) * 2015-12-02 2016-05-11 中国传媒大学 Personalized song recommendation method based on voice timbre
CN105554281A (en) * 2015-12-21 2016-05-04 联想(北京)有限公司 Information processing method and electronic device
CN105679324A (en) * 2015-12-29 2016-06-15 福建星网视易信息系统有限公司 Voiceprint identification similarity scoring method and apparatus
CN105679324B * 2015-12-29 2019-03-22 福建星网视易信息系统有限公司 Method and device for voiceprint recognition similarity scoring
CN106095925A (en) * 2016-06-12 2016-11-09 北京邮电大学 Individualized song recommending system based on vocal music characteristics
CN106095925B * 2016-06-12 2018-07-03 北京邮电大学 Personalized song recommendation method based on vocal music characteristics

Similar Documents

Publication Publication Date Title
Ming et al. Robust speaker recognition in noisy conditions
Xu et al. An experimental study on speech enhancement based on deep neural networks
Paliwal et al. Single-channel speech enhancement using spectral subtraction in the short-time modulation domain
Iseli et al. Age, sex, and vowel dependencies of acoustic measures related to the voice source
Dave Feature extraction methods LPC, PLP and MFCC in speech recognition
CN101676993B (en) Method and device for the artificial extension of the bandwidth of speech signals
Muda et al. Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques
Xu et al. A regression approach to speech enhancement based on deep neural networks
Mitra et al. Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
Mashao et al. Combining classifier decisions for robust speaker identification
Erro et al. Harmonics plus noise model based vocoder for statistical parametric speech synthesis
Shrawankar et al. Techniques for feature extraction in speech recognition system: A comparative study
Magi et al. Stabilised weighted linear prediction
CN103236260B (en) Voice recognition system
CN101593522B (en) Method and equipment for full frequency domain digital hearing aid
CN101599271B (en) Recognition method of digital music emotion
CN102324232A (en) Voiceprint identification method based on Gauss mixing model and system thereof
Kawahara et al. Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT
Cooke et al. Intelligibility-enhancing speech modifications: the hurricane challenge.
Tachibana et al. Melody line estimation in homophonic music audio signals based on temporal-variability of melodic source
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization based
CN101004911B (en) Method and device for generating frequency bending function and carrying out frequency bending
CN103189913A (en) Method, apparatus and machine-readable storage medium for decomposing a multichannel audio signal
CN1719514A (en) High quality real time sound changing method based on speech sound analysis and synthesis
Dhingra et al. Isolated speech recognition using MFCC and DTW

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
C41 Transfer of patent application or patent right or utility model
WD01