CN103811020B - Intelligent speech processing method - Google Patents
Intelligent speech processing method
- Publication number
- CN103811020B (application CN201410081493.6A / CN201410081493A)
- Authority
- CN
- China
- Prior art keywords
- sound
- sound source
- microphone array
- signal
- sound pressure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses an intelligent speech processing method in the technical field of information processing. By building a model library of interlocutors' voices, the method intelligently identifies the identities of multiple interlocutors in a multi-speaker environment while separating the mixed speech into each interlocutor's individual voice; according to the user's needs, it amplifies the voice of the interlocutor the user wants to hear and suppresses the voices of interlocutors the user has not requested. Unlike a traditional hearing aid, the method automatically supplies the user with the desired sound according to the user's personal needs, reducing the interference of non-target voices beyond mere noise and making the method personalized, interactive and intelligent.
Description
Technical Field
The invention belongs to the technical field of information processing, and in particular relates to an intelligent speech processing method.
Background Art
According to the latest assessment data released by the World Health Organization (WHO) in 2013, 360 million people worldwide live with some degree of hearing impairment, about 5% of the global population. Hearing aid products can effectively compensate for patients' hearing loss and improve their quality of life and work. However, research on hearing aid technology still concentrates on noise suppression and amplification of the source signal, and rarely touches on voice-feature modeling or automatic separation of multiple sound sources. In genuinely complex scenes, for example a party where several people speak at once, possibly against background sounds such as music, a hearing aid cannot pick the sound object of interest out of the mixed input; simply boosting the overall sound level only increases the user's listening burden, and may even cause harm, without yielding effective sound input and comprehension. It is therefore of great significance to design a new, more intelligent and personalized hearing aid system with specific sound-object recognition that addresses these technical shortcomings of current systems.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention proposes an intelligent speech processing method that lets users receive and amplify exactly the clean speech they need, making the hearing aid system intelligent, interactive and personalized.
An intelligent speech processing method comprises the following steps:
Step 1. Collect sample speech segments to build a sample speech library, extract features from the sample speech to obtain feature parameters, and train on the feature parameters.
The specific process is as follows:
Step 1-1. Collect sample speech segments, discretize them, extract the Mel-frequency cepstral coefficients (MFCCs) of the speech signal as the speech feature parameters, and establish a Gaussian mixture model.
The model formula is as follows:

$$p(X\mid G)=\prod_{t=1}^{T}\sum_{i=1}^{I}p_i\,b_i(x_t)$$

where p(X|G) denotes the probability of the sample speech feature parameters X under the model with parameter set G;
G denotes the Gaussian mixture model parameter set, G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I;
I denotes the number of component Gaussians in the mixture;
p_i denotes the weight coefficient of the i-th component Gaussian, with Σ_{i=1}^{I} p_i = 1;
μ_i denotes the mean vector of the i-th component Gaussian;
Σ_i denotes the covariance matrix of the i-th component Gaussian;
X denotes the sample speech feature parameters, X = {x_1, x_2, ..., x_T}, where T is the number of feature vectors;
b_i(·) denotes the density function of the i-th component Gaussian, b_i(x) = N(x; μ_i, Σ_i), where N(·) is the Gaussian density function.
Step 1-2. Train the Gaussian mixture model on the speech feature parameters.
The k-means clustering algorithm is applied to the feature parameters to obtain the initial parameter set G^0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I; starting from this initial set, the expectation-maximization (EM) algorithm estimates the model and yields the Gaussian mixture parameters, completing the training of the feature parameters.
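As a concrete illustration of steps 1-1 and 1-2, the following minimal Python sketch builds one speaker model, assuming librosa and scikit-learn as stand-ins for the MFCC extraction and the k-means-initialized EM training described above; the file path and the 16-component setting (taken from the embodiment below) are illustrative:

```python
# Sketch of steps 1-1/1-2: MFCC features and a GMM trained by EM with
# k-means initialisation. librosa/scikit-learn are stand-ins; wav_path
# is a placeholder for one enrolled speaker's sample segment.
import librosa
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(wav_path, n_components=16):
    y, sr = librosa.load(wav_path, sr=None)   # discretised speech segment
    # MFCCs as the feature parameters X = {x_1, ..., x_T}, shape (T, n_mfcc)
    X = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    # k-means gives G^0 = {p_i^0, mu_i^0, Sigma_i^0}; EM then refines it
    gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                          init_params='kmeans', max_iter=200)
    gmm.fit(X)
    return gmm

# model library: one GMM per enrolled interlocutor
# library = {name: train_speaker_gmm(path) for name, path in samples.items()}
```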
Step 2. Use a microphone array of M microphones to record the audio of the environment under test, and determine the number of sound sources and the beam arrival direction of each source, i.e. the incidence angle of each source at the array.
The specific process is as follows:
Step 2-1. Record the mixed audio signal of the environment under test with the M-microphone array, and discretize it to obtain the amplitude at each sampling point.
Step 2-2. Arrange the sample amplitudes into a matrix to obtain the mixed-audio matrix captured by each microphone; this matrix has one column, as many rows as there are sampling points, and the sample amplitudes as its elements.
Step 2-3. From the per-microphone mixed-audio matrices and the number of microphones, estimate the vector covariance matrix of the environment's mixed audio signal.
The vector covariance matrix is estimated by averaging the outer products of the array snapshots:

$$R_{xx}=\frac{1}{L}\sum_{l=1}^{L}\mathbf{x}(l)\,\mathbf{x}^{H}(l)$$

where R_xx denotes the estimate of the vector covariance matrix of the mixed audio signal of the environment under test;
x(l) denotes the M×1 snapshot whose m-th entry is the l-th sample of X(m), the mixed-audio matrix collected by the m-th microphone, and L is the number of sampling points;
(·)^H denotes the conjugate (Hermitian) transpose.
Step 2-4. Perform an eigenvalue decomposition of the estimated covariance matrix, sort the eigenvalues from large to small, and count how many exceed a threshold; that count is the number of sound sources.
Step 2-5. Subtract the number of sound sources from the number of microphones to obtain the number of noise dimensions, and form the corresponding noise matrix from the eigenvectors of the remaining (smallest) eigenvalues.
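A short sketch of steps 2-3 to 2-5, assuming the microphone signals are stacked in an (M, L) NumPy array and using the threshold 10^−7 from the embodiment below; the snapshot-averaged covariance is the standard estimate consistent with the M×M noise subspace used next:

```python
# Sketch of steps 2-3 to 2-5: covariance estimate, eigen-decomposition,
# source counting against a threshold, and the noise matrix V_u.
import numpy as np

def noise_subspace(X, threshold=1e-7):
    M, L = X.shape                           # M microphones, L samples
    Rxx = X @ X.conj().T / L                 # M x M covariance estimate
    w, V = np.linalg.eigh(Rxx)               # ascending eigenvalues
    w, V = w[::-1], V[:, ::-1]               # sort from large to small
    n_sources = int(np.sum(w > threshold))   # eigenvalues above the threshold
    Vu = V[:, n_sources:]                    # (M, M - n_sources) noise matrix
    return n_sources, Vu
```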
Step 2-6. From each microphone's distance to the array center, the wavelength of the mixed audio signal, the microphone's direction angle relative to the array center, and the beam arrival direction of the source, form the steering vector of the microphone array; then, from the noise matrix and the steering vector, compute the angular spectrum function of the mixed audio signal.

The angular spectrum function of the mixed audio signal is:

$$P(\theta)=\frac{1}{\alpha^{H}(\theta)\,V_{u}V_{u}^{H}\,\alpha(\theta)}$$

where P(θ) denotes the angular spectrum function of the mixed audio signal;
α(θ) denotes the steering vector of the array, α(θ) = (α_1(θ), ..., α_m(θ), ..., α_M(θ)), with α_m(θ) = e^{j k d_m cos(φ_m − θ)}, where j is the imaginary unit, k = 2π/λ, λ is the wavelength of the mixed audio signal, d_m is the distance between the m-th microphone and the array center, and φ_m is the direction angle of the m-th microphone relative to the array center;
θ denotes the beam arrival direction of a sound source;
α^H(θ) denotes the conjugate transpose of the steering vector;
V_u denotes the noise matrix;
V_u^H denotes the conjugate transpose of the noise matrix.
Step 2-7. From the waveform of the angular spectrum function of the mixed audio signal, pick its peaks from largest to smallest; the number of peaks selected equals the number of sound sources.
Step 2-8. The angle corresponding to each selected peak is the beam arrival direction of the corresponding sound source.
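Steps 2-6 to 2-8 can be sketched as a MUSIC-style grid search; `d` and `phi` are the microphone distances and direction angles (in radians) of the array, `lam` the wavelength, and `Vu` the noise matrix from the previous sketch, all names being illustrative:

```python
# Sketch of steps 2-6 to 2-8: angular spectrum P(theta) over a grid of
# candidate directions, then peak picking to recover the DOAs.
import numpy as np
from scipy.signal import find_peaks

def music_doa(Vu, d, phi, lam, n_sources):
    k = 2 * np.pi / lam
    grid = np.deg2rad(np.arange(360.0))
    P = np.empty(len(grid))
    for i, theta in enumerate(grid):
        a = np.exp(1j * k * d * np.cos(phi - theta))     # steering vector alpha(theta)
        P[i] = 1.0 / np.real(a.conj() @ Vu @ Vu.conj().T @ a)
    peaks, _ = find_peaks(P)
    top = peaks[np.argsort(P[peaks])[::-1][:n_sources]]  # largest peaks first
    return np.rad2deg(grid[top])                         # DOAs in degrees
```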
Step 3. From each source's audio signal and the transfer relation between sources and microphones, obtain the array sound pressure received by the microphones, the horizontal sound pressure gradient of the array, and the vertical sound pressure gradient of the array.

The array sound pressure signal is:

$$p_{w}(t)=\frac{1}{M}\sum_{m=1}^{M}\sum_{n=1}^{N}h_{mn}(t)\,s_{n}(t)$$

where p_w(t) denotes the array sound pressure at time t;
N denotes the number of sound sources;
t denotes time;
s_n(t) denotes the audio signal of the n-th sound source;
h_mn(t) denotes the transfer coefficient between the n-th source and the m-th microphone, h_mn(t) = p_0(t) α_m(θ_n(t)), where p_0(t) is the sound pressure at the array center caused by the sound waves at time t, α_m(θ_n(t)) is the steering coefficient of the m-th microphone with respect to the n-th source at time t, and θ_n(t) is the beam arrival direction of the n-th source at time t.

Taking the opposing microphone pair on the horizontal axis (microphones 1 and 2 in the array of Figure 4), the horizontal sound pressure gradient of the array is:

$$p_{x}(t)=\sum_{n=1}^{N}\big(h_{1n}(t)-h_{2n}(t)\big)\,s_{n}(t)$$

where p_x(t) denotes the horizontal sound pressure gradient of the array.

Taking the opposing pair on the vertical axis (microphones 3 and 4), the vertical sound pressure gradient is:

$$p_{y}(t)=\sum_{n=1}^{N}\big(h_{3n}(t)-h_{4n}(t)\big)\,s_{n}(t)$$

where p_y(t) denotes the vertical sound pressure gradient of the array.
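Under the cross-array layout of Figure 4 (opposing pairs on the horizontal and vertical axes), the three signals of step 3 can be approximated directly from the four microphone channels; treating the gradients as opposite-microphone differences is an assumption consistent with that geometry, not a formula stated explicitly by the patent:

```python
# Sketch of step 3: omnidirectional pressure p_w and the two pressure
# gradients p_x, p_y from the four microphone signals x1..x4.
def bformat(x1, x2, x3, x4):
    pw = (x1 + x2 + x3 + x4) / 4.0   # array sound pressure p_w(t)
    px = x1 - x2                     # horizontal gradient p_x(t) (mics 1/2)
    py = x3 - x4                     # vertical gradient p_y(t) (mics 3/4)
    return pw, px, py
```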
Step 4. Apply the Fourier transform to convert the array center sound pressure and the horizontal and vertical sound pressure gradients of the array from the time domain to the frequency domain.
Step 5. From the array sound pressure, the horizontal pressure gradient and the vertical pressure gradient in the frequency domain, form the intensity vector of the sound pressure signal and from it derive the intensity vector direction.

The intensity vector of the sound pressure signal in the frequency domain is:

$$\mathbf{I}(\omega,t)=\frac{1}{\rho_{0}c}\,\operatorname{Re}\!\big[p_{w}^{*}(\omega,t)\big(p_{x}(\omega,t)\,\mathbf{u}_{x}+p_{y}(\omega,t)\,\mathbf{u}_{y}\big)\big]$$

where I(ω,t) denotes the intensity vector of the sound pressure signal in the frequency domain;
ρ_0 denotes the air density of the environment under test;
c denotes the speed of sound;
Re[·] denotes taking the real part of a complex quantity;
p_w^*(ω,t) denotes the complex conjugate of the array sound pressure in the frequency domain;
p_x(ω,t) denotes the horizontal sound pressure gradient of the array in the frequency domain;
p_y(ω,t) denotes the vertical sound pressure gradient of the array in the frequency domain;
u_x denotes the unit vector along the horizontal axis;
u_y denotes the unit vector along the vertical axis.
The intensity vector direction is:

$$\gamma(\omega,t)=\arctan\!\left(\frac{\operatorname{Re}\big[p_{w}^{*}(\omega,t)\,p_{y}(\omega,t)\big]}{\operatorname{Re}\big[p_{w}^{*}(\omega,t)\,p_{x}(\omega,t)\big]}\right)$$

where γ(ω,t) denotes the intensity vector direction of the sound pressure signal of the mixed sound received by the microphone array.
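Steps 4 and 5 together amount to an STFT followed by the arctangent of the two real intensity components; the constant 1/(ρ_0 c) cancels in the direction and can be omitted. A sketch with scipy, where `fs` and the window length are illustrative:

```python
# Sketch of steps 4-5: STFT of p_w, p_x, p_y and the intensity vector
# direction gamma(omega, t) per time-frequency bin.
import numpy as np
from scipy.signal import stft

def intensity_direction(pw, px, py, fs, nperseg=1024):
    _, _, PW = stft(pw, fs=fs, nperseg=nperseg)
    _, _, PX = stft(px, fs=fs, nperseg=nperseg)
    _, _, PY = stft(py, fs=fs, nperseg=nperseg)
    Ix = np.real(np.conj(PW) * PX)            # horizontal intensity component
    Iy = np.real(np.conj(PW) * PY)            # vertical intensity component
    gamma = np.arctan2(Iy, Ix) % (2 * np.pi)  # direction in [0, 2*pi)
    return gamma, PW
```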
Step 6. Compile statistics of the intensity vector directions to obtain their probability density distribution, fit it with a mixture of von Mises distributions to obtain the model parameters that the speech intensity-vector directions obey, and from these derive the intensity-vector direction function of each sound pressure signal.

The specific process is as follows:

Step 6-1. Compile statistics of the intensity vector directions to obtain their probability density distribution, and fit a mixture of von Mises distributions to obtain the parameter set of the mixture obeyed by the speech intensity-vector directions.

The mixed von Mises distribution model is:

$$f(\varphi)=\sum_{n=1}^{N}\alpha_{n}\,\frac{\exp\big(k_{n}\cos(\varphi-\theta_{n})\big)}{2\pi I_{0}(k_{n})}$$

where f(φ) denotes the probability density of the mixed von Mises distribution;
φ denotes the direction angle of the mixed sound;
α_n denotes the weight of the intensity-vector direction function of the n-th source's sound pressure signal;
θ_n denotes the mean direction of the n-th component, i.e. the beam arrival direction of the n-th source;
I_0(k_n) denotes the modified Bessel function of the first kind of order zero for the n-th source, and k_n denotes the concentration parameter of the single von Mises distribution obeyed by the intensity vector direction of the n-th source's sound pressure signal, i.e. the reciprocal of the distribution's variance.

The parameter set of the mixed von Mises distribution function is:

Γ = {α_n, k_n}, n = 1, ..., N    (11)
Step 6-2. Initialize the model parameters to obtain the initial parameter set.
Step 6-3. Starting from the initial model parameters, estimate the parameters of the mixed von Mises distribution model with the expectation-maximization algorithm.
Step 6-4. From the estimated mixture parameters, obtain the intensity-vector direction function of each sound pressure signal.
The intensity-vector direction function of the n-th sound pressure signal is:

$$f_{n}(\varphi)=\alpha_{n}\,\frac{\exp\big(k_{n}\cos(\varphi-\theta_{n})\big)}{2\pi I_{0}(k_{n})}$$

where f_n(φ) denotes the intensity-vector direction function of the n-th sound source.
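The EM fit of step 6 can be sketched as follows, with the component mean directions fixed at the DOAs from step 2 (as the embodiment does) and the concentration update solved through the standard Bessel-ratio equation A(k) = I_1(k)/I_0(k) = R̄; the patent only states that derivatives are set to zero, so this particular update rule is an assumption:

```python
# Sketch of step 6: EM for a mixture of von Mises distributions over the
# pooled direction samples gamma (a 1-D array in radians, e.g. the
# time-frequency directions flattened with gamma_tf.ravel());
# thetas are the fixed DOAs from step 2, also in radians.
import numpy as np
from scipy.special import i0, i1
from scipy.optimize import brentq

def vm_pdf(phi, theta, k):
    return np.exp(k * np.cos(phi - theta)) / (2 * np.pi * i0(k))

def fit_vm_mixture(gamma, thetas, max_iter=100, tol=0.1):
    N = len(thetas)
    alpha, k = np.full(N, 1.0 / N), np.full(N, 5.0)
    old_ll = -np.inf
    for _ in range(max_iter):
        comp = np.stack([a * vm_pdf(gamma, t, kk)      # weighted components
                         for a, t, kk in zip(alpha, thetas, k)])
        total = comp.sum(axis=0)
        ll = np.sum(np.log(total))                     # log-likelihood
        if abs(ll - old_ll) < tol:                     # convergence test
            break
        old_ll = ll
        r = comp / total                               # E-step responsibilities
        alpha = r.mean(axis=1)                         # M-step: weights
        for n in range(N):                             # M-step: concentrations
            rbar = np.sum(r[n] * np.cos(gamma - thetas[n])) / np.sum(r[n])
            rbar = np.clip(rbar, 1e-6, 0.999)
            k[n] = brentq(lambda x: i1(x) / i0(x) - rbar, 1e-6, 700.0)
    return alpha, k
```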
Step 7. From the intensity-vector direction function of each sound pressure signal and the array sound pressure, obtain each source's frequency-domain signal, and convert each of them back to a time-domain source signal with the inverse Fourier transform.

The frequency-domain signal of each source is obtained by weighting the array pressure with that source's direction function evaluated at the observed direction:

$$\hat{S}_{n}(\omega,t)=\frac{f_{n}\big(\gamma(\omega,t)\big)}{\sum_{n'=1}^{N}f_{n'}\big(\gamma(\omega,t)\big)}\;p_{w}(\omega,t)$$

where Ŝ_n(ω,t) denotes the frequency-domain signal of the n-th source obtained after separating the mixed speech.

Applying the inverse Fourier transform to Ŝ_n(ω,t) yields the time-domain signal ŝ_n(t).
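A sketch of step 7, reusing `vm_pdf` from the previous sketch; the normalized soft mask below is one plausible reading of the separation formula, since the patent states the weighting only in words:

```python
# Sketch of step 7: per-source soft masks from the direction functions
# f_n(gamma), applied to the array pressure spectrum PW, then inverse STFT.
import numpy as np
from scipy.signal import istft

def separate(PW, gamma, alpha, thetas, k, fs, nperseg=1024):
    masks = np.stack([a * vm_pdf(gamma, t, kk)
                      for a, t, kk in zip(alpha, thetas, k)])
    masks /= masks.sum(axis=0, keepdims=True)   # normalised soft mask per source
    sources = []
    for m in masks:
        _, s = istft(m * PW, fs=fs, nperseg=nperseg)
        sources.append(s)                       # time-domain source s_n(t)
    return sources
```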
Step 8. Compute the matching probability between each separated source signal and the user-specified source in the sample speech library; the source with the largest probability is the target source. Keep that source's signal and discard the other, non-target sources.

The matching probability of the n-th source signal against the specified source in the library is evaluated with the specified speaker's trained Gaussian mixture model:

$$p(\hat{X}_{n}\mid G_{c})$$

where X̂_n denotes the speech feature parameters extracted from the separated speech ŝ_n(t), i.e. its Mel-frequency cepstral coefficients;
p(X̂_n | G_c) denotes the matching probability of the n-th source signal against the specified source, i.e. the probability that the separated speech belongs to the user-specified speaker's voice;
G_c denotes the voice model parameters of the user-specified speaker.
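Step 8 then reduces to scoring each separated signal's MFCCs under the specified speaker's GMM and keeping the arg-max; here `gmm_c` is a model from the step-1 sketch, and scikit-learn's `score` returns the mean log-likelihood:

```python
# Sketch of step 8: match each separated source against the specified
# speaker's GMM G_c and pick the best-scoring source as the target.
import librosa
import numpy as np

def pick_target(sources, fs, gmm_c):
    scores = []
    for s in sources:
        feats = librosa.feature.mfcc(y=np.asarray(s, dtype=float),
                                     sr=fs, n_mfcc=13).T
        scores.append(gmm_c.score(feats))   # mean log p(X_hat_n | G_c)
    return int(np.argmax(scores)), scores
```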
Step 9. Amplify the retained source signal, completing the amplification of the specified sound source in the environment under test.
The threshold in step 2-4 takes a value in the range 10^−2 to 10^−16.
In step 6-1, each α_n is a random number in [0, 1] subject to Σ_{n=1}^{N} α_n = 1, and each k_n is a random number in [1, 700].
Advantages of the invention:
By building a model library of interlocutors' voices, the intelligent speech processing method of the invention intelligently identifies the identities of multiple interlocutors in a multi-speaker environment while separating the mixed speech into each interlocutor's individual voice; according to the user's needs, it amplifies the voice of the interlocutor the user wants to hear and suppresses the voices of interlocutors the user has not requested. Unlike a traditional hearing aid, the method automatically supplies the user with the desired sound according to the user's personal needs, reducing the interference of non-target voices beyond mere noise and making the method personalized, interactive and intelligent.
Brief Description of the Drawings
Figure 1 is a flow chart of the intelligent speech processing method of one embodiment of the invention;
Figure 2 shows the sound source data used for modeling in one embodiment, where panel (a) shows the first person's voice data, panel (b) the second person's, and panel (c) the third person's;
Figure 3 shows the sound source data used for sound mixing in one embodiment, where panel (a) shows the data of the first source, panel (b) the second source, and panel (c) the third source;
Figure 4 is a schematic diagram of the microphone array of one embodiment;
Figure 5 shows the data received by the four microphones in one embodiment, where panels (a) through (d) show the mixed sound signals received by the first through fourth microphones respectively;
Figure 6 shows the same four received mixed signals after sampling, panels (a) through (d) again corresponding to microphones one through four;
Figure 7 shows the spatial spectrum estimate of the mixed signal in one embodiment;
Figure 8 is the probability density plot of the mixed sound's intensity-vector direction distribution in one embodiment;
Figure 9 illustrates the maximum-likelihood estimate of the mixed von Mises model in one embodiment;
Figure 10 compares the ideal speech with the speech obtained after separation in one embodiment, where panels (a), (c) and (e) show the original signals of the first, second and third sources, and panels (b), (d) and (f) show the corresponding signals after separation.
Detailed Description
An embodiment of the invention is described below with reference to the drawings.
In this embodiment the system consists of two modules: a speech modeling module and a dynamic real-time speech processing module. The speech modeling module builds the speaker voice models; the real-time module handles, in a complex speech environment, direction localization and separation of the mixed voices, and mixed-speech recognition and extraction (i.e. extracting and amplifying the target voice while masking the remaining voices).
An intelligent speech processing method, whose flow chart is shown in Figure 1, comprises the following steps:
Step 1. Collect sample speech segments to build a sample speech library, extract features to obtain feature parameters, and train on them. The specific process is as follows:
Step 1-1. Record sample speech segments in a quiet indoor environment, discretize them, extract the Mel-frequency cepstral coefficients (MFCCs) as the speech feature parameters, and establish a Gaussian mixture model.
In this embodiment, the Windows sound recorder was used to record the voices of three people, two segments per person: one segment for sound separation and recognition and the other for speaker voice modeling; the target was set to sound source number one. As shown in panels (a) through (c) of Figure 2, one speech segment of each of the three people was taken, a Gaussian mixture model was built for each, and the resulting model parameters were stored in the model library.
The model formula and symbol definitions are the same as in step 1-1 of the Summary above: p(X|G) = ∏_{t=1}^{T} Σ_{i=1}^{I} p_i b_i(x_t), with parameter set G = {p_i, μ_i, Σ_i}, i = 1, 2, ..., I.
Step 1-2. Train the Gaussian mixture model on the speech feature parameters: apply k-means clustering to the feature parameters to obtain the initial parameter set G^0 = {p_i^0, μ_i^0, Σ_i^0}, i = 1, 2, ..., I.
In this example the mixture consists of 16 component Gaussians. Sixteen vectors are generated at random as cluster centers, each of length equal to the number of speech frames; each frame's feature parameters are assigned to one of the 16 centers by the minimum-distance criterion, and each center is then recomputed as the mean of its members, repeating until the algorithm converges. The resulting cluster centers are the initial mean parameters μ_i^0 of the mixture, the covariances of the assigned feature parameters give the initial Σ_i^0, and the weights p_i^0 are all initialized to 1/16.
The model is then estimated with the expectation-maximization algorithm, whose principle is to maximize the probability of the observations: setting the derivatives of the model function with respect to p_i, μ_i and Σ_i to zero yields re-estimates of these parameters, and the procedure is repeated until the algorithm converges, at which point the training of the feature parameters is complete.
Step 2. Use a microphone array of four microphones to record the environment's audio, and determine the number of sound sources and the beam arrival direction of each, i.e. the incidence angle of each source at the array. The specific process is as follows:
Step 2-1. Record the environment's audio with the four-microphone array and discretize the mixed signal to obtain the amplitude at each sampling point.
In this embodiment, as shown in panels (a) through (c) of Figure 3, another speech segment of each of the three people serves as the source data for the mixture. Four microphones are arranged as in Figure 4: microphones one and two are placed symmetrically about the array center on the horizontal axis, and microphones three and four symmetrically about the array center on the vertical axis. The mixed data received by the four microphones are shown in panels (a) through (d) of Figure 5; the received speech is discretized at a frequency of 12,500 Hz, and the amplitude of each sampling point is determined, as shown in panels (a) through (d) of Figure 6.
Step 2-2. Arrange the sample amplitudes into a matrix to obtain the mixed-audio matrix captured by each microphone; the matrix has one column, as many rows as sampling points, and the sample amplitudes as its elements.
Step 2-3. From the per-microphone mixed-audio matrices and the number of microphones, estimate the vector covariance matrix of the mixed audio signal, using the same estimate R_xx as in step 2-3 of the Summary above.
Step 2-4. In this example, eigenvalue decomposition of the covariance estimate gives the eigenvalues [0.0000, 0.0190, 0.0363, 0.1128]. Sorting them from large to small and comparing against the threshold 10^−7, three eigenvalues exceed the threshold, so the number of sound sources is 3.
Step 2-5. Subtracting the number of sound sources from the number of microphones gives the number of noise dimensions, from which the noise matrix is obtained.
In this embodiment, the eigenvalues and eigenvectors corresponding to the 3 sources are treated as the signal subspace, and the remaining 4 − 3 = 1 eigenvalue and its eigenvector as the noise subspace, so the number of noise dimensions is 1; from the elements corresponding to the noise eigenvalue the noise matrix is
V_u = [−0.1218 − 0.4761i, −0.1564 + 0.4659i, −0.5070 − 0.0374i, −0.5084]^T.
Step 2-6. From each microphone's distance to the array center, the wavelength of the mixed audio signal, the microphones' direction angles relative to the array center, and the beam arrival directions of the sources, form the steering vector of the array; then compute the angular spectrum function from the noise matrix and the steering vector.
As shown in Figure 4, every microphone is 0.02 m from the array center; in this embodiment, the wavelength parameter of the mixed audio signal is taken as 30,000. The direction angles of microphones one through four relative to the array center are 0°, 180°, 90° and 270° respectively.
The angular spectrum function P(θ) is as in step 2-6 of the Summary above, with steering vector α(θ) = (α_1(θ), α_2(θ), α_3(θ), α_4(θ)), where α_1(θ) = e^{jk·0.02·cos(0°−θ)}, α_2(θ) = e^{jk·0.02·cos(180°−θ)}, α_3(θ) = e^{jk·0.02·cos(90°−θ)}, α_4(θ) = e^{jk·0.02·cos(270°−θ)}, j is the imaginary unit, k = 2π/λ, λ is the wavelength of the mixed audio signal, and V_u is the noise matrix obtained above.
Step 2-7. From the waveform of the angular spectrum function, pick its peaks from largest to smallest; the number of peaks selected equals the number of sources.
Step 2-8. The angle of each selected peak is the beam arrival direction of the corresponding source.
As shown in Figure 7, from the waveform of the angular spectrum function P(θ) the beam arrival directions of the three sources present in the mixture are found to be [50°, 200°, 300°].
Step 3. From each source's audio signal and the source-microphone transfer relation, obtain the array sound pressure p_w(t) and the horizontal and vertical sound pressure gradients p_x(t) and p_y(t) of the array, with the same formulas and symbol definitions as in step 3 of the Summary above, where h_mn(t) = p_0(t) α_m(θ_n(t)) is the transfer coefficient between the n-th source and the m-th microphone.
Step 4. Apply the Fourier transform to convert the array center sound pressure and the horizontal and vertical sound pressure gradients from the time domain to the frequency domain.
Step 5. From the frequency-domain array sound pressure and pressure gradients, form the intensity vector I(ω,t) of the sound pressure signal and derive the intensity vector direction γ(ω,t), with the same formulas and symbol definitions as in step 5 of the Summary above (ρ_0 is the air density of the environment under test, c the speed of sound).
Step 6. Compile statistics of the intensity vector directions to obtain their probability density distribution and fit a mixture of von Mises distributions, yielding the model parameters obeyed by the speech intensity-vector directions and, from these, each pressure signal's intensity-vector direction function. The specific process is as follows:
Step 6-1. Compile statistics of the intensity vector directions to obtain the probability density distribution and fit the mixed von Mises distribution to obtain its parameter set.
In this embodiment, Figure 8 shows the distribution probability density of γ(ω,t); given the number of sources and the angles found above, the mixture matching this probability density consists of three single von Mises distributions whose central angles are [50°, 200°, 300°].
The mixed von Mises model and its symbols are as given in step 6-1 of the Summary above, with parameter set
Γ = {α_n, k_n}, n = 1, 2, 3.    (11)
Step 6-2. Initialize the model parameters to obtain the initial parameter set.
In this embodiment, α is initialized to [1/3, 1/3, 1/3] and k to [8, 6, 3].
Step 6-3. From the initial parameters, build the initial mixed von Mises distribution function, then estimate the model parameters with the expectation-maximization algorithm, whose principle is to maximize the probability of the observations; re-estimates of α and k are computed by setting the derivatives of the model function with respect to them to zero.
Substituting γ(ω,t) into the model and taking logarithms gives the initial log-likelihood −3.0249×10^4. Computing each single von Mises component's share of the mixture gives the re-estimated α parameters [0.2267, 0.2817, 0.4516], and the derivative-based update of k gives [5.1498, 4.0061, 3.1277]; the new log-likelihood is −2.9887×10^4. The difference between the new and old likelihoods, 362.3362, is far larger than the chosen threshold of 0.1, so the new likelihood replaces the old one and the step is repeated with the two re-estimated parameter sets until the change between old and new likelihoods falls below the threshold, at which point the algorithm is deemed to have converged. In this example the final α parameters are [0.2689, 0.2811, 0.4500] and the final k values are [4.3508, 3.3601, 2.8332]; this yields the mixed von Mises distribution function fitting the intensity-vector direction distribution, shown in Figure 9.
Step 6-4. From the estimated mixture parameters, obtain each sound pressure signal's intensity-vector direction function f_n(φ), as given in step 6-4 of the Summary above.
Step 7. From each pressure signal's intensity-vector direction function and the array sound pressure, obtain each source's frequency-domain signal Ŝ_n(ω,t) as in step 7 of the Summary above, and convert it to the time-domain source signal ŝ_n(t) by the inverse Fourier transform.
Step 8. Compute the matching probability of each separated source signal against the specified source in the sample speech library; the source with the largest probability is taken as the target. Keep that signal and discard the other, non-target sources.
In this embodiment, with the first person as the target source, the log matching probabilities of the three separated voices against the target voice model are [−2.0850, −2.8807, −3.5084]×10^4; the best match is separated voice number one, so the target sound source is found.
The matching probability p(X̂_n | G_c) is computed as in step 8 of the Summary above, where X̂_n are the MFCC feature parameters of the separated speech ŝ_n(t) and G_c are the voice model parameters of the user-specified speaker.
Step 9. Amplify the retained source signal, completing the amplification of the specified sound source in the environment under test.
In this embodiment, the direction function of each source obtained from the fitted mixed von Mises parameters is finally used to separate out the original sounds; panels (a) through (f) of Figure 10 compare the ideal signals with the data obtained after separation, and the similarity is extremely high.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410081493.6A CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410081493.6A CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103811020A CN103811020A (en) | 2014-05-21 |
CN103811020B true CN103811020B (en) | 2016-06-22 |
Family
ID=50707692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410081493.6A Active CN103811020B (en) | 2014-03-05 | 2014-03-05 | A kind of intelligent sound processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103811020B (en) |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200813B (en) * | 2014-07-01 | 2017-05-10 | 东北大学 | Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction |
CN105609099A (en) * | 2015-12-25 | 2016-05-25 | 重庆邮电大学 | Speech recognition pretreatment method based on human auditory characteristic |
CN105933820A (en) * | 2016-04-28 | 2016-09-07 | 冠捷显示科技(中国)有限公司 | Automatic positioning method of external wireless sound boxes |
CN106205610B (en) * | 2016-06-29 | 2019-11-26 | 联想(北京)有限公司 | A kind of voice information identification method and equipment |
CN106128472A (en) * | 2016-07-12 | 2016-11-16 | 乐视控股(北京)有限公司 | The processing method and processing device of singer's sound |
CN106448722B (en) * | 2016-09-14 | 2019-01-18 | 讯飞智元信息科技有限公司 | The way of recording, device and system |
CN108630193B (en) * | 2017-03-21 | 2020-10-02 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method and device |
CN107220021B (en) * | 2017-05-16 | 2021-03-23 | 北京小鸟看看科技有限公司 | Voice input recognition method and device and head-mounted equipment |
CN107274895B (en) * | 2017-08-18 | 2020-04-17 | 京东方科技集团股份有限公司 | Voice recognition device and method |
CN107527626A (en) * | 2017-08-30 | 2017-12-29 | 北京嘉楠捷思信息技术有限公司 | Audio identification system |
CN108198569B (en) * | 2017-12-28 | 2021-07-16 | 北京搜狗科技发展有限公司 | Audio processing method, device and equipment and readable storage medium |
CN108520756B (en) * | 2018-03-20 | 2020-09-01 | 北京时代拓灵科技有限公司 | Method and device for separating speaker voice |
CN110310642B (en) * | 2018-03-20 | 2023-12-26 | 阿里巴巴集团控股有限公司 | Voice processing method, system, client, equipment and storage medium |
CN108694950B (en) * | 2018-05-16 | 2021-10-01 | 清华大学 | A Speaker Confirmation Method Based on Deep Mixture Model |
CN108766459B (en) * | 2018-06-13 | 2020-07-17 | 北京联合大学 | Target speaker estimation method and system in multi-user voice mixing |
CN108735227B (en) * | 2018-06-22 | 2020-05-19 | 北京三听科技有限公司 | Method and system for separating sound source of voice signal picked up by microphone array |
CN110867191B (en) * | 2018-08-28 | 2024-06-25 | 洞见未来科技股份有限公司 | Speech processing method, information device and computer program product |
CN109505741B (en) * | 2018-12-20 | 2020-07-10 | 浙江大学 | A method and device for detecting damaged blades of a wind turbine based on a rectangular microphone array |
CN110335626A (en) * | 2019-07-09 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Age recognition methods and device, storage medium based on audio |
CN110288996A (en) * | 2019-07-22 | 2019-09-27 | 厦门钛尚人工智能科技有限公司 | A kind of speech recognition equipment and audio recognition method |
CN112289335B (en) * | 2019-07-24 | 2024-11-12 | 阿里巴巴集团控股有限公司 | Voice signal processing method, device and sound pickup device |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
GB2586126A (en) * | 2019-08-02 | 2021-02-10 | Nokia Technologies Oy | MASA with embedded near-far stereo for mobile devices |
CN110706688B (en) * | 2019-11-11 | 2022-06-17 | 广州国音智能科技有限公司 | Construction method, system, terminal and readable storage medium of speech recognition model |
CN111028857B (en) * | 2019-12-27 | 2024-01-19 | 宁波蛙声科技有限公司 | Method and system for reducing noise of multichannel audio-video conference based on deep learning |
CN111816185A (en) * | 2020-07-07 | 2020-10-23 | 广东工业大学 | A method and device for identifying speakers in mixed speech |
CN111696570B (en) * | 2020-08-17 | 2020-11-24 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN111899756B (en) * | 2020-09-29 | 2021-04-09 | 北京清微智能科技有限公司 | Single-channel voice separation method and device |
CN114093382A (en) * | 2021-11-23 | 2022-02-25 | 广东电网有限责任公司 | Intelligent interaction method suitable for voice information |
CN114242072A (en) * | 2021-12-21 | 2022-03-25 | 上海帝图信息科技有限公司 | A speech recognition system for intelligent robots |
CN114613385A (en) * | 2022-05-07 | 2022-06-10 | 广州易而达科技股份有限公司 | Far-field voice noise reduction method, cloud server and audio acquisition equipment |
CN115240689B (en) * | 2022-09-15 | 2022-12-02 | 深圳市水世界信息有限公司 | Target sound determination method, target sound determination device, computer equipment and medium |
CN118574049B (en) * | 2024-08-01 | 2024-11-08 | 罗普特科技集团股份有限公司 | Microphone calibration method and system of multi-mode intelligent terminal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1653519A (en) * | 2002-03-20 | 2005-08-10 | 高通股份有限公司 | Method for robust voice recognition by analyzing redundant features of source signal |
JP2012211768A (en) * | 2011-03-30 | 2012-11-01 | Advanced Telecommunication Research Institute International | Sound source positioning apparatus |
CN103426434A (en) * | 2012-05-04 | 2013-12-04 | 索尼电脑娱乐公司 | Source separation by independent component analysis in conjunction with source direction information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8249867B2 (en) * | 2007-12-11 | 2012-08-21 | Electronics And Telecommunications Research Institute | Microphone array based speech recognition system and target speech extracting method of the system |
- 2014-03-05 CN CN201410081493.6A patent/CN103811020B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1653519A (en) * | 2002-03-20 | 2005-08-10 | 高通股份有限公司 | Method for robust voice recognition by analyzing redundant features of source signal |
JP2012211768A (en) * | 2011-03-30 | 2012-11-01 | Advanced Telecommunication Research Institute International | Sound source positioning apparatus |
CN103426434A (en) * | 2012-05-04 | 2013-12-04 | 索尼电脑娱乐公司 | Source separation by independent component analysis in conjunction with source direction information |
Also Published As
Publication number | Publication date |
---|---|
CN103811020A (en) | 2014-05-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103811020B (en) | A kind of intelligent sound processing method | |
CN109830245B (en) | A method and system for multi-speaker speech separation based on beamforming | |
Zhang et al. | Deep learning based binaural speech separation in reverberant environments | |
CN112116920B (en) | Multi-channel voice separation method with unknown speaker number | |
CN111429939B (en) | Sound signal separation method of double sound sources and pickup | |
CN112634935B (en) | Voice separation method and device, electronic equipment and readable storage medium | |
Brutti et al. | Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays. | |
CN102388416A (en) | Signal processing apparatus and signal processing method | |
JP4964204B2 (en) | Multiple signal section estimation device, multiple signal section estimation method, program thereof, and recording medium | |
Wan et al. | Sound source localization based on discrimination of cross-correlation functions | |
CN109859749A (en) | A kind of voice signal recognition methods and device | |
CN106019230B (en) | A kind of sound localization method based on i-vector Speaker Identification | |
Enzinger et al. | Mismatched distances from speakers to telephone in a forensic-voice-comparison case | |
Xia et al. | Ava: An adaptive audio filtering architecture for enhancing mobile, embedded, and cyber-physical systems | |
Talagala et al. | Binaural localization of speech sources in the median plane using cepstral hrtf extraction | |
Krijnders et al. | Tone-fit and MFCC scene classification compared to human recognition | |
Kothapally et al. | Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments. | |
Sailor et al. | Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. | |
Oualil et al. | Joint detection and localization of multiple speakers using a probabilistic interpretation of the steered response power | |
CN103544953A (en) | Sound environment recognition method based on background noise minimum statistic feature | |
Ghalamiosgouei et al. | Robust Speaker Identification Based on Binaural Masks | |
Habib et al. | Auditory inspired methods for localization of multiple concurrent speakers | |
Venkatesan et al. | Analysis of monaural and binaural statistical properties for the estimation of distance of a target speaker | |
Guzewich et al. | Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement. | |
Nguyen et al. | Location Estimation of Receivers in an Audio Room using Deep Learning with a Convolution Neural Network. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |