WO2020113575A1 - Sound classification method, device and medium based on constrained semi-non-negative matrix factorization - Google Patents
- Publication number
- WO2020113575A1 (PCT/CN2018/119894)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matrix
- semi
- sound data
- negative
- training
- Prior art date
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01L—MEASURING FORCE, STRESS, TORQUE, WORK, MECHANICAL POWER, MECHANICAL EFFICIENCY, OR FLUID PRESSURE
- G01L21/00—Vacuum gauges
- G01L21/08—Vacuum gauges by measuring variations in the transmission of acoustic waves through the medium, the pressure of which is to be measured
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Definitions
- the invention relates to the technical field of sound signal processing and pattern recognition, in particular to a sound classification method, device and medium based on constrained semi-non-negative matrix factorization.
- sound data is usually high-dimensional. Classifying the raw sound data directly can make the classification system computationally expensive and inaccurate. To address this, high-dimensional sound data is generally reduced in dimension and compressed into a low-dimensional subspace for recognition. This process is usually called sound feature extraction, and low-dimensional sound features are usually more discriminative than the raw sound data. Fundamental frequency, short-time average zero-crossing rate, formants and spectral peaks are commonly used low-dimensional sound features. However, as sound technology is applied more widely and the demands on recognition performance rise, feature extraction methods face higher requirements, and the above low-dimensional features increasingly fail to distinguish sound categories effectively. Researchers have therefore been studying more effective dimensionality-reduction methods for sound, such as matrix factorization, principal component analysis and independent component analysis.
- the sound feature extraction method of the cited prior patent (application No. 201810140213.2) does not use the category information of the training samples to improve the discriminative power of the low-dimensional sample representations when reducing dimensionality with non-negative matrix factorization, which may degrade recognition based on the reduced features;
- that method also applies no sparse constraint during dimensionality reduction, whereas a sparse low-dimensional representation may weaken poorly discriminative features and strengthen highly discriminative ones.
- the method of that patent does not reduce the dimensionality of the speech data directly. It applies non-negative matrix factorization to an acoustic feature matrix extracted from the speech data, and those acoustic features do not necessarily capture the category attributes of the speech data fully.
- when speech data is reduced directly, non-negative matrix factorization is not applicable, because speech data is generally semi-non-negative.
- the constrained semi-non-negative matrix factorization algorithm is used to reduce the dimensionality of the sound data.
- the constraints comprise a category constraint and a sparse constraint, so that the category information of the training sound data samples is used effectively and the reduced low-dimensional representations are sparse. This yields more discriminative low-dimensional sample representations and thereby improves the accuracy of sound data classification.
- the present invention adopts the following technical solutions:
- a sound classification method based on constrained semi-non-negative matrix factorization includes the following steps:
- the low-dimensional representation corresponding to the test sound data sample in the coefficient matrix H is input to the classifier R, and the classification result of the test sound data sample is output.
- the training sound data samples and the test sound data samples described in S1 are represented as a semi-non-negative matrix X, as follows:
- O represents the zero matrix
- $I_{N2}$ is an identity matrix with N2 rows and N2 columns.
- θ is the sparsity parameter, with range 0 ≤ θ ≤ 1;
- $I_{M'}$ is an identity matrix with M′ rows and M′ columns;
- $l$ is a column vector of dimension M′ whose elements are all 1;
- $l^T$ is the transpose of $l$.
- the constrained semi-non-negative matrix factorization is performed on the semi-non-negative matrix X to obtain the corresponding coefficient matrix H, as follows:
- in Equation (2), $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; $(UZ)^T$ is the transpose of $UZ$;
- in Equation (3), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; X is a semi-non-negative matrix; $S^T$, $Z^T$ and $U^T$ are the transposes of S, Z and U;
- in Equations (4) and (5), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix; X is a semi-non-negative matrix; W is a semi-non-negative matrix; $S^T$, $Z^T$, $U^T$ and $W^T$ are the transposes of S, Z, U and W;
- in Equation (6), U is the category constraint matrix; Z is a non-negative matrix; $(UZ)^T$ is the transpose of $UZ$.
- the classification model is trained to obtain the classifier R, as follows:
- a classification model is selected and denoted MW; with $ht_i$ as the input of MW and $a_{i,b}$ as its output, MW is trained to obtain the classifier R.
- the low-dimensional representation corresponding to the test sound data samples in the coefficient matrix H described in S5 is input to the classifier R, and the classification result of the test sound data samples is output, as follows:
- classification model MW selects the nearest neighbor classifier or support vector machine.
- a sound classification device based on constrained semi-non-negative matrix factorization includes:
- the memory is coupled to the processor and stores instructions, and the instructions execute steps of implementing the sound classification method based on constrained semi-non-negative matrix factorization as described above.
- the device acquires training sound data samples and test sound data samples.
- a computer readable storage medium stores an application program for a sound classification method based on constrained semi-non-negative matrix factorization, which implements the steps of the sound classification method based on constrained semi-non-negative matrix factorization as described above.
- the present invention makes effective use of the category information of the training sound data samples and adds a sparse constraint during the semi-non-negative matrix factorization of the sound data samples. It can therefore obtain more discriminative low-dimensional sound features, solves the problem that semi-non-negative matrix factorization can neither use the categories of the training data nor apply a sparse constraint, and improves the accuracy of sound data classification.
- FIG. 1 is a working flowchart of a sound classification method based on constrained semi-non-negative matrix factorization.
- a sound classification method based on constrained semi-non-negative matrix factorization includes the following steps:
- the low-dimensional representation corresponding to the test sound data sample in the coefficient matrix H is input to the classifier R, and the classification result of the test sound data sample is output.
- a semi-non-negative matrix, in the present invention, is a matrix whose elements include both positive and negative values
- a non-negative matrix, in the present invention, is a matrix none of whose elements is negative
- training sound data samples and the test sound data samples described in S1 are represented as a semi-non-negative matrix X, as follows:
- the amplitudes of the training sound data samples and the test sound data samples are normalized so that the amplitude of each sample is [-1, 1];
- the category constraint matrix U described in S2 is constructed from the semi-non-negative matrix X as follows:
- O represents a zero matrix (the elements of the zero matrix are all 0)
- $I_{N2}$ is an identity matrix with N2 rows and N2 columns (the diagonal elements of the identity matrix are all 1, and the remaining elements are all 0).
- θ is the sparsity parameter (it can be set by the user), with range 0 ≤ θ ≤ 1;
- $I_{M'}$ is an identity matrix with M′ rows and M′ columns;
- $l$ is a column vector of dimension M′ whose elements are all 1;
- $l^T$ is the transpose of $l$ (the superscript T denotes transposition).
- the constrained semi-non-negative matrix factorization is performed on the semi-non-negative matrix X to obtain the corresponding coefficient matrix H, as follows:
- in Equation (2), $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; $(UZ)^T$ is the transpose of $UZ$;
- in Equation (3), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; X is a semi-non-negative matrix; $S^T$, $Z^T$ and $U^T$ are the transposes of S, Z and U;
- in Equations (4) and (5), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix; X is a semi-non-negative matrix; W is a semi-non-negative matrix; $S^T$, $Z^T$, $U^T$ and $W^T$ are the transposes of S, Z, U and W;
- in Equation (6), U is the category constraint matrix; Z is a non-negative matrix; $(UZ)^T$ is the transpose of $UZ$.
- the low-dimensional representation corresponding to the training sound data samples in the coefficient matrix H and the category information of the training sound data samples are used as training data to train the classification model and obtain the classifier R, as follows:
- a classification model is selected and denoted MW; with $ht_i$ as the input of MW and $a_{i,b}$ as its output, MW is trained to obtain the classifier R.
- the low-dimensional representation corresponding to the test sound data samples in the coefficient matrix H described in S5 is input to the classifier R to output the classification results of the test sound data samples, as follows:
- the classification model MW selects a nearest neighbor classifier or a support vector machine.
- the iterative update formula for the matrix Z and the base matrix W described in the present invention is derived as follows.
- X is a semi-non-negative matrix
- W is the basis matrix of the constrained semi-non-negative matrix factorization
- U is a category constraint matrix
- S is a sparse constraint matrix
- Z is a non-negative matrix.
- $(X^TW)^+$ and $(X^TW)^-$ denote the non-negative part and the negative part of the matrix $X^TW$, i.e. $(X^TW)^+ = (|X^TW| + X^TW)/2$ and $(X^TW)^- = (|X^TW| - X^TW)/2$, where $|\cdot|$ denotes taking the absolute value of the matrix; $(W^TW)^+$ and $(W^TW)^-$ denote the non-negative part and the negative part of the matrix $W^TW$, i.e. $(W^TW)^+ = (|W^TW| + W^TW)/2$ and $(W^TW)^- = (|W^TW| - W^TW)/2$.
- Z is a non-negative matrix. To keep it non-negative during iterative updating, both sides of the above equation are multiplied by $Z^2$.
- a sound classification device based on constrained semi-non-negative matrix factorization includes:
- the memory is coupled to the processor and stores instructions, and the instructions execute steps of implementing the sound classification method based on constrained semi-non-negative matrix factorization as described above.
- the device acquires training sound data samples and test sound data samples.
- a computer readable storage medium stores an application program for a sound classification method based on constrained semi-non-negative matrix factorization, which implements the steps of the sound classification method based on constrained semi-non-negative matrix factorization as described above.
- the experimental data samples are sounds emitted by vibrating canned food.
- the canned food is made to vibrate and emit sound by exciting the can lid with a high-energy electromagnetic pulse signal.
- this sound signal reflects the pressure inside the can.
- a total of 72 sound signals were collected from one kind of canned food: 36 from products with qualified in-can pressure, 15 from products with excessive in-can pressure, and 21 from products with insufficient in-can pressure.
- the simulation uses Matlab 9.2.0. The sparsity parameter θ of the constrained semi-non-negative matrix factorization algorithm is set to 0.0, 0.1 and 0.3 in turn, the minimum value $\Gamma_{min}$ of the objective function Γ is set to 0.0001, the maximum number of iterations $E_{max}$ is 100, and the classification model MW is a nearest-neighbor classifier. Each experiment is independently repeated 5 times and the average is taken as the final result.
- for the sound data of products with qualified in-can pressure and products with excessive in-can pressure, in-can pressure classification experiments are run with both the constrained semi-non-negative matrix factorization method of the present invention and the conventional semi-non-negative matrix factorization method.
- the experimental results are shown in Table 1.
Abstract
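A sound classification method, device and medium based on constrained semi-non-negative matrix factorization. The sound classification method comprises: representing training sound data samples and test sound data samples as a semi-non-negative matrix (S1); constructing a category constraint matrix from the semi-non-negative matrix, and constructing a sparse constraint matrix from the semi-non-negative matrix (S2); under the category constraint and the sparse constraint, performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix to obtain the corresponding coefficient matrix; using the low-dimensional representations in the coefficient matrix that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train a classification model and obtain a classifier (S3); and inputting the low-dimensional representations in the coefficient matrix that correspond to the test sound data samples into the classifier and outputting the classification results of the test sound data samples (S4). The method makes effective use of the category information of the training sound data samples and makes the reduced low-dimensional representations sparse, thereby obtaining more discriminative low-dimensional sample representations and improving the accuracy of sound data classification.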
Description
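The invention relates to the technical field of sound signal processing and pattern recognition, and in particular to a sound classification method, device and medium based on constrained semi-non-negative matrix factorization.
With social development and continuing advances in science and technology, sound recognition technology is being studied and applied ever more widely in production and daily life. It is already used in product quality inspection, vehicle NVH performance testing, acoustic event classification, speech-to-text conversion, heart and lung sound classification, and other fields.
Sound data is usually high-dimensional. Classifying the raw sound data directly can make the classification system computationally expensive and inaccurate. To address this, high-dimensional sound data is generally reduced in dimension and compressed into a low-dimensional subspace for recognition. This process is usually called sound feature extraction, and low-dimensional sound features are usually more discriminative than the raw sound data. Fundamental frequency, short-time average zero-crossing rate, formants and spectral peaks are commonly used low-dimensional sound features. However, as sound technology is applied more widely and the demands on recognition performance rise, feature extraction methods face higher requirements, and the above low-dimensional features increasingly fail to distinguish sound categories effectively. Researchers have therefore been studying more effective dimensionality-reduction methods for sound, such as matrix factorization, principal component analysis and independent component analysis.
Matrix factorization is now widely regarded by researchers as having good feature interpretation and feature representation capabilities. It has become a research focus in image, sound, spectral and other signal processing fields, and has been applied with considerable success to data dimensionality reduction and feature extraction. The Chinese invention patent application filed by South China University of Technology, "A preliminary screening method for Alzheimer's disease based on non-negative matrix factorization of speech features" (application No. 201810140213.2), first extracts acoustic features from a person's speech data, including fundamental frequency, energy, harmonics-to-noise ratio, formants, glottal wave, linear prediction coefficients and constant-Q cepstral coefficients, and concatenates them into a feature matrix. It then decomposes this feature matrix with a non-negative matrix factorization algorithm to obtain a dimension-reduced feature matrix, which is input to a classifier to judge whether the person is healthy or an Alzheimer's patient. When reducing dimensionality with non-negative matrix factorization, the feature extraction method of that patent does not use the category information of the training samples to improve the discriminative power of the low-dimensional sample representations, which may degrade recognition based on the reduced features. It also applies no sparse constraint during dimensionality reduction, whereas a sparse low-dimensional representation may weaken poorly discriminative features and strengthen highly discriminative ones. The method of that patent does not reduce the dimensionality of the speech data directly. It applies non-negative matrix factorization to an acoustic feature matrix extracted from the speech data, and those acoustic features do not necessarily capture the category attributes of the speech data fully. It may therefore often be necessary to reduce the dimensionality of the speech samples directly to obtain low-dimensional sample representations. If speech data is reduced directly, however, non-negative matrix factorization is not applicable, because speech data is generally semi-non-negative.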
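Summary of the Invention
In view of this, it is necessary to address the above problems by providing a sound classification method, device and medium based on constrained semi-non-negative matrix factorization. The constrained semi-non-negative matrix factorization algorithm is used to reduce the dimensionality of the sound data, and the constraints applied during dimensionality reduction comprise a category constraint and a sparse constraint, so that the category information of the training sound data samples is used effectively and the reduced low-dimensional representations are sparse. This yields more discriminative low-dimensional sample representations and thereby improves the accuracy of sound data classification.
To achieve the above object, the invention adopts the following technical solution:
A sound classification method based on constrained semi-non-negative matrix factorization comprises the following steps:
S1, representing the training sound data samples and the test sound data samples as a semi-non-negative matrix X;
S2, constructing a category constraint matrix U from the semi-non-negative matrix X, and constructing a sparse constraint matrix S from the semi-non-negative matrix X;
S3, under the category constraint and the sparse constraint, performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix X to obtain the corresponding coefficient matrix H;
S4, using the low-dimensional representations in the coefficient matrix H that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train a classification model and obtain a classifier R;
S5, inputting the low-dimensional representations in the coefficient matrix H that correspond to the test sound data samples into the classifier R, and outputting the classification results of the test sound data samples.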
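Further, representing the training sound data samples and the test sound data samples as the semi-non-negative matrix X in S1 proceeds as follows:
S11, normalize the amplitude of the training sound data samples and the test sound data samples so that the amplitude of each sample lies in [-1, 1];
S12, represent each training sound data sample as an M-dimensional column vector $x_i$ ($i = 1, 2, \dots, N1$), where N1 is the number of training sound data samples; and represent each test sound data sample as an M-dimensional column vector $x_j$ ($j = 1, 2, \dots, N2$), where N2 is the number of test sound data samples;
S13, arrange $x_i$ and $x_j$ into the semi-non-negative matrix X (M rows, N columns), whose columns are denoted $x_k$ ($k = 1, 2, \dots, N$; $N = N1 + N2$). The first N1 columns are the training samples of known category ($x_1 \dots x_{N1}$), and the remaining N2 columns ($N2 = N - N1$) are the test samples of unknown category ($x_{N1+1} \dots x_N$).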
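Further, constructing the category constraint matrix U from the semi-non-negative matrix X in S2 proceeds as follows:
S201, the sound data samples comprise B classes and each sound data sample belongs to one class. From the training samples in the semi-non-negative matrix X, construct a matrix C with N1 rows and B columns, with elements $c_{i,b}$ ($i = 1, 2, \dots, N1$; $b = 1, 2, \dots, B$); when training sample $x_i$ belongs to class b, $c_{i,b} = 1$, and otherwise $c_{i,b} = 0$;
S202, construct the category constraint matrix U with N rows and (B+N2) columns as follows:
$$U = \begin{bmatrix} C & O \\ O & I_{N2} \end{bmatrix}$$
where O denotes a zero matrix and $I_{N2}$ is an identity matrix with N2 rows and N2 columns.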
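Further, constructing the sparse constraint matrix S from the semi-non-negative matrix X in S2 proceeds as follows:
After each sound data sample is reduced by the constrained semi-non-negative matrix factorization algorithm, its dimension changes from M to M′; the sparse constraint matrix S is then constructed according to Equation (1).
In Equation (1), θ is the sparsity parameter, with range 0 ≤ θ ≤ 1; $I_{M'}$ is an identity matrix with M′ rows and M′ columns; $l$ is a column vector of dimension M′ whose elements are all 1; $l^T$ is the transpose of $l$.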
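Further, performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix X under the category constraint and the sparse constraint in S3, to obtain the corresponding coefficient matrix H, proceeds as follows:
S31, construct the objective function Γ of the constrained semi-non-negative matrix factorization:
$$\Gamma = \left\| X - W S (UZ)^T \right\|_F^2 \quad (2)$$
In Equation (2), $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; W denotes the basis matrix of the constrained semi-non-negative matrix factorization, $W = [w_1, w_2, \dots, w_{M'}]$ is a semi-non-negative matrix, and $w_i$ ($i = 1, 2, \dots, M'$) is an M-dimensional column vector; U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; $(UZ)^T$ is the transpose of $UZ$;
S32, initialize all elements of the matrix Z to random positive numbers in (0, 1);
S33, compute the initial value of the basis matrix W according to Equation (3).
In Equation (3), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; X is the semi-non-negative matrix; $S^T$ is the transpose of S; $Z^T$ is the transpose of Z; $U^T$ is the transpose of U;
S34, set the minimum value $\Gamma_{min}$ of the objective function Γ, the sparsity parameter θ, and the reduced dimension M′;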
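S35, update the matrix Z and the basis matrix W alternately: first update Z once, then update W once, and repeat this cycle. The elements of Z are updated using Equation (4), and the elements of the basis matrix W are updated using Equation (5);
In Equations (4) and (5), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix; X is the semi-non-negative matrix; W is a semi-non-negative matrix; $S^T$ is the transpose of S; $Z^T$ is the transpose of Z; $U^T$ is the transpose of U; $W^T$ is the transpose of W;
S36, set the maximum number of iterations $E_{max}$ and compute the value of the objective function Γ after each iteration. When Γ falls below $\Gamma_{min}$ or the number of iterations reaches $E_{max}$, stop iterating and take the final basis matrix W and matrix Z;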
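S37, compute the coefficient matrix H of the constrained semi-non-negative matrix factorization:
$$H = (UZ)^T \quad (6)$$
In Equation (6), $H = [h_1; h_2; \dots; h_N]$ denotes the coefficient matrix of the constrained semi-non-negative matrix factorization, and $h_i$ ($i = 1, 2, \dots, N$) is an M′-dimensional row vector; U is the category constraint matrix; Z is a non-negative matrix; $(UZ)^T$ is the transpose of $UZ$.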
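Further, using the low-dimensional representations in the coefficient matrix H that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train the classification model and obtain the classifier R in S4 proceeds as follows:
S41, the first N1 rows of the coefficient matrix H are the low-dimensional representations of the training sound data samples, denoted HT, $HT = [ht_1; ht_2; \dots; ht_{N1}]$, where $ht_i$ ($i = 1, 2, \dots, N1$) is a row vector of dimension M′;
S42, the category information of the training sound data samples is represented as a matrix A with elements $a_{i,b}$ ($i = 1, 2, \dots, N1$; $b = 1, 2, \dots, B$); when the sample corresponding to $ht_i$ belongs to class b, $a_{i,b} = 1$, and otherwise $a_{i,b} = 0$;
S43, select a classification model, denoted MW; with $ht_i$ as the input of MW and $a_{i,b}$ as its output, train MW to obtain the classifier R.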
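Further, inputting the low-dimensional representations in the coefficient matrix H that correspond to the test sound data samples into the classifier R and outputting the classification results of the test sound data samples in S5 proceeds as follows:
S51, rows (N1+1) to N of the coefficient matrix H (N2 rows in total) are the low-dimensional representations of the test sound data samples, denoted HC, $HC = [hc_1; hc_2; \dots; hc_{N2}]$, where $hc_j$ ($j = 1, 2, \dots, N2$) is a row vector of dimension M′;
S52, input $hc_j$ into the classifier R; the output of R is the classification result of the corresponding test sample.
Further, the classification model MW is a nearest-neighbor classifier or a support vector machine.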
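A sound classification device based on constrained semi-non-negative matrix factorization comprises:
a processor;
a memory coupled to the processor and storing instructions which, when executed by the processor, implement the steps of the sound classification method based on constrained semi-non-negative matrix factorization described above.
Further, the device acquires the training sound data samples and the test sound data samples.
A computer-readable storage medium stores an application program for the sound classification method based on constrained semi-non-negative matrix factorization, the application program implementing the steps of the sound classification method based on constrained semi-non-negative matrix factorization described above.
The beneficial effects of the invention are:
Because the invention makes effective use of the category information of the training sound data samples and adds a sparse constraint during the semi-non-negative matrix factorization of the sound data samples, it obtains more discriminative low-dimensional sound features, solves the problem that semi-non-negative matrix factorization can neither use the categories of the training data nor apply a sparse constraint, and improves the accuracy of sound data classification.
FIG. 1 is a workflow diagram of the sound classification method based on constrained semi-non-negative matrix factorization of the invention.
To make the objectives, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to its embodiments. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.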
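Embodiment
As shown in FIG. 1, a sound classification method based on constrained semi-non-negative matrix factorization comprises the following steps:
S1, representing the training sound data samples and the test sound data samples as a semi-non-negative matrix X;
S2, constructing a category constraint matrix U from the semi-non-negative matrix X, and constructing a sparse constraint matrix S from the semi-non-negative matrix X;
S3, under the category constraint and the sparse constraint, performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix X to obtain the corresponding coefficient matrix H;
S4, using the low-dimensional representations in the coefficient matrix H that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train a classification model and obtain a classifier R;
S5, inputting the low-dimensional representations in the coefficient matrix H that correspond to the test sound data samples into the classifier R, and outputting the classification results of the test sound data samples.
In this embodiment, a semi-non-negative matrix is a matrix whose elements include both positive and negative values, and a non-negative matrix is a matrix none of whose elements is negative.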
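In this embodiment, further, representing the training sound data samples and the test sound data samples as the semi-non-negative matrix X in S1 proceeds as follows:
S11, normalize the amplitude of the training sound data samples and the test sound data samples so that the amplitude of each sample lies in [-1, 1];
S12, represent each training sound data sample as an M-dimensional column vector $x_i$ ($i = 1, 2, \dots, N1$), where N1 is the number of training sound data samples; and represent each test sound data sample as an M-dimensional column vector $x_j$ ($j = 1, 2, \dots, N2$), where N2 is the number of test sound data samples;
S13, arrange $x_i$ and $x_j$ into the semi-non-negative matrix X (M rows, N columns), whose columns are denoted $x_k$ ($k = 1, 2, \dots, N$; $N = N1 + N2$). The first N1 columns are the training samples of known category ($x_1 \dots x_{N1}$), and the remaining N2 columns ($N2 = N - N1$) are the test samples of unknown category ($x_{N1+1} \dots x_N$).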
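Steps S11 to S13 can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function name `build_data_matrix` is hypothetical, and dividing each sample by its peak absolute value is only one assumed way to bring amplitudes into [-1, 1].

```python
import numpy as np

def build_data_matrix(train_samples, test_samples):
    """Stack normalized samples as the columns of X (M rows, N = N1 + N2 columns)."""
    def normalize(sample):
        sample = np.asarray(sample, dtype=float)
        peak = np.max(np.abs(sample))
        # Assumption: peak normalization; an all-zero sample is left unchanged.
        return sample / peak if peak > 0 else sample

    # Training samples come first, test samples after them (S13).
    columns = [normalize(s) for s in list(train_samples) + list(test_samples)]
    X = np.column_stack(columns)      # shape (M, N)
    n_train = len(train_samples)      # N1: the first N1 columns are training samples
    return X, n_train
```

All samples are assumed to have the same length M; resampling or trimming to a common length would have to happen before this step.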
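In this embodiment, further, constructing the category constraint matrix U from the semi-non-negative matrix X in S2 proceeds as follows:
S201, the sound data samples comprise B classes and each sound data sample belongs to one class. From the training samples in the semi-non-negative matrix X, construct a matrix C with N1 rows and B columns, with elements $c_{i,b}$ ($i = 1, 2, \dots, N1$; $b = 1, 2, \dots, B$); when training sample $x_i$ belongs to class b, $c_{i,b} = 1$, and otherwise $c_{i,b} = 0$;
S202, construct the category constraint matrix U with N rows and (B+N2) columns as follows:
$$U = \begin{bmatrix} C & O \\ O & I_{N2} \end{bmatrix}$$
where O denotes a zero matrix (all of its elements are 0) and $I_{N2}$ is an identity matrix with N2 rows and N2 columns (its diagonal elements are all 1 and the remaining elements are all 0).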
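The block structure of U in S201 and S202 can be sketched as follows. The function name `build_class_constraint` is hypothetical, and class labels are assumed to be integers numbered from 0 rather than from 1 as in the text.

```python
import numpy as np

def build_class_constraint(labels, n_classes, n_test):
    """Return U = [[C, O], [O, I_N2]] with shape (N1 + N2, B + N2)."""
    labels = np.asarray(labels)
    n_train = labels.shape[0]
    # C is the one-hot label matrix: C[i, b] = 1 when training sample i is in class b.
    C = np.zeros((n_train, n_classes))
    C[np.arange(n_train), labels] = 1.0
    U = np.zeros((n_train + n_test, n_classes + n_test))
    U[:n_train, :n_classes] = C               # top-left block: C
    U[n_train:, n_classes:] = np.eye(n_test)  # bottom-right block: identity I_N2
    return U
```

With this structure, every training sample of a given class selects the same row of Z, while each test sample keeps a row of its own.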
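Further, constructing the sparse constraint matrix S from the semi-non-negative matrix X in S2 proceeds as follows:
After each sound data sample is reduced by the constrained semi-non-negative matrix factorization algorithm, its dimension changes from M to M′; the sparse constraint matrix S is then constructed according to Equation (1).
In Equation (1), θ is the sparsity parameter (it can be set by the user), with range 0 ≤ θ ≤ 1; $I_{M'}$ is an identity matrix with M′ rows and M′ columns; $l$ is a column vector of dimension M′ whose elements are all 1; $l^T$ is the transpose of $l$ (the superscript T denotes transposition).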
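Equation (1) itself is not reproduced in this text, so the sketch below rests on an assumption: it uses the smoothing-matrix form known from non-smooth NMF, $S = (1-\theta) I_{M'} + (\theta / M')\, l\, l^T$, which is built from exactly the symbols listed above (θ, $I_{M'}$, $l$, $l^T$). The function name `build_sparsity_matrix` is hypothetical, and the patent's actual Equation (1) may differ.

```python
import numpy as np

def build_sparsity_matrix(theta, m_reduced):
    """Assumed form: S = (1 - theta) * I + (theta / M') * l l^T, an M' x M' matrix."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    ones = np.ones((m_reduced, 1))  # l: all-ones column vector of dimension M'
    return (1.0 - theta) * np.eye(m_reduced) + (theta / m_reduced) * (ones @ ones.T)
```

Under this assumed form, θ = 0 gives the identity matrix (no smoothing), and θ = 1 gives a matrix that replaces every vector by its mean.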
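In this embodiment, further, performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix X under the category constraint and the sparse constraint in S3, to obtain the corresponding coefficient matrix H, proceeds as follows:
S31, construct the objective function Γ of the constrained semi-non-negative matrix factorization:
$$\Gamma = \left\| X - W S (UZ)^T \right\|_F^2 \quad (2)$$
In Equation (2), $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; W denotes the basis matrix of the constrained semi-non-negative matrix factorization, $W = [w_1, w_2, \dots, w_{M'}]$ is a semi-non-negative matrix, and $w_i$ ($i = 1, 2, \dots, M'$) is an M-dimensional column vector; U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; $(UZ)^T$ is the transpose of $UZ$;
S32, initialize all elements of the matrix Z to random positive numbers in (0, 1);
S33, compute the initial value of the basis matrix W according to Equation (3).
In Equation (3), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; X is the semi-non-negative matrix; $S^T$ is the transpose of S; $Z^T$ is the transpose of Z; $U^T$ is the transpose of U;
S34, set the minimum value $\Gamma_{min}$ of the objective function Γ, the sparsity parameter θ, and the reduced dimension M′;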
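S35, update the matrix Z and the basis matrix W alternately: first update Z once, then update W once, and repeat this cycle. The elements of Z are updated using Equation (4), and the elements of the basis matrix W are updated using Equation (5);
In Equations (4) and (5), U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix; X is the semi-non-negative matrix; W is a semi-non-negative matrix; $S^T$ is the transpose of S; $Z^T$ is the transpose of Z; $U^T$ is the transpose of U; $W^T$ is the transpose of W;
S36, set the maximum number of iterations $E_{max}$ and compute the value of the objective function Γ after each iteration. When Γ falls below $\Gamma_{min}$ or the number of iterations reaches $E_{max}$, stop iterating and take the final basis matrix W and matrix Z;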
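S37, compute the coefficient matrix H of the constrained semi-non-negative matrix factorization:
$$H = (UZ)^T \quad (6)$$
In Equation (6), $H = [h_1; h_2; \dots; h_N]$ denotes the coefficient matrix of the constrained semi-non-negative matrix factorization, and $h_i$ ($i = 1, 2, \dots, N$) is an M′-dimensional row vector; U is the category constraint matrix; Z is a non-negative matrix; $(UZ)^T$ is the transpose of $UZ$.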
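The alternating scheme of S31 to S37 can be sketched as below. Equations (3), (4) and (5) are not reproduced in this text, so the sketch fills them with assumptions: the basis matrix W is obtained as the least-squares solution of $X \approx W G^T$ with $G = U Z S^T$, and Z is updated with a square-root multiplicative rule in the style of standard semi-NMF, chosen to be consistent with the stationarity condition in the derivation of this document. All function names are hypothetical.

```python
import numpy as np

def pos(A):
    """Non-negative part (|A| + A) / 2."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """Negative part (|A| - A) / 2, itself a non-negative matrix."""
    return (np.abs(A) - A) / 2.0

def solve_basis(X, U, S, Z):
    """Assumed stand-in for Equations (3)/(5): least-squares W for X ≈ W G^T."""
    G = U @ Z @ S.T                       # shape (N, M')
    return X @ G @ np.linalg.pinv(G.T @ G)

def constrained_semi_nmf(X, U, S, gamma_min=1e-4, max_iter=100, seed=0, eps=1e-12):
    rng = np.random.default_rng(seed)
    m_reduced = S.shape[0]
    # S32: random positive initial values in (0, 1) for Z, shape (B + N2, M').
    Z = rng.uniform(1e-3, 1.0, size=(U.shape[1], m_reduced))
    W = solve_basis(X, U, S, Z)           # S33 (assumed form)
    for _ in range(max_iter):             # S36: at most E_max iterations
        XtW, WtW = X.T @ W, W.T @ W
        # Assumed stand-in for Equation (4): multiplicative update keeps Z >= 0.
        numer = U.T @ pos(XtW) @ S + U.T @ U @ Z @ S.T @ neg(WtW) @ S
        denom = U.T @ neg(XtW) @ S + U.T @ U @ Z @ S.T @ pos(WtW) @ S
        Z = Z * np.sqrt(numer / (denom + eps))
        W = solve_basis(X, U, S, Z)       # S35: update W after Z
        gamma = np.linalg.norm(X - W @ S @ (U @ Z).T, "fro") ** 2
        if gamma < gamma_min:             # S36: stop once the objective is small enough
            break
    H = (U @ Z).T                         # Equation (6); column i belongs to sample i
    return W, Z, H
```

Because every training sample of one class selects the same row of Z through U, samples of the same class receive identical low-dimensional representations, which is how the category constraint enters the factorization. Note that `H` here follows Equation (6) literally, so each sample is a column; `H.T` gives the one-row-per-sample layout used in S41 and S51.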
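In this embodiment, further, using the low-dimensional representations in the coefficient matrix H that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train the classification model and obtain the classifier R in S4 proceeds as follows:
S41, the first N1 rows of the coefficient matrix H are the low-dimensional representations of the training sound data samples, denoted HT, $HT = [ht_1; ht_2; \dots; ht_{N1}]$, where $ht_i$ ($i = 1, 2, \dots, N1$) is a row vector of dimension M′;
S42, the category information of the training sound data samples is represented as a matrix A with elements $a_{i,b}$ ($i = 1, 2, \dots, N1$; $b = 1, 2, \dots, B$); when the sample corresponding to $ht_i$ belongs to class b, $a_{i,b} = 1$, and otherwise $a_{i,b} = 0$;
S43, select a classification model, denoted MW; with $ht_i$ as the input of MW and $a_{i,b}$ as its output, train MW to obtain the classifier R.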
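In this embodiment, further, inputting the low-dimensional representations in the coefficient matrix H that correspond to the test sound data samples into the classifier R and outputting the classification results of the test sound data samples in S5 proceeds as follows:
S51, rows (N1+1) to N of the coefficient matrix H (N2 rows in total) are the low-dimensional representations of the test sound data samples, denoted HC, $HC = [hc_1; hc_2; \dots; hc_{N2}]$, where $hc_j$ ($j = 1, 2, \dots, N2$) is a row vector of dimension M′;
S52, input $hc_j$ into the classifier R; the output of R is the classification result of the corresponding test sample.
In this embodiment, further, the classification model MW is a nearest-neighbor classifier or a support vector machine.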
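For the nearest-neighbor choice of MW, steps S41 to S52 reduce to a split of the representations followed by a distance search. The sketch below assumes the representations are arranged with one sample per row (N rows, M′ columns) and that class labels are plain integers instead of the one-hot matrix A; both function names are hypothetical.

```python
import numpy as np

def split_representations(H_rows, n_train):
    """S41/S51: the first N1 rows form HT (training), the remaining N2 rows form HC (test)."""
    H_rows = np.asarray(H_rows, dtype=float)
    return H_rows[:n_train], H_rows[n_train:]

def nearest_neighbor_predict(train_feats, train_labels, test_feats):
    """S52 with a 1-nearest-neighbor rule under Euclidean distance."""
    train_feats = np.asarray(train_feats, dtype=float)
    test_feats = np.asarray(test_feats, dtype=float)
    # Pairwise squared distances, shape (N2, N1).
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Each test sample takes the label of its closest training sample.
    return np.asarray(train_labels)[np.argmin(dists, axis=1)]
```

A nearest-neighbor classifier needs no separate training pass: storing HT and the labels plays the role of the trained classifier R.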
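In this embodiment, further, the iterative update formulas for the matrix Z and the basis matrix W of the invention are derived as follows.
The mathematical model of constrained semi-non-negative matrix factorization is expressed as:
$$X \approx W S (UZ)^T$$
where X is the semi-non-negative matrix; W is the basis matrix of the constrained semi-non-negative matrix factorization; U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix.
The Frobenius norm is used as the objective function of the constrained semi-non-negative matrix factorization:
$$\Gamma = \left\| X - W S (UZ)^T \right\|_F^2$$
Expanding the objective function Γ in terms of the matrix trace gives
$$\begin{aligned}
\Gamma &= \mathrm{Tr}\left((X - WSZ^TU^T)^T (X - WSZ^TU^T)\right) \\
&= \mathrm{Tr}\left((X^T - UZS^TW^T)(X - WSZ^TU^T)\right) \\
&= \mathrm{Tr}\left(X^TX - X^TWSZ^TU^T - UZS^TW^TX + UZS^TW^TWSZ^TU^T\right) \\
&= \mathrm{Tr}(X^TX) - 2\,\mathrm{Tr}(X^TWSZ^TU^T) + \mathrm{Tr}(UZS^TW^TWSZ^TU^T)
\end{aligned}$$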
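The trace expansion can be checked numerically on random matrices. The sketch below compares the squared Frobenius norm with the three-term trace form; the function names are hypothetical and the matrix sizes in any check are arbitrary.

```python
import numpy as np

def objective_frobenius(X, W, S, U, Z):
    """Gamma as the squared Frobenius norm of X - W S Z^T U^T."""
    return np.linalg.norm(X - W @ S @ Z.T @ U.T, "fro") ** 2

def objective_trace(X, W, S, U, Z):
    """Gamma as Tr(X^T X) - 2 Tr(X^T W S Z^T U^T) + Tr(U Z S^T W^T W S Z^T U^T)."""
    G = U @ Z @ S.T  # so that G^T = S Z^T U^T
    return (np.trace(X.T @ X)
            - 2.0 * np.trace(X.T @ W @ G.T)
            + np.trace(G @ W.T @ W @ G.T))
```

The two middle terms of the expansion merge into one because a matrix and its transpose have the same trace.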
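Combining the above, taking the partial derivatives of the objective function Γ with respect to W and Z gives
$$\frac{\partial \Gamma}{\partial W} = -2XUZS^T + 2WSZ^TU^TUZS^T, \qquad \frac{\partial \Gamma}{\partial Z} = -2U^TX^TWS + 2U^TUZS^TW^TWS$$
$(X^TW)^+$ and $(X^TW)^-$ denote the non-negative part and the negative part of the matrix $X^TW$, i.e. $(X^TW)^+ = (|X^TW| + X^TW)/2$ and $(X^TW)^- = (|X^TW| - X^TW)/2$, where $|\cdot|$ denotes taking the absolute value of the matrix; $(W^TW)^+$ and $(W^TW)^-$ denote the non-negative part and the negative part of the matrix $W^TW$, i.e. $(W^TW)^+ = (|W^TW| + W^TW)/2$ and $(W^TW)^- = (|W^TW| - W^TW)/2$. Therefore $X^TW = (X^TW)^+ - (X^TW)^-$ and $W^TW = (W^TW)^+ - (W^TW)^-$. Setting $\partial \Gamma / \partial Z = 0$ and substituting these decompositions gives
$$U^T(X^TW)^-S + U^TUZS^T(W^TW)^+S = U^T(X^TW)^+S + U^TUZS^T(W^TW)^-S$$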
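Z is a non-negative matrix. To keep it non-negative during iterative updating, both sides of the above equation are multiplied by $Z^2$, which gives
$$Z^2\left[U^T(X^TW)^-S + U^TUZS^T(W^TW)^+S\right] = Z^2\left[U^T(X^TW)^+S + U^TUZS^T(W^TW)^-S\right]$$
This finally yields the iterative update formula for the matrix Z, Equation (4).
Because the initial values of all elements of Z are random positive numbers in (0, 1), the update formula ensures that Z stays non-negative throughout iterative updating.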
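The stationarity condition used in this derivation comes from the gradient of Γ with respect to Z. As a sanity check, the sketch below compares the closed form $-2U^TX^TWS + 2U^TUZS^TW^TWS$, obtained by differentiating the trace expansion, with a central finite difference. The function names are hypothetical, and the closed form is a derivation from the expansion shown in this document, not a formula quoted from the patent.

```python
import numpy as np

def objective(X, W, S, U, Z):
    """Gamma = squared Frobenius norm of X - W S (U Z)^T."""
    return np.linalg.norm(X - W @ S @ (U @ Z).T, "fro") ** 2

def grad_Z(X, W, S, U, Z):
    """Closed-form gradient of Gamma with respect to Z."""
    return -2.0 * U.T @ X.T @ W @ S + 2.0 * U.T @ U @ Z @ S.T @ W.T @ W @ S

def numeric_grad_Z(X, W, S, U, Z, h=1e-6):
    """Central finite-difference gradient, one entry of Z at a time."""
    G = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        Zp, Zm = Z.copy(), Z.copy()
        Zp[idx] += h
        Zm[idx] -= h
        G[idx] = (objective(X, W, S, U, Zp) - objective(X, W, S, U, Zm)) / (2.0 * h)
    return G
```

Since Γ is quadratic in Z, the central difference agrees with the closed form up to rounding error, which makes this a cheap check when adapting the derivation to other constraint matrices.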
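A sound classification device based on constrained semi-non-negative matrix factorization comprises:
a processor;
a memory coupled to the processor and storing instructions which, when executed by the processor, implement the steps of the sound classification method based on constrained semi-non-negative matrix factorization described above.
Further, the device acquires the training sound data samples and the test sound data samples.
A computer-readable storage medium stores an application program for the sound classification method based on constrained semi-non-negative matrix factorization, the application program implementing the steps of the sound classification method based on constrained semi-non-negative matrix factorization described above.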
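In this embodiment, the effect of the invention can be further illustrated by the following simulation experiments:
1) Experimental data
The experimental data samples are sounds emitted by vibrating canned food. The canned food is made to vibrate and emit sound by exciting the can lid with a high-energy electromagnetic pulse signal. This sound signal reflects the pressure inside the can. A total of 72 sound signals were collected from one kind of canned food: 36 from products with qualified in-can pressure, 15 from products with excessive in-can pressure, and 21 from products with insufficient in-can pressure.
2) Simulation conditions
The simulation uses Matlab 9.2.0. The sparsity parameter θ of the constrained semi-non-negative matrix factorization algorithm is set to 0.0, 0.1 and 0.3 in turn, the minimum value $\Gamma_{min}$ of the objective function Γ is set to 0.0001, the maximum number of iterations $E_{max}$ is 100, and the classification model MW is a nearest-neighbor classifier. Each experiment is independently repeated 5 times and the average is taken as the final result.
3) Simulation results
For the sound data of products with qualified in-can pressure and products with excessive in-can pressure, in-can pressure classification experiments were run with both the constrained semi-non-negative matrix factorization method of the invention and the conventional semi-non-negative matrix factorization method. The results are shown in Table 1.
Table 1
For the sound data of products with qualified in-can pressure and products with insufficient in-can pressure, in-can pressure classification experiments were run with both the constrained semi-non-negative matrix factorization method of the invention and the conventional semi-non-negative matrix factorization method. The results are shown in Table 2.
Table 2
The embodiments described above express only several implementations of the invention. Their description is relatively specific and detailed, but it should not be understood as limiting the patent scope of the invention. A person of ordinary skill in the art can make several variations and improvements without departing from the concept of the invention, and these all fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.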
Claims (11)
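- A sound classification method based on constrained semi-non-negative matrix factorization, characterized in that the method comprises the following steps: S1, representing training sound data samples and test sound data samples as a semi-non-negative matrix X; S2, constructing a category constraint matrix U from the semi-non-negative matrix X, and constructing a sparse constraint matrix S from the semi-non-negative matrix X; S3, under the category constraint and the sparse constraint, performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix X to obtain the corresponding coefficient matrix H; S4, using the low-dimensional representations in the coefficient matrix H that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train a classification model and obtain a classifier R; S5, inputting the low-dimensional representations in the coefficient matrix H that correspond to the test sound data samples into the classifier R, and outputting the classification results of the test sound data samples.
- The sound classification method based on constrained semi-non-negative matrix factorization according to claim 1, characterized in that representing the training sound data samples and the test sound data samples as the semi-non-negative matrix X in S1 proceeds as follows: S11, normalizing the amplitude of the training sound data samples and the test sound data samples so that the amplitude of each sample lies in [-1, 1]; S12, representing each training sound data sample as an M-dimensional column vector $x_i$ ($i = 1, 2, \dots, N1$), where N1 is the number of training sound data samples, and representing each test sound data sample as an M-dimensional column vector $x_j$ ($j = 1, 2, \dots, N2$), where N2 is the number of test sound data samples; S13, arranging $x_i$ and $x_j$ into the semi-non-negative matrix X (M rows, N columns), whose columns are denoted $x_k$ ($k = 1, 2, \dots, N$; $N = N1 + N2$), wherein the first N1 columns are the training samples of known category ($x_1 \dots x_{N1}$) and the remaining N2 columns ($N2 = N - N1$) are the test samples of unknown category ($x_{N1+1} \dots x_N$).
- The sound classification method based on constrained semi-non-negative matrix factorization according to claim 1, characterized in that performing constrained semi-non-negative matrix factorization on the semi-non-negative matrix X under the category constraint and the sparse constraint in S3, to obtain the corresponding coefficient matrix H, proceeds as follows: S31, constructing the objective function Γ of the constrained semi-non-negative matrix factorization according to Equation (2), in which $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; W denotes the basis matrix of the constrained semi-non-negative matrix factorization, $W = [w_1, w_2, \dots, w_{M'}]$ is a semi-non-negative matrix, and $w_i$ ($i = 1, 2, \dots, M'$) is an M-dimensional column vector; U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; $(UZ)^T$ is the transpose of $UZ$; S32, initializing all elements of the matrix Z to random positive numbers in (0, 1); S33, computing the initial value of the basis matrix W according to Equation (3), in which U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix with (B+N2) rows and M′ columns; X is the semi-non-negative matrix; $S^T$ is the transpose of S; $Z^T$ is the transpose of Z; $U^T$ is the transpose of U; S34, setting the minimum value $\Gamma_{min}$ of the objective function Γ, the sparsity parameter θ, and the reduced dimension M′; S35, updating the matrix Z and the basis matrix W alternately: first updating Z once, then updating W once, and repeating this cycle, the elements of Z being updated using Equation (4) and the elements of the basis matrix W being updated using Equation (5), in which U is the category constraint matrix; S is the sparse constraint matrix; Z is a non-negative matrix; X is the semi-non-negative matrix; W is a semi-non-negative matrix; $S^T$ is the transpose of S; $Z^T$ is the transpose of Z; $U^T$ is the transpose of U; $W^T$ is the transpose of W; S36, setting the maximum number of iterations $E_{max}$, computing the value of the objective function Γ after each iteration, and stopping when Γ falls below $\Gamma_{min}$ or the number of iterations reaches $E_{max}$, thereby obtaining the final basis matrix W and matrix Z; S37, computing the coefficient matrix H of the constrained semi-non-negative matrix factorization as $H = (UZ)^T$ (6), in which $H = [h_1; h_2; \dots; h_N]$ denotes the coefficient matrix of the constrained semi-non-negative matrix factorization and $h_i$ ($i = 1, 2, \dots, N$) is an M′-dimensional row vector; U is the category constraint matrix; Z is a non-negative matrix; $(UZ)^T$ is the transpose of $UZ$.
- The sound classification method based on constrained semi-non-negative matrix factorization according to claim 1, characterized in that using the low-dimensional representations in the coefficient matrix H that correspond to the training sound data samples, together with the category information of the training sound data samples, as training data to train the classification model and obtain the classifier R in S4 proceeds as follows: S41, the first N1 rows of the coefficient matrix H are the low-dimensional representations of the training sound data samples, denoted HT, $HT = [ht_1; ht_2; \dots; ht_{N1}]$, where $ht_i$ ($i = 1, 2, \dots, N1$) is a row vector of dimension M′; S42, the category information of the training sound data samples is represented as a matrix A with elements $a_{i,b}$ ($i = 1, 2, \dots, N1$; $b = 1, 2, \dots, B$), wherein when the sample corresponding to $ht_i$ belongs to class b, $a_{i,b} = 1$, and otherwise $a_{i,b} = 0$; S43, selecting a classification model, denoted MW, and, with $ht_i$ as the input of MW and $a_{i,b}$ as its output, training MW to obtain the classifier R.
- The sound classification method based on constrained semi-non-negative matrix factorization according to claim 1, characterized in that inputting the low-dimensional representations in the coefficient matrix H that correspond to the test sound data samples into the classifier R and outputting the classification results of the test sound data samples in S5 proceeds as follows: S51, rows (N1+1) to N of the coefficient matrix H (N2 rows in total) are the low-dimensional representations of the test sound data samples, denoted HC, $HC = [hc_1; hc_2; \dots; hc_{N2}]$, where $hc_j$ ($j = 1, 2, \dots, N2$) is a row vector of dimension M′; S52, inputting $hc_j$ into the classifier R, the output of R being the classification result of the corresponding test sample.
- The sound classification method based on constrained semi-non-negative matrix factorization according to claim 6, characterized in that the classification model MW is a nearest-neighbor classifier or a support vector machine.
- A sound classification device based on constrained semi-non-negative matrix factorization, characterized by comprising: a processor; and a memory coupled to the processor and storing instructions which, when executed by the processor, implement the steps of the sound classification method based on constrained semi-non-negative matrix factorization according to any one of claims 1 to 8.
- The sound classification device based on constrained semi-non-negative matrix factorization according to claim 9, characterized in that the device acquires the training sound data samples and the test sound data samples.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores an application program for a sound classification method based on constrained semi-non-negative matrix factorization, the application program implementing the steps of the sound classification method based on constrained semi-non-negative matrix factorization according to any one of claims 1 to 8.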
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
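PCT/CN2018/119894 WO2020113575A1 (zh) | 2018-12-07 | 2018-12-07 | Sound classification method, device and medium based on constrained semi-non-negative matrix factorization |
CN201880089090.2A CN111837185B (zh) | 2018-12-07 | 2018-12-07 | Sound classification method, device and medium based on constrained semi-non-negative matrix factorization |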
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/119894 WO2020113575A1 (zh) | 2018-12-07 | 2018-12-07 | Sound classification method, device and medium based on constrained semi-non-negative matrix factorization |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020113575A1 (zh) | 2020-06-11 |
Family
ID=70973434
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/119894 WO2020113575A1 (zh) | Sound classification method, device and medium based on constrained semi-non-negative matrix factorization | 2018-12-07 | 2018-12-07 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111837185B (zh) |
WO (1) | WO2020113575A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
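CN112735382B (zh) * | 2020-12-22 | 2024-02-02 | 北京声智科技有限公司 | Audio data processing method and apparatus, electronic device and readable storage medium |
CN117765926B (zh) * | 2024-02-19 | 2024-05-14 | 上海蜜度科技股份有限公司 | Speech synthesis method, system, electronic device and medium |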
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070068260A1 (en) * | 2005-09-26 | 2007-03-29 | Korea Research Institute Of Standards And Science | Pressure measuring system for vacuum chamber using ultrasonic wave |
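CN103230880A (zh) * | 2013-03-28 | 2013-08-07 | 广州坚诺机械设备有限公司 | Device and method for rapid non-destructive testing of container vacuum degree |
CN103559888A (zh) * | 2013-11-07 | 2014-02-05 | 航空电子系统综合技术重点实验室 | Speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition |
CN104655425A (zh) * | 2015-03-06 | 2015-05-27 | 重庆大学 | Bearing fault classification and diagnosis method based on sparse representation and large-margin distribution learning |
CN104732535A (zh) * | 2015-03-18 | 2015-06-24 | 河海大学 | Sparsity-constrained non-negative matrix factorization method |
CN204855086U (zh) * | 2015-05-04 | 2015-12-09 | 周飞龙 | Automatic vacuum degree detection device |
CN106289508A (zh) * | 2016-07-19 | 2017-01-04 | 西南交通大学 | Vibration signal reconstruction method for mechanical fault diagnosis |
CN108899048A (zh) * | 2018-05-10 | 2018-11-27 | 广东省智能制造研究所 | Sound data classification method based on signal time-frequency decomposition |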
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
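JP5034469B2 (ja) * | 2006-12-08 | 2012-09-26 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
WO2010138536A1 (en) * | 2009-05-27 | 2010-12-02 | Yin Zhang | Method and apparatus for spatio-temporal compressive sensing |
CN103871423A (zh) * | 2012-12-13 | 2014-06-18 | 上海八方视界网络科技有限公司 | Audio separation method based on NMF (non-negative matrix factorization) |
CN105355212B (zh) * | 2015-10-14 | 2019-03-05 | 天津大学 | Robust method and device for estimating the number of sources and the mixing matrix in underdetermined blind separation |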
-
2018
- 2018-12-07 CN CN201880089090.2A patent/CN111837185B/zh active Active
- 2018-12-07 WO PCT/CN2018/119894 patent/WO2020113575A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111837185B (zh) | 2024-03-12 |
CN111837185A (zh) | 2020-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18942391 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18942391 Country of ref document: EP Kind code of ref document: A1 |