CN117935859A

CN117935859A - Multi-label music style feature selection method integrating feature similarity

Info

Publication number: CN117935859A
Application number: CN202410107121.XA
Authority: CN
Inventors: 杨涛; 刘海波; 马希骜
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2024-01-25
Filing date: 2024-01-25
Publication date: 2024-04-26

Abstract

The invention belongs to the field of machine learning and pattern recognition and relates to a partial multi-label music style feature selection method integrating feature similarity. The method divides the candidate music style labels into true music style labels and noise music style labels, uses feature similarity to constrain the coefficient matrix W that maps to the true labels and the coefficient matrix S that maps to the noise labels, uses a partial multi-label classifier to constrain the sum matrix H of W and S, and finally scores the features with the W matrix to select the relevant music style features. This reduces the difficulty of the problem and improves the performance of the multi-label k-nearest-neighbor (MLKNN) model.

Description

Partial multi-label music style feature selection method integrating feature similarity

Technical Field

The invention belongs to the field of machine learning and pattern recognition and relates to a partial multi-label music style feature selection method integrating feature similarity.

Background Art

Music style classification groups and distinguishes musical works by their musical elements and stylistic characteristics. Such a classification helps describe the diversity of music and helps listeners choose the music they like. There are many genres and types, each with its own sound and mode of expression. Classical music expresses emotion and ideas through complex structure and refined arrangement, while pop music favors accessible melodies and lyrics and is the mainstream music enjoyed by the general public. Jazz is known for its complex harmony and improvisation, and electronic music uses electronic devices and technology to create futuristic sounds. Folk music is usually guitar-based and emphasizes storytelling and authenticity, while rock music features strong rhythms and guitar playing and expresses attitudes toward society and the individual. Overall, music style classification reflects how different cultures, eras, and individuals understand and create music, and it helps listeners find the sounds they like in a vast musical world.

Music style classification attaches labels to a piece of music according to its style. With these labels, a music platform can better recommend music to interested users and improve the user experience. Multi-label classification is commonly used here: a single song may be labeled "metal", "punk", "rock", and "pop" at the same time.

In practice, completely accurate style labels are rarely available. Most data are crawled from the Internet, where some annotators are unreliable, so labeling errors are unavoidable: a song may be tagged with a style that does not belong to it, and only the labels given by some annotators are valid. For example, a song or piece of music may carry the candidate labels "folk", "electronic", "pop", "independent", "jazz", and "classical" on the Internet, yet a professional who listens carefully finds that only "folk", "electronic", "pop", and "independent" are valid. Classification on a multi-label music style dataset that contains such incorrect labels is called partial multi-label music style classification.
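The candidate-label example above can be written out as indicator vectors. The following is a minimal sketch, assuming an arbitrary label ordering and a hypothetical helper named `indicator`; neither is part of the patent text.

```python
import numpy as np

# Label vocabulary taken from the example in the text; the ordering is an arbitrary choice.
LABELS = ["folk", "electronic", "pop", "independent", "jazz", "classical"]

def indicator(tags, vocabulary=LABELS):
    """Return a 0/1 vector marking which vocabulary labels appear in `tags`."""
    tag_set = set(tags)
    return np.array([1 if label in tag_set else 0 for label in vocabulary])

# Candidate labels as crawled from the Internet (may contain noise).
candidate = indicator(["folk", "electronic", "pop", "independent", "jazz", "classical"])
# Labels confirmed by a professional annotator.
true = indicator(["folk", "electronic", "pop", "independent"])
# Noise labels are the candidates that are not confirmed.
noise = candidate - true
noisy_labels = [label for label, flag in zip(LABELS, noise) if flag == 1]
```

In this sketch the candidate vector plays the role of one sample's entries in the candidate label indicator matrix, and `noisy_labels` contains "jazz" and "classical".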

In conventional multi-label learning for music style classification, reducing the extracted music features with a feature selection method often improves the accuracy of the learning algorithm. Both theory and practice have shown that a suitable feature selection method lowers the difficulty of the learning task and raises model accuracy. Multi-label feature selection reduces redundant music features and removes features irrelevant to the classification task, which lowers the time and hardware resources consumed by the learner. Because irrelevant and redundant features no longer influence the multi-label learner, learning performance also improves.

In the partial multi-label music style classification setting, directly applying a multi-label feature selection algorithm ignores the effect of noise labels on the feature selection result, so features related to noise music style labels are wrongly selected and the selected features lack credibility. Few partial multi-label feature selection methods exist. One such algorithm divides the candidate labels into two parts, constrains the coefficient matrices that map from the feature space to the label space with the nuclear norm and the ℓ₁ norm respectively, and then uses manifold regularization so that similar instances have similar labels.

Summary of the Invention

In view of the deficiencies of the current technology, the present invention proposes a partial multi-label music style feature selection method integrating feature similarity.

The specific steps of the present invention are as follows:

Step 1: Extract music features to obtain a partial multi-label music classification dataset M, and specify the feature subset dimension K, where M contains n music samples, q labels, and d music features;

Step 2: Split the partial multi-label music classification dataset M into a training sample set MT and a test sample set MP. Here X denotes the feature matrix of the training sample set, (X)_ij denotes the j-th feature value of the i-th sample, Y denotes the candidate label indicator matrix of the training samples, and (Y)_ij indicates whether the i-th label is among the candidate labels of the j-th sample (1 if present, 0 if absent);

Step 3: Compute the local Gram matrix of each sample x_g to represent feature similarity. The local Gram matrix W_g ∈ R^{d×d} of sample x_g is computed as follows:

where δ is the neighborhood granularity threshold and Δ is the distance metric.

Step 4: Define a partial multi-label classifier with the following objective function:

where W and S are the mapping matrices to the true labels and to the noise labels, respectively.

Step 5: Constrain the mapping matrices W and S with the feature similarity. Using the idea of manifold regularization, the final objective function is defined as:

where L_g = D_g + W_g, and D_g is the diagonal matrix whose entries are the row sums of W_g.
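The construction of D_g and L_g from a given local Gram matrix can be sketched as follows. This is a minimal illustration that follows the definition exactly as written in the text (L_g = D_g + W_g); the function name and the toy matrix are assumptions, and the computation of W_g itself is not shown because its formula is not reproduced here.

```python
import numpy as np

def degree_and_regularizer(W_g):
    """Build D_g (diagonal of row sums) and L_g = D_g + W_g from a local Gram matrix."""
    W_g = np.asarray(W_g, dtype=float)
    # D_g holds the sum of each row of W_g on its diagonal.
    D_g = np.diag(W_g.sum(axis=1))
    # Sign follows the text; the conventional graph Laplacian would use D_g - W_g.
    L_g = D_g + W_g
    return D_g, L_g

# Toy 2x2 similarity matrix, purely illustrative.
D_g, L_g = degree_and_regularizer([[0.0, 1.0], [1.0, 0.0]])
```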

Step 7: Minimize the objective function by alternating optimization to solve for W and S, compute the ℓ₂ norm of each column vector of W as the score of the corresponding feature, and then select the K features with the largest scores.
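The scoring and selection in this step can be sketched in a few lines. The sketch assumes that W is stored with shape (q, d), so that column j corresponds to feature j, and that indices are 0-based; the function name and the toy values are illustrative only.

```python
import numpy as np

def select_top_k_features(W, k):
    """Score feature j by the l2 norm of column j of W; return the k best indices and all scores."""
    scores = np.linalg.norm(W, axis=0)          # one score per column (feature)
    order = np.argsort(-scores, kind="stable")  # indices sorted by descending score
    return order[:k], scores

# Toy mapping matrix with q = 2 labels and d = 4 features.
W = np.array([[3.0, 0.1, 0.0, 1.0],
              [4.0, 0.2, 0.0, 1.0]])
selected, scores = select_top_k_features(W, k=2)

# Dimensionality reduction keeps only the selected feature columns.
X_train = np.arange(12.0).reshape(3, 4)
X_reduced = X_train[:, selected]
```

The same column selection would be applied to the test set so that both sets share the reduced feature space.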

Step 8: Use the K selected features to reduce the dimensionality of the training sample set MT and the test sample set MP, obtaining the reduced training sample set MT′ and the reduced test sample set MP′. Then input MT′ into the multi-label k-nearest-neighbor (ML-KNN) model for training to obtain the trained ML-KNN model.

The beneficial effects of the present invention are as follows. The invention integrates feature similarity to construct a partial multi-label music style feature selection method, which includes: preprocessing the multi-label dataset, including missing-value filling and data discretization; applying the partial multi-label music style feature selection method to the preprocessed dataset to obtain the selected feature set; and inputting the resulting feature dataset into the multi-label k-nearest-neighbor (MLKNN) model to obtain an MLKNN model optimized on that dataset. The invention divides the candidate music style labels into true music style labels and noise music style labels, uses feature similarity to constrain the coefficient matrix W that maps to the true labels and the coefficient matrix S that maps to the noise labels, uses a partial multi-label classifier to constrain the sum matrix H of W and S, and finally scores the features with the W matrix to select the relevant music style features. This reduces the difficulty of the problem and improves the performance of the MLKNN model.

Brief Description of the Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

Specific embodiments of the present invention are described in detail below.

Step 1: Extract music features to obtain a partial multi-label music classification dataset M, and specify the feature subset dimension K, where M contains n music style samples, q music style labels, and d music features;

Step 2: Split the partial multi-label music style classification dataset M into a training sample set MT and a test sample set MP. Here X denotes the feature matrix of the training sample set MT, (X)_ij denotes the j-th feature value of the i-th sample, Y denotes the candidate label indicator matrix of the training samples, and (Y)_ij indicates whether the i-th label is among the candidate labels of the j-th sample (1 if present, 0 if absent);

Step 3: Compute the local Gram matrix of each sample to represent feature similarity. The local Gram matrix W_g ∈ R^{d×d} of sample x_g is computed as follows:

where δ is the neighborhood granularity threshold and Δ is the Euclidean distance.

where L_g = D_g + W_g, D_g is the diagonal matrix whose entries are the row sums of W_g, and λ, β, γ, η₁, and η₂ are balance parameters. Minimize the objective function by alternating optimization to solve for W, compute the ℓ₂ norm of each column vector of W as the score of the corresponding feature, and then select the K features with the largest scores.

Step 6: Use the K selected features to reduce the dimensionality of the training sample set MT and the test sample set MP, obtaining the reduced training sample set MT′ and the reduced test sample set MP′. Then input MT′ into the multi-label k-nearest-neighbor (ML-KNN) model for training to obtain the trained ML-KNN model.

The parameters λ, β, γ, η₁, and η₂ in Step 5 are obtained by cross-validation.

The solution process in Step 5 is as follows. First, the objective function is rewritten by variable substitution as:

Using the LADMAP method, the above formula is converted into:

Fix W, S, P, Q and optimize H:

$$H_{k+1} = \left(Y^{T}X + \mu_1 S_k + \mu_1 W_k - Y_1\right)\left(XX^{T} + \lambda I + \mu_1 I\right)^{-1}$$
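The closed-form H update can be evaluated without forming an explicit inverse. The sketch below is a hedged illustration: `data_term` stands for the Y^T X term and `gram` for the XX^T term of the formula, both passed in precomputed so that the sketch does not depend on how X and Y are oriented. The function name, argument names, and shapes (q labels, d features) are assumptions.

```python
import numpy as np

def update_H(data_term, gram, S, W, Y1, lam, mu1):
    """Evaluate H = (data_term + mu1*S + mu1*W - Y1) @ inv(gram + (lam + mu1) * I)."""
    d = gram.shape[0]
    rhs = data_term + mu1 * S + mu1 * W - Y1    # shape (q, d)
    system = gram + (lam + mu1) * np.eye(d)     # shape (d, d)
    # H @ system = rhs is equivalent to system.T @ H.T = rhs.T, solved without inversion.
    return np.linalg.solve(system.T, rhs.T).T

# Toy sizes: q = 2 labels, d = 3 features; random values are purely illustrative.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
gram = A @ A.T
data_term = rng.normal(size=(2, 3))
S = rng.normal(size=(2, 3))
W = rng.normal(size=(2, 3))
Y1 = rng.normal(size=(2, 3))
H = update_H(data_term, gram, S, W, Y1, lam=1.0, mu1=0.5)
```

Solving the linear system is usually preferred over computing the inverse because it is cheaper and numerically more stable.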

Fix W, H, S, Q and optimize P:

Fix W, H, S, P and optimize Q:

Fix H, P, Q and optimize W and S:

where the matrix in the formula above (not reproduced in this text) is a diagonal matrix,

$$S_{k+1} = \operatorname{soft}_{2\gamma/(\mu_1+\mu_2)}\left(\mu_3 Q + Y_3 + \mu_1 H - \mu_1 W + Y_1\right), \quad \text{where } \operatorname{soft}_{\xi}(x) = \operatorname{sign}(x)\max(|x| - \xi,\, 0)$$
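The soft-thresholding operator in the S update acts element-wise and can be sketched directly from its definition soft_ξ(x) = sign(x) max(|x| - ξ, 0). The function name is an assumption; only the operator itself is shown, not the full S update.

```python
import numpy as np

def soft_threshold(x, xi):
    """Element-wise soft thresholding: shrink each entry toward zero by xi, clipping at zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - xi, 0.0)

# Entries with magnitude below the threshold become zero, which is what makes S sparse.
shrunk = soft_threshold([-3.0, -0.5, 0.0, 0.5, 3.0], xi=1.0)
```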

Update Y₁, Y₂, Y₃, μ₁, μ₂, μ₃:

where μ_max is the upper bound on μ₁, μ₂, and μ₃, and ρ is the growth coefficient.
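The update formulas themselves are not reproduced in this text. The sketch below shows the pattern commonly used in augmented-Lagrangian solvers, which is an assumption and not taken from the patent: each multiplier moves along its constraint residual, and each penalty grows by the factor ρ until it reaches μ_max. The function names and the example residual are hypothetical.

```python
import numpy as np

def update_multiplier(Y, mu, residual):
    """Assumed dual ascent step: Y <- Y + mu * residual."""
    return Y + mu * residual

def grow_penalty(mu, rho, mu_max):
    """Assumed penalty schedule: mu <- min(rho * mu, mu_max)."""
    return min(rho * mu, mu_max)

# Example with the constraint H = W + S, whose residual is H - W - S (toy values).
H = np.ones((2, 2))
W = np.zeros((2, 2))
S = np.zeros((2, 2))
Y1 = update_multiplier(np.zeros((2, 2)), mu=2.0, residual=H - W - S)
mu1 = grow_penalty(2.0, rho=1.5, mu_max=10.0)
```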

Verification example:

To verify the effectiveness of the present application, we evaluate it on the music_styles dataset, a partial multi-label music style classification dataset in which the style of each piece of music is annotated and predicted. It contains 10 music style labels such as "classical", "jazz", and "rock", and consists of 6839 pieces of music. Each piece has 98 features and is tagged with several of the 10 labels. The original labels collected from the Internet are the candidate labels, and the true labels are obtained through careful manual verification.

Following the steps of the present invention, λ is set to 1, β to 1, γ to 0.1, η₁ to 0.01, and η₂ to 0.01. The input set M is music_styles, and the input feature subset dimension K is 24. The selected feature numbers are {65, 66, 26, 4, 54, 64, 96, 81, 92, 27, 79, 52, 29, 49, 16, 28, 58, 1, 5, 95, 55, 57, 9, 93}. A new training set is then created from the selected features and used to train the MLKNN classifier, giving the model MLKNN-FS.

One Error, Ranking Loss, Coverage Error, and Average Precision are used as the evaluation criteria for the multi-label classification models. For comparison, the MLKNN model is also trained directly on the complete training set without feature selection, giving the model MLKNN-ALL, whose four metrics are computed on the test set. The results are summarized in the following table:

Table 1. Comparison of the four metrics for the predictions of model MLKNN-ALL and model MLKNN-FS

In Table 1, a larger Average Precision is better, while smaller Coverage Error, One Error, and Ranking Loss are better. The experimental results show that the MLKNN-FS classifier outperforms the MLKNN-ALL classifier on multiple metrics, which indicates that the present invention can effectively improve the performance of the multi-label classification model.
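Two of the lower-is-better metrics can be sketched as follows. This is a minimal illustration under one common convention, in which Coverage Error counts how many top-ranked labels are needed to cover all true labels; some papers subtract one from that count. The function names and toy data are assumptions and do not reproduce the patent's results.

```python
import numpy as np

def one_error(Y_true, scores):
    """Fraction of samples whose highest-scored label is not a true label."""
    Y_true = np.asarray(Y_true)
    scores = np.asarray(scores, dtype=float)
    top = np.argmax(scores, axis=1)
    hits = Y_true[np.arange(len(Y_true)), top]
    return float(np.mean(hits == 0))

def coverage_error(Y_true, scores):
    """Average number of top-ranked labels needed to include every true label."""
    Y_true = np.asarray(Y_true)
    scores = np.asarray(scores, dtype=float)
    depths = []
    for y, s in zip(Y_true, scores):
        if y.sum() == 0:
            depths.append(0)  # no true labels: nothing to cover
            continue
        lowest_true_score = s[y == 1].min()
        depths.append(int(np.sum(s >= lowest_true_score)))
    return float(np.mean(depths))

# Toy data: 3 samples, 3 labels (rows are samples here).
Y_true = [[1, 0, 0], [0, 0, 1], [0, 1, 1]]
scores = [[0.9, 0.2, 0.1], [1.0, 0.2, 0.1], [0.1, 0.8, 0.3]]
oe = one_error(Y_true, scores)
ce = coverage_error(Y_true, scores)
```

In this toy example only the second sample has a wrong top-ranked label, and it also needs all three labels to cover its single true label.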

Claims

1. A partial multi-label music style feature selection method integrating feature similarity, comprising the following steps:

Step 1: Extract music features to obtain a partial multi-label music classification dataset M, and specify the number of selected features K, where M contains n music samples, q labels, and d music features;

Step 2: Split the partial multi-label music classification dataset M into a training sample set MT and a test sample set MP. Here X denotes the feature matrix of the training sample set, (X)_ij denotes the j-th feature value of the i-th sample, Y denotes the candidate label indicator matrix of the training samples, and (Y)_ij indicates whether the i-th label is among the candidate labels of the j-th sample (1 if present, 0 if absent);

Step 3: Compute the local Gram matrix of each sample x_g to represent feature similarity; the local Gram matrix W_g ∈ R^{d×d} of sample x_g is computed as follows:

where δ is the neighborhood granularity threshold and Δ is the distance metric;

Step 4: Define a partial multi-label classifier with the following objective function:

where W and S are the mapping matrices to the true labels and to the noise labels, respectively;

Step 5: Constrain the mapping matrices W and S with the feature similarity. Using the idea of manifold regularization, the final objective function is defined as:

where L_g = D_g + W_g, D_g is the diagonal matrix whose entries are the row sums of W_g, and λ, β, γ, η₁, and η₂ are balance parameters; the objective function is minimized by alternating optimization to solve for W and S, the ℓ₂ norm of each column vector of W is computed as the score of the corresponding feature, and the K features with the largest scores are selected;

Step 6: Use the K selected features to reduce the dimensionality of the training sample set MT and the test sample set MP, obtaining the reduced training sample set MT′ and the reduced test sample set MP′. Then input MT′ into the multi-label k-nearest-neighbor (ML-KNN) model for training to obtain the trained ML-KNN model.

2. The partial multi-label music style feature selection method integrating feature similarity according to claim 1, characterized in that, in Step 5:

First, the objective function is transformed into

Using the LADMAP method, the above formula is converted into

Fix W, S, P, Q and optimize H:

$$H_{k+1} = \left(Y^{T}X + \mu_1 S_k + \mu_1 W_k - Y_1\right)\left(XX^{T} + \lambda I + \mu_1 I\right)^{-1}$$

Fix W, H, S, Q and optimize P:

Fix W, H, S, P and optimize Q:

Fix H, P, Q and optimize W and S:

where the matrix in the formula above (not reproduced in this text) is a diagonal matrix,

where soft_ξ(x) = sign(x) max(|x| - ξ, 0)

Update Y₁, Y₂, Y₃, μ₁, μ₂, μ₃:

where μ_max is the upper bound on μ₁, μ₂, and μ₃, and ρ is the growth coefficient.

3. The partial multi-label music style feature selection method integrating feature similarity according to claim 1, characterized in that, in Step 6, training the MLKNN classifier comprises:

inputting the newly generated feature subset into the MLKNN model, with the parameter k (the number of nearest neighbors) of the MLKNN model set to 10 and all other parameters left at their defaults, to obtain the optimized MLKNN model.