CN105069129A - Self-adaptive multi-label prediction method - Google Patents
Self-adaptive multi-label prediction method
- Publication number
- CN105069129A CN201510501816.7A
- Authority
- CN
- China
- Prior art keywords
- gamma
- inst
- num
- voter
- lab
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 26
- 238000012706 support-vector machine Methods 0.000 claims abstract description 8
- 238000006116 polymerization reaction Methods 0.000 claims description 21
- 238000012546 transfer Methods 0.000 claims description 12
- 238000010606 normalization Methods 0.000 claims description 11
- 238000012549 training Methods 0.000 claims description 9
- 230000015572 biosynthetic process Effects 0.000 claims 1
- 230000003044 adaptive effect Effects 0.000 abstract description 7
- 238000012545 processing Methods 0.000 abstract description 6
- 230000002776 aggregation Effects 0.000 description 34
- 238000004220 aggregation Methods 0.000 description 34
- 230000007704 transition Effects 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 5
- 238000004364 calculation method Methods 0.000 description 4
- 230000008878 coupling Effects 0.000 description 2
- 238000010168 coupling process Methods 0.000 description 2
- 238000005859 coupling reaction Methods 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9562—Bookmark management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a self-adaptive multi-label prediction method, characterized by the following steps: 1. obtain an initialized example set; 2. determine the leader examples, outsider examples and voter examples in the initialized example set; 3. determine the clusters to which the voter examples belong; 4. coarsely classify the prediction examples with a support vector machine; 5. perform multi-label prediction on the prediction examples. The invention can accurately attach labels to network information and improves the accuracy, universality, interpretability and transferability of multi-label prediction, thereby achieving intelligent information classification and processing in a big-data environment.
Description
Technical Field

The invention belongs to the field of intelligent information classification and processing, and in particular relates to a self-adaptive multi-label prediction method applicable to the fast clustering of multimedia information and the discovery of density-peak points in a big-data environment.
Background Art

With the rapid development of the Internet, the amount of information is growing geometrically. Today's microblogs, forums, WeChat, online video, online shopping and social networks all rely on tags to make search and classification easier for users. Accurate and detailed tags let users find what they need quickly, and they also allow merchants to segment users and recommend products that match the tastes of each user group, so that users do not have to wade through large amounts of irrelevant information and valuable content is not drowned in the ocean of information. Conversely, a merchant that cannot handle information overload properly will keep losing customers.

Current methods for attaching multiple labels to information mainly either decompose the multi-label problem into independent single-label problems, or transform it into a ranking over labels. Decomposition into single labels completely ignores the correlations between labels, so its accuracy is low. Label ranking not only requires a large amount of computation; once the labels are ordered, one must still decide whether the label before or after a given position is the more similar one, so it likewise suffers from low accuracy.
Compared with the present invention, existing processing methods have the following shortcomings:

1. Many machine-learning methods exist for predicting a single label of network information, i.e. for recognition problems. Because the multiple labels of a piece of information are correlated, however, decomposing the multi-label problem into single-label problems produces labels of low accuracy that cannot serve practical purposes.

2. Current multi-label prediction techniques usually handle only a given static data set. When new information arrives they typically have to relearn and reset their parameters, and cannot adjust the parameters automatically as the data change, so their generalization is weak and their universality is poor.

3. Turning multi-label prediction into an ordering relation between labels not only requires a large amount of computation, but is also hard to interpret, and its prediction accuracy is not high.

4. Most existing multi-label prediction techniques are designed to improve one particular evaluation metric while ignoring the others; this makes them poorly portable and suitable only for data sets that satisfy certain conditions.
Summary of the Invention

To overcome the deficiencies of the prior art, the present invention provides a self-adaptive multi-label prediction method, with the aim of labelling network information accurately and improving the accuracy, universality, interpretability and transferability of multi-label prediction, thereby achieving intelligent information classification and processing in a big-data environment.

The present invention adopts the following technical scheme to solve the technical problem:

The self-adaptive multi-label prediction method of the present invention is characterized by the following steps:
Step 1: obtain the initialized example set D:

Step 1.1: build the original example set D′={inst′1, inst′2, …, inst′a, …, inst′num′} from num′ known objects, where inst′a denotes the original example corresponding to the a-th known object, 1≤a≤num′, and inst′a={attr′a; lab′a}. attr′a is the attribute set describing the features of the a-th known object, lab′a is the label set describing its semantics, attr′a={attr′a,1, attr′a,2, …, attr′a,n} where attr′a,n is the n-th attribute of the a-th known object and n is its number of attributes, and lab′a={lab′a,1, lab′a,2, …, lab′a,x, …, lab′a,m} where lab′a,x is the x-th label of the a-th known object, m is its number of labels and 1≤x≤m. lab′a,x=1 means that the semantics of the a-th known object conforms to the x-th label, and lab′a,x=0 means that it does not.

Step 1.2: normalize the attribute sets {attr′1, attr′2, …, attr′a, …, attr′num′} of the num′ known objects in the original example set D′ to obtain the normalized attribute sets {attr″1, attr″2, …, attr″a, …, attr″num′}. If the m label values associated with the normalized attribute set attr″a of the a-th known object are all 0, delete the original example to which that object belongs. This yields the initialized example set D={inst1, inst2, …, insti, …, instnum} of num examples, where insti is the example corresponding to the i-th initialized known object, insti={attri; labi}, attri is the attribute set of the i-th initialized example, labi is the label set of its semantics, and 1≤i≤num.
Step 2: compute the clustering degree of every example in the initialized example set D, and use it to determine the leader examples, outsider examples and voter examples in D:

Step 2.1: treat the m labels of each of the num examples in D as m-dimensional coordinates, and compute the Euclidean distance dik between the labels of the i-th example insti and the k-th example instk, 1≤k≤num and k≠i.

Step 2.2: define the iteration counter γ and initialize γ=1; define clui as the cluster to which the i-th example insti belongs.

Step 2.3: use formula (1) to obtain the cohesion degree of the i-th example insti in the γ-th iteration, and hence the cohesion degrees of all num examples in the γ-th iteration; record the largest cohesion degree.

In formula (1), … is the threshold of the γ-th iteration; when …

Step 2.4: use formula (2) or formula (3) to obtain the difference degree of the i-th example insti in the γ-th iteration, and hence the difference degrees of all num examples in the γ-th iteration; when …

Step 2.5: normalize the difference degrees of the num examples of the γ-th iteration to obtain the normalized difference degrees.

Step 2.6: use formula (4) to obtain the clustering degree of the i-th example insti in the γ-th iteration, and hence the clustering degrees sco(γ) of all num examples in the γ-th iteration.

Step 2.7: sort the clustering degrees sco(γ) of the num examples of the γ-th iteration in descending order to obtain the clustering-degree sequence sco′(γ), and order the cohesion degrees correspondingly.

Step 2.8: initialize t=1.

Step 2.9: judge whether … and … ≥ num×3% both hold. If they do, the threshold of the γ-th iteration is a valid value; record t and go to step 2.10. Otherwise, judge whether … holds; if it does, assign t+1 to t and repeat step 2.9; if not, modify the threshold, assign γ+1 to γ and return to step 2.3.

Step 2.10: if the cohesion degree of the i-th example insti in the γ-th iteration satisfies …, then insti is an outsider example and its cluster is set to clui=−1. Otherwise, judge whether … holds; if it does, insti is a leader example and clui=i; if not, insti is a voter example.

Step 2.11: count the number of leader examples and the number of voter examples, denoted N and M respectively.

Step 2.12: write the set of N leader examples as D(l), 1≤α≤N; the cohesion degrees corresponding to D(l) are …, where … is the cohesion degree of the α-th leader example, and the label set corresponding to D(l) is …

Step 2.13: write the set of M voter examples as D(v), 1≤β≤M; the cohesion degrees corresponding to D(v) are …, where … is the cohesion degree of the β-th voter example, and the label set corresponding to D(v) is …
Step 3: obtain the clusters clu(v) to which the M voter examples in D(v) belong:

Step 3.1: define the iteration counter χ and initialize χ=1; define the z-th transfer example instz, z≥0; initialize α=1, β=1 and z=0.

Step 3.2: select the α-th leader example from the set of N leader examples D(l), and compute the Euclidean distance between the labels of the α-th leader example and the β-th voter example of the χ-th iteration.

Step 3.3: if …, assign β+1 to β and judge whether β≤M holds; if it does, repeat step 3.3, otherwise go to step 3.5. If …, judge whether the cluster of the β-th voter example of the χ-th iteration is empty; if it is, go to step 3.4; otherwise the value of that cluster is the index of the existing leader example of the χ-th iteration, denoted …, and step 3.11 is executed.

Step 3.4: assign the index α(l) of the α-th leader example to …, assign z+1 to z, and copy the index βχ, label set, cohesion degree and cluster of the β-th voter example of the χ-th iteration to the index, label set, cohesion degree and cluster of the z-th transfer example of the χ-th iteration; then assign β+1 to β and judge whether β≤M holds; if it does, go to step 3.3, otherwise go to step 3.5.

Step 3.5: if z≤0, go to step 3.14. Otherwise assign χ+1 to χ, carry the quantities of iteration χ−1 over to iteration χ, set β=1, compute the Euclidean distance between the labels of the β-th voter example of the χ-th iteration and the z-th transfer example of the χ-th iteration, and assign z−1 to z.

Step 3.6: if …, assign β+1 to β and judge whether β≤M holds; if it does, repeat step 3.6, otherwise go to step 3.5. If …, judge whether the cluster of the β-th voter example of the χ-th iteration is empty; if it is, go to step 3.7; otherwise the value of that cluster is the index of the existing leader example of the χ-th iteration, denoted …, and step 3.8 is executed.

Step 3.7: assign the index z(χ) of the z-th transfer example of the χ-th iteration to …, assign z+1 to z, set …, and assign β+1 to β; judge whether β≤M holds; if it does, repeat step 3.6, otherwise go to step 3.5.

Step 3.8: use formula (5) to obtain the influence between the β-th voter example of the χ-th iteration and the existing leader example of the χ-th iteration.

Step 3.9: use formula (6) to obtain the influence between the β-th voter example of the χ-th iteration and the z-th transfer example of the χ-th iteration.

Step 3.10: if …, assign β+1 to β and go to step 3.6. Otherwise set …, assign z+1 to z, set …, assign β+1 to β and judge whether β≤M holds; if it does, go to step 3.6, otherwise go to step 3.5.

Step 3.11: use formula (7) to obtain the influence between the β-th voter example of the χ-th iteration and the existing leader example of the χ-th iteration.

Step 3.12: use formula (8) to obtain the influence between the β-th voter example of the χ-th iteration and the α-th leader example.

Step 3.13: if …, assign β+1 to β and go to step 3.3. Otherwise assign the index α(l) of the α-th leader example to …, assign z+1 to z, set …, assign β+1 to β and judge whether β≤M holds; if it does, go to step 3.3, otherwise go to step 3.5.

Step 3.14: assign α+1 to α and judge whether α≤N holds; if it does, set β=1 and go to step 3.2, otherwise go to step 3.15.

Step 3.15: assign the clusters obtained for the M voter examples of D(v) in the χ-th iteration, in order, to the clusters of the M voter examples of D(v).

Step 3.16: judge whether any voter example still has an empty cluster; if so, set the cluster of every such voter example to −1.
Step 4: coarsely classify the prediction examples with a support vector machine:

Step 4.1: build the prediction example set P={instp1, instp2, …, instpj, …, instpnump} of nump prediction examples, where instpj is the j-th prediction example, 1≤j≤nump, and instpj={attrpj; labpj}. attrpj is the attribute set of the j-th prediction example instpj, labpj is its label set, and clupj denotes the cluster to which instpj belongs.

Step 4.2: take the num clusters {clu1, clu2, …, clui, …, clunum} corresponding to the initialized example set D as training labels, the num attribute sets {attr1, attr2, …, attri, …, attrnum} of the known objects in D as training samples, and the nump attribute sets {attrp1, attrp2, …, attrpj, …, attrpnump} of the prediction example set P as prediction samples; train with the support-vector-machine method to obtain nump predicted labels, and assign these nump predicted labels to the nump clusters of the prediction example set P, thereby completing the coarse classification of P.

Step 5: perform multi-label prediction on the nump prediction examples:
Step 5.1: merge the num examples of the initialized example set D and the nump examples of the prediction example set P into the ψ-th updated example set.

Step 5.2: treat the n attributes of each of the num+nump examples in the ψ-th updated example set as n-dimensional coordinates, and compute the Euclidean distance between the attributes of the Ω-th and the ξ-th update examples of the ψ-th update, 1≤ξ≤num+nump and ξ≠Ω.

Step 5.3: use formula (9) to obtain the attribute aggregation degree of the Ω-th update example of the ψ-th update, and hence the attribute aggregation degrees of the num+nump update examples of the ψ-th update; when …

Step 5.4: initialize j=1.

Step 5.5: if the cluster clupj of the j-th prediction example instpj in P is the same as the cluster clui of the i-th known example insti in D, use formula (10) to obtain the influence graij between insti and instpj.

In formula (10), Γi is the attribute aggregation degree of the update example corresponding to the known example insti in the ψ-th updated example set, Γj is the attribute aggregation degree of the update example corresponding to the prediction example instpj in the ψ-th updated example set, and dij is the Euclidean distance between the attributes of insti and instpj.

Step 5.6: repeat step 5.5 to obtain the influence between the j-th prediction example instpj and every other known example in D, and record the largest influence gramax.

Step 5.7: if graij=gramax, set labpj=labi, meaning that every label in the label set labpj of the prediction example equals the corresponding label in the label set labi of the initialized example set D; this gives the j-th multi-label prediction result.

Step 5.8: assign j+1 to j and judge whether j≤nump holds; if it does, return to step 5.5; otherwise the multi-label prediction of the nump prediction examples is complete.
The self-adaptive multi-label prediction method of the present invention is further characterized in that:

Step 5 also comprises step 5.9: assign the label sets of the prediction example set P, for which multi-label prediction has been completed, to the corresponding ψ-th updated example set, thereby obtaining the (ψ+1)-th updated example set, and use the (ψ+1)-th updated example set as a new initialized example set for self-adaptive multi-label prediction.

When new prediction examples with the same object features and the same object semantics appear, multi-label prediction for them can be completed simply by starting again from step 4.

In step 2.9, the rule for modifying the threshold is: if …, assign the threshold minus τ2 to the threshold; otherwise assign the threshold plus τ2 to the threshold, with 0.1≤τ2≤0.5 and 75%≤τ1<100%.
Compared with the prior art, the beneficial effects of the present invention are:

1. The invention first performs a coarse classification and then a precise prediction. Thanks to its self-adaptivity, the predicted labels keep evolving over multiple rounds of iteration, so the method obtains more accurate predictions than existing multi-label prediction techniques and can be put to practical use.

2. Through the initialized example set, different initialization sets can be built for different known object features and semantics, so the invention can be applied to most application environments of existing network platforms; from simple textual data to audio and even images, good label predictions can be made, giving the invention stronger universality than the prior art.

3. The invention computes a cohesion degree to express how cohesive an example is and a difference degree to express how strongly it is coupled to others, and derives the clustering degree from the two. Every parameter has a concrete meaning, the classification requirement of high cohesion and low coupling is fully taken into account, and the quantities are easy to understand and explain. This guarantees high prediction accuracy while giving the invention strong portability, so multi-label prediction can be carried out under a variety of conditions.

4. Through the cohesion degree, the invention can accurately find the leader examples in each product domain. For microblogs, forums and social networks, this makes it possible to locate the most influential key users in different topic areas; a detailed study of their behaviour can then predict likely trends in the field and provide accurate recommendations to its users.

5. By computing the influence between examples, the invention can be used not only for multi-label prediction but also to compare examples with known labels of the same semantics, find the examples whose multi-label sets are most similar, and recommend them to users, improving the user experience.

6. When determining the multi-label set of a prediction example, the invention takes the label set of the known example that is most similar to the prediction example. The user group of that known example can therefore be recommended to the newly appearing prediction example, which helps a new product find a fairly accurate market position and discover potential users.

7. Because prediction examples whose multi-label prediction has been completed are added back into the initialized example set, the existing training set is enriched and the accuracy of the next round of prediction improves. The invention therefore has a self-adaptive learning ability: newly added examples further refine the existing data set, and as the number of examples with known labels grows, the prediction accuracy of the method increases further.
Detailed Description of the Embodiments

In this embodiment, a self-adaptive multi-label prediction method is carried out according to the following steps:

Step 1: obtain the initialized example set D:

Step 1.1: build the original example set D′={inst′1, inst′2, …, inst′a, …, inst′num′} from num′ known objects, where inst′a denotes the original example corresponding to the a-th known object, 1≤a≤num′, and inst′a={attr′a; lab′a}. attr′a is the attribute set describing the features of the a-th known object, lab′a is the label set describing its semantics, attr′a={attr′a,1, attr′a,2, …, attr′a,n} where attr′a,n is the n-th attribute and n the number of attributes of the a-th known object, and lab′a={lab′a,1, lab′a,2, …, lab′a,x, …, lab′a,m} where lab′a,x is the x-th label, m the number of labels and 1≤x≤m. lab′a,x=1 means that the semantics of the a-th known object conforms to the x-th label, and lab′a,x=0 means that it does not. Suppose the known objects are pictures: object features that need a detailed description, such as colour difference and size, form the attribute set, with precise numerical values for each attribute; yes-or-no object semantics such as "landscape picture" or "animal picture" form the label set, where 0 means the object does not match the label and 1 means it does.

Step 1.2: normalize the attribute sets {attr′1, attr′2, …, attr′a, …, attr′num′} of the num′ known objects in the original example set D′. Taking the attribute set attr′a of the a-th known object as an example, first record the attribute with the largest value, attr′a,max, in {attr′a,1, attr′a,2, …, attr′a,n}, then divide every attribute in the set by attr′a,max; this gives the normalized attribute set attr″a of the a-th known object, and the normalized attribute sets {attr″1, attr″2, …, attr″a, …, attr″num′} of all num′ known objects are obtained in the same way. If the m label values associated with the normalized attribute set attr″a of the a-th known object are all 0, delete the original example to which that object belongs. This yields the initialized example set D={inst1, inst2, …, insti, …, instnum} of num examples, where insti is the example corresponding to the i-th initialized known object, insti={attri; labi}, attri is the attribute set of the i-th initialized example, labi is the label set of its semantics, and 1≤i≤num, as shown in Table 1:

Table 1: data table of the i-th example insti of the initialized example set D
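The normalization and filtering of step 1.2 can be illustrated with a minimal sketch. The representation of the attribute and label sets as NumPy arrays and the function name are assumptions made here, not part of the patent:

```python
import numpy as np

def build_initial_example_set(attrs, labels):
    """Step 1.2 sketch: divide every attribute of an example by that example's
    largest attribute value, then drop examples whose label vector is all zero.

    attrs  -- array of shape (num', n), the raw attribute sets attr'_a
    labels -- array of shape (num', m), the binary label sets lab'_a
    """
    attrs = np.asarray(attrs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    normalized = attrs / attrs.max(axis=1, keepdims=True)  # attr''_a = attr'_a / attr'_a,max
    keep = labels.sum(axis=1) > 0                          # remove all-zero-label examples
    return normalized[keep], labels[keep]
```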
Step 2: compute the clustering degree of every example in the initialized example set D, and use it to determine the leader examples, outsider examples and voter examples in D:

Step 2.1: treat the m labels of each of the num examples in D as m-dimensional coordinates, and compute the Euclidean distance dik between the labels of the i-th example insti and the k-th example instk, 1≤k≤num and k≠i. For example, to compute the Euclidean distance d12 between the labels of the first and second examples: both examples have m labels with the same names but not necessarily the same values, written as the label set lab1={lab1,1, lab1,2, …, lab1,m} of the first example and the label set lab2={lab2,1, lab2,2, …, lab2,m} of the second; the Euclidean distance between the labels is then d12 = sqrt((lab1,1−lab2,1)² + (lab1,2−lab2,2)² + … + (lab1,m−lab2,m)²).

Step 2.2: define the iteration counter γ and initialize γ=1; define clui as the cluster to which the i-th example insti belongs.

Step 2.3: use formula (1) to obtain the cohesion degree of the i-th example insti in the γ-th iteration, and hence the cohesion degrees of all num examples in the γ-th iteration; record the largest cohesion degree.

In formula (1), … is the threshold of the γ-th iteration; when …

Step 2.4: use formula (2) or formula (3) to obtain the difference degree of the i-th example insti in the γ-th iteration, and hence the difference degrees of all num examples in the γ-th iteration; when …

Step 2.5: normalize the difference degrees of the num examples of the γ-th iteration to obtain the normalized difference degrees. Steps 2.4 and 2.5 give the normalized difference degrees a clear separation: a few values are close to 1 while most are below 0.5, which helps the selection of the leader examples.

Step 2.6: use formula (4) to obtain the clustering degree of the i-th example insti in the γ-th iteration, and hence the clustering degrees sco(γ) of all num examples in the γ-th iteration.

Step 2.7: sort the clustering degrees sco(γ) of the num examples of the γ-th iteration in descending order to obtain the clustering-degree sequence sco′(γ), and order the cohesion degrees correspondingly.

Step 2.8: initialize t=1.

Step 2.9: judge whether … and … ≥ num×3% both hold. If they do, the threshold of the γ-th iteration is a valid value; record t and go to step 2.10. Otherwise, judge whether … holds; if it does, assign t+1 to t and repeat step 2.9; if not, modify the threshold (the rule for modifying the threshold is: if …, assign the threshold minus τ2 to the threshold, otherwise assign the threshold plus τ2 to the threshold, with 0.1≤τ2≤0.5 and 75%≤τ1<100%), assign γ+1 to γ and return to step 2.3. In the condition "… and … ≥ num×3%", the constants 1.25 and 3% are not fixed: the invention assumes that the number of examples is on the order of ten thousand and the number of labels is below 20, for which these values give a good solution. When the numbers of examples and labels change, they can be adjusted as appropriate; the principle is to guarantee that the following steps select as leader examples only a small number of examples whose clustering degree is far larger than that of the others.

Step 2.10: if the cohesion degree of the i-th example insti in the γ-th iteration satisfies …, then insti is an outsider example and its cluster is set to clui=−1. Otherwise, judge whether … holds; if it does, insti is a leader example and clui=i; if not, insti is a voter example.

Step 2.11: count the number of leader examples and the number of voter examples, denoted N and M respectively.

Step 2.12: write the set of N leader examples as D(l), 1≤α≤N; the cohesion degrees corresponding to D(l) are …, where … is the cohesion degree of the α-th leader example, and the label set corresponding to D(l) is …

Step 2.13: write the set of M voter examples as D(v), 1≤β≤M; the cohesion degrees corresponding to D(v) are …, where … is the cohesion degree of the β-th voter example, and the label set corresponding to D(v) is …
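Formulas (1)–(4) themselves are not reproduced in this text, so the following sketch only conveys the general idea of step 2. It assumes, in the spirit of the density-peak clustering mentioned in the technical field, that the cohesion degree counts the examples within the threshold distance, that the difference degree is the distance to the nearest example of higher cohesion, and that the clustering degree is their product; the exact expressions, the conditions of steps 2.9–2.10 and the iterative threshold adjustment are all omitted, so the outsider and leader criteria below are illustrative assumptions only:

```python
import numpy as np

def select_roles(label_matrix, d_c, leader_quantile=0.97):
    """Illustrative leader / outsider / voter assignment for step 2.

    ASSUMED forms (not taken from the patent text): cohesion = number of examples
    within distance d_c, difference = distance to the nearest example of higher
    cohesion, clustering degree = cohesion * normalized difference.
    """
    X = np.asarray(label_matrix, dtype=float)
    num = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # label distances d_ik

    cohesion = (dist < d_c).sum(axis=1) - 1        # exclude the example itself
    diff = np.empty(num)
    for i in range(num):
        higher = cohesion > cohesion[i]
        diff[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    diff = diff / diff.max()                       # step 2.5: normalize the difference degrees

    score = cohesion * diff                        # clustering degree sco (assumed form)
    roles = np.full(num, "voter", dtype=object)
    roles[cohesion == 0] = "outsider"              # isolated examples (illustrative criterion)
    roles[score >= np.quantile(score, leader_quantile)] = "leader"
    return cohesion, diff, score, roles
```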
Step 3: obtain the clusters clu(v) to which the M voter examples in D(v) belong:

Step 3.1: define the iteration counter χ and initialize χ=1; define the z-th transfer example instz, z≥0; initialize α=1, β=1 and z=0. The storage structure of the z-th transfer example instz is similar to a common stack; for clarity of exposition the invention also introduces the iteration counter χ to distinguish transfer examples that share the same z. At this point the clusters of all M voter examples in D(v) are still empty.

Step 3.2: select the α-th leader example from the set of N leader examples D(l), and compute the Euclidean distance between the labels of the α-th leader example and the β-th voter example of the χ-th iteration.

Step 3.3: if …, assign β+1 to β and judge whether β≤M holds; if it does, repeat step 3.3, otherwise go to step 3.5. If …, judge whether the cluster of the β-th voter example of the χ-th iteration is empty; if it is, go to step 3.4; otherwise the value of that cluster is the index of the existing leader example of the χ-th iteration, denoted …, and step 3.11 is executed. For example, if the existing leader example of the χ-th iteration is inst9, then …

Step 3.4: assign the index α(l) of the α-th leader example to …, assign z+1 to z, and copy the index βχ, label set, cohesion degree and cluster of the β-th voter example of the χ-th iteration to the index, label set, cohesion degree and cluster of the z-th transfer example of the χ-th iteration; then assign β+1 to β and judge whether β≤M holds; if it does, go to step 3.3, otherwise go to step 3.5. Saying that one example "equals" another only means that their corresponding values are the same, i.e. the index, label set, cohesion degree and cluster of the example on the right-hand side of the equals sign are assigned to the index, label set, cohesion degree and cluster of the example on the left-hand side.

Step 3.5: if z≤0, go to step 3.14. Otherwise assign χ+1 to χ and carry the quantities of iteration χ−1 over to iteration χ (for the other parameters associated with χ, the values associated with χ−1 must likewise be assigned to the corresponding values associated with χ, so as to keep the data coherent and consistent); set β=1, compute the Euclidean distance between the labels of the β-th voter example of the χ-th iteration and the z-th transfer example of the χ-th iteration, and assign z−1 to z.

Step 3.6: if …, assign β+1 to β and judge whether β≤M holds; if it does, repeat step 3.6, otherwise go to step 3.5. If …, judge whether the cluster of the β-th voter example of the χ-th iteration is empty; if it is, go to step 3.7; otherwise the value of that cluster is the index of the existing leader example of the χ-th iteration, denoted …, and step 3.8 is executed.

Step 3.7: assign the index z(χ) of the z-th transfer example of the χ-th iteration to …, assign z+1 to z, set …, and assign β+1 to β; judge whether β≤M holds; if it does, repeat step 3.6, otherwise go to step 3.5.

Step 3.8: use formula (5) to obtain the influence between the β-th voter example of the χ-th iteration and the existing leader example of the χ-th iteration.

Formula (5) can be generalized to compute the influence between any two examples of the same semantics: it suffices to know the cohesion degrees of the two examples and the Euclidean distance between their labels, or the attribute aggregation degrees of the two examples and the Euclidean distance between their attributes; applying formula (5) then yields the influence between the two examples.

Step 3.9: use formula (6) to obtain the influence between the β-th voter example of the χ-th iteration and the z-th transfer example of the χ-th iteration.

Step 3.10: if …, assign β+1 to β and go to step 3.6. Otherwise set …, assign z+1 to z, set …, assign β+1 to β and judge whether β≤M holds; if it does, go to step 3.6, otherwise go to step 3.5.

Step 3.11: use formula (7) to obtain the influence between the β-th voter example of the χ-th iteration and the existing leader example of the χ-th iteration.

Step 3.12: use formula (8) to obtain the influence between the β-th voter example of the χ-th iteration and the α-th leader example.

Step 3.13: if …, assign β+1 to β and go to step 3.3. Otherwise assign the index α(l) of the α-th leader example to …, assign z+1 to z, set …, and judge whether β≤M holds; if it does, assign β+1 to β and go to step 3.3, otherwise go to step 3.5.

Step 3.14: assign α+1 to α and judge whether α≤N holds; if it does, set β=1 and go to step 3.2, otherwise go to step 3.15.

Step 3.15: assign the clusters obtained for the M voter examples of D(v) in the χ-th iteration, in order, to the clusters of the M voter examples of D(v).

Step 3.16: judge whether any voter example still has an empty cluster; if so, set the cluster of every such voter example to −1. The cluster of a voter example can therefore take N+1 possible values, corresponding to the clusters of the N leader examples and to the value −1.
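Because formulas (5)–(8) and the conditions driving the stack of transfer examples are not reproduced here, the end result of step 3 can only be sketched in a simplified form: every voter example ends up with the index of one leader example, or −1. The stand-in below simply assigns each voter to its nearest leader in label space within an optional cutoff; it deliberately collapses the influence propagation of steps 3.2–3.15 and is not the patent's procedure:

```python
import numpy as np

def assign_voters_to_leaders(voter_labels, leader_labels, leader_indices, d_max=None):
    """Simplified stand-in for step 3: give every voter example the cluster index
    of its closest leader in label space, or -1 if no leader lies within d_max."""
    leaders = np.asarray(leader_labels, dtype=float)
    clusters = np.full(len(voter_labels), -1, dtype=int)
    for b, v in enumerate(np.asarray(voter_labels, dtype=float)):
        dists = np.linalg.norm(leaders - v, axis=1)   # label distances to all leaders
        a = int(np.argmin(dists))
        if d_max is None or dists[a] <= d_max:
            clusters[b] = leader_indices[a]           # clu_beta = index of that leader
    return clusters
```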
Step 4: coarsely classify the prediction examples with a support vector machine:

Step 4.1: build the prediction example set P={instp1, instp2, …, instpj, …, instpnump} of nump prediction examples, where instpj is the j-th prediction example, 1≤j≤nump, instpj={attrpj; labpj}, attrpj is the attribute set of instpj, labpj is its label set, and clupj denotes the cluster to which instpj belongs. In the present invention the prediction examples and the known examples must describe the same kind of object, i.e. the object features and semantics are the same. For example, if the known examples are pictures, the prediction examples must also be pictures: features that need a detailed description, such as colour difference and size, form the attribute set, and yes-or-no object semantics such as "landscape picture" or "animal picture" form the label set. The two example sets have attribute sets and label sets with the same names but different values; for clarity, the invention uses different symbols to distinguish them in the discussion.

Step 4.2: take the num clusters {clu1, clu2, …, clui, …, clunum} corresponding to the initialized example set D as training labels, the num attribute sets {attr1, attr2, …, attri, …, attrnum} of the known objects in D as training samples, and the nump attribute sets {attrp1, attrp2, …, attrpj, …, attrpnump} of the prediction example set P as prediction samples; train with the support-vector-machine method to obtain nump predicted labels, and assign these nump predicted labels to the nump clusters of the prediction example set P, thereby completing the coarse classification of P. A support-vector-machine method usually has three inputs, namely the training labels, the training samples and the prediction samples, and produces one output, the predicted labels.
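A coarse classification of this kind can be written, for instance, with scikit-learn's support vector classifier; the choice of sklearn.svm.SVC and of its default kernel is an assumption made here, not something specified by the patent:

```python
from sklearn.svm import SVC

def coarse_classify(train_attrs, train_clusters, predict_attrs):
    """Step 4.2 sketch: train an SVM on (attribute set, cluster index) pairs from
    the initialized example set D and predict a cluster index clup_j for every
    prediction example in P."""
    svm = SVC()                          # kernel and hyper-parameters are assumptions
    svm.fit(train_attrs, train_clusters)
    return svm.predict(predict_attrs)
```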
Step 5: perform multi-label prediction on the nump prediction examples:

Step 5.1: merge the num examples of the initialized example set D and the nump examples of the prediction example set P into the ψ-th updated example set.

Step 5.2: treat the n attributes of each of the num+nump examples in the ψ-th updated example set as n-dimensional coordinates, and compute the Euclidean distance between the attributes of the Ω-th and the ξ-th update examples of the ψ-th update, 1≤ξ≤num+nump and ξ≠Ω.

Step 5.3: use formula (9) to obtain the attribute aggregation degree of the Ω-th update example of the ψ-th update, and hence the attribute aggregation degrees of the num+nump update examples of the ψ-th update; when …

Step 5.4: initialize j=1.

Step 5.5: if the cluster clupj of the j-th prediction example instpj in P is the same as the cluster clui of the i-th known example insti in D, use formula (10) to obtain the influence graij between insti and instpj.

In formula (10), Γi is the attribute aggregation degree of the update example corresponding to the known example insti in the ψ-th updated example set, Γj is the attribute aggregation degree of the update example corresponding to the prediction example instpj in the ψ-th updated example set, and dij is the Euclidean distance between the attributes of insti and instpj.

Step 5.6: repeat step 5.5 to obtain the influence between the j-th prediction example instpj and every other known example in D, and record the largest influence gramax.

Step 5.7: if graij=gramax, set labpj=labi, meaning that every label in the label set labpj of the prediction example equals the corresponding label in the label set labi of the initialized example set D; this gives the j-th multi-label prediction result.

Step 5.8: assign j+1 to j and judge whether j≤nump holds; if it does, return to step 5.5; otherwise the multi-label prediction of the nump prediction examples is complete.
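Steps 5.5–5.7 can be sketched as follows. The helper influence() stands in for formula (10), which is not reproduced in this text; the only properties used are the ones stated above (it depends on the two attribute aggregation degrees Γi and Γj and on the attribute distance dij), so its concrete form below is an assumption:

```python
import numpy as np

def influence(gamma_i, gamma_j, d_ij, eps=1e-12):
    # Placeholder for formula (10): assumed to grow with the attribute aggregation
    # degrees and to shrink with the attribute distance.  Not the patent's formula.
    return gamma_i * gamma_j / (d_ij ** 2 + eps)

def predict_labels(known_attrs, known_labels, known_clusters, known_gamma,
                   pred_attrs, pred_clusters, pred_gamma):
    """Steps 5.5-5.7 sketch: each prediction example copies the label set of the
    known example, within the same coarse cluster, that has the largest influence."""
    known_labels = np.asarray(known_labels, dtype=int)
    pred_labels = np.zeros((len(pred_attrs), known_labels.shape[1]), dtype=int)
    for j, (pa, pc, pg) in enumerate(zip(pred_attrs, pred_clusters, pred_gamma)):
        best_i, best_gra = -1, -np.inf
        for i, (ka, kc, kg) in enumerate(zip(known_attrs, known_clusters, known_gamma)):
            if kc != pc:                       # step 5.5: only compare within the same cluster
                continue
            d_ij = np.linalg.norm(np.asarray(ka, dtype=float) - np.asarray(pa, dtype=float))
            gra = influence(kg, pg, d_ij)
            if gra > best_gra:                 # step 5.6: keep the largest influence
                best_i, best_gra = i, gra
        if best_i >= 0:
            pred_labels[j] = known_labels[best_i]   # step 5.7: labp_j = lab_i
    return pred_labels
```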
Step 5.9: assign the label sets of the prediction example set P, for which multi-label prediction has been completed, to the corresponding ψ-th updated example set, thereby obtaining the (ψ+1)-th updated example set, and use the (ψ+1)-th updated example set as a new initialized example set for self-adaptive multi-label prediction. This enriches the existing training set and improves the accuracy of the next round of prediction. When new prediction examples with the same object features and the same object semantics appear, multi-label prediction for them can be completed simply by starting again from step 4.
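The self-adaptive update of step 5.9 amounts to appending the freshly labelled prediction examples to the known example set before the next round; a minimal sketch, with the same assumed array representation as above:

```python
import numpy as np

def update_example_set(known_attrs, known_labels, pred_attrs, pred_labels):
    """Step 5.9 sketch: merge the labelled prediction examples into the example set,
    so that the enlarged set serves as the initialization of the next round."""
    new_attrs = np.vstack([known_attrs, pred_attrs])
    new_labels = np.vstack([known_labels, pred_labels])
    return new_attrs, new_labels
```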
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510501816.7A CN105069129B (en) | 2015-06-24 | 2015-08-14 | Adaptive multi-tag Forecasting Methodology |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510355030.9A CN104915436A (en) | 2015-06-24 | 2015-06-24 | Adaptive multi-tag predication method |
CN2015103550309 | 2015-06-24 | ||
CN201510501816.7A CN105069129B (en) | 2015-06-24 | 2015-08-14 | Adaptive multi-tag Forecasting Methodology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105069129A true CN105069129A (en) | 2015-11-18 |
CN105069129B CN105069129B (en) | 2018-05-18 |
Family
ID=54084499
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510355030.9A Withdrawn CN104915436A (en) | 2015-06-24 | 2015-06-24 | Adaptive multi-tag predication method |
CN201510501816.7A Active CN105069129B (en) | 2015-06-24 | 2015-08-14 | Adaptive multi-tag Forecasting Methodology |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510355030.9A Withdrawn CN104915436A (en) | 2015-06-24 | 2015-06-24 | Adaptive multi-tag predication method |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN104915436A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629358A (en) * | 2017-03-23 | 2018-10-09 | 北京嘀嘀无限科技发展有限公司 | The prediction technique and device of object type |
CN110162692A (en) * | 2018-12-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | User tag determines method, apparatus, computer equipment and storage medium |
US11379758B2 (en) | 2019-12-06 | 2022-07-05 | International Business Machines Corporation | Automatic multilabel classification using machine learning |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106909540A (en) * | 2015-12-23 | 2017-06-30 | 神州数码信息系统有限公司 | A kind of smart city citizen's preference discovery technique based on Cooperative Study |
CN106971713B (en) * | 2017-01-18 | 2020-01-07 | 北京华控智加科技有限公司 | Speaker marking method and system based on density peak value clustering and variational Bayes |
CN108647711B (en) * | 2018-05-08 | 2021-04-20 | 重庆邮电大学 | Multi-label classification method of image based on gravity model |
CN110547806B (en) * | 2019-09-11 | 2022-05-31 | 湖北工业大学 | An online gesture recognition method and system based on surface electromyography signals |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090164416A1 (en) * | 2007-12-10 | 2009-06-25 | Aumni Data Inc. | Adaptive data classification for data mining |
CN102004801A (en) * | 2010-12-30 | 2011-04-06 | 焦点科技股份有限公司 | Information classification method |
CN102364498A (en) * | 2011-10-17 | 2012-02-29 | 江苏大学 | A Multi-label Based Image Recognition Method |
CN102945371A (en) * | 2012-10-18 | 2013-02-27 | 浙江大学 | Classifying method based on multi-label flexible support vector machine |
CN103077228A (en) * | 2013-01-02 | 2013-05-01 | 北京科技大学 | Set characteristic vector-based quick clustering method and device |
CN103927394A (en) * | 2014-05-04 | 2014-07-16 | 苏州大学 | Multi-label active learning classification method and system based on SVM |
-
2015
- 2015-06-24 CN CN201510355030.9A patent/CN104915436A/en not_active Withdrawn
- 2015-08-14 CN CN201510501816.7A patent/CN105069129B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090164416A1 (en) * | 2007-12-10 | 2009-06-25 | Aumni Data Inc. | Adaptive data classification for data mining |
CN102004801A (en) * | 2010-12-30 | 2011-04-06 | 焦点科技股份有限公司 | Information classification method |
CN102364498A (en) * | 2011-10-17 | 2012-02-29 | 江苏大学 | A Multi-label Based Image Recognition Method |
CN102945371A (en) * | 2012-10-18 | 2013-02-27 | 浙江大学 | Classifying method based on multi-label flexible support vector machine |
CN103077228A (en) * | 2013-01-02 | 2013-05-01 | 北京科技大学 | Set characteristic vector-based quick clustering method and device |
CN103927394A (en) * | 2014-05-04 | 2014-07-16 | 苏州大学 | Multi-label active learning classification method and system based on SVM |
Non-Patent Citations (2)
Title |
---|
- XIN LI et al.: "Active Learning with Multi-Label SVM Classification", Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence *
- LI Peipei: "Research on Concept Drift Detection and Classification Methods in Data Streams" (数据流中概念漂移检测与分类方法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629358A (en) * | 2017-03-23 | 2018-10-09 | 北京嘀嘀无限科技发展有限公司 | The prediction technique and device of object type |
CN108629358B (en) * | 2017-03-23 | 2020-12-25 | 北京嘀嘀无限科技发展有限公司 | Object class prediction method and device |
CN110162692A (en) * | 2018-12-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | User tag determines method, apparatus, computer equipment and storage medium |
CN110162692B (en) * | 2018-12-10 | 2021-05-25 | 腾讯科技(深圳)有限公司 | User label determination method and device, computer equipment and storage medium |
US11379758B2 (en) | 2019-12-06 | 2022-07-05 | International Business Machines Corporation | Automatic multilabel classification using machine learning |
Also Published As
Publication number | Publication date |
---|---|
CN105069129B (en) | 2018-05-18 |
CN104915436A (en) | 2015-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105069129B (en) | Adaptive multi-tag Forecasting Methodology | |
CN106202054B (en) | A kind of name entity recognition method towards medical field based on deep learning | |
CN106651519B (en) | Personalized recommendation method and system based on label information | |
CN108287864B (en) | Interest group dividing method, device, medium and computing equipment | |
CN110674407B (en) | Hybrid recommendation method based on graph convolutional neural network | |
CN104199826B (en) | A kind of dissimilar medium similarity calculation method and search method based on association analysis | |
CN107506793A (en) | Clothes recognition methods and system based on weak mark image | |
CN106600052A (en) | User attribute and social network detection system based on space-time locus | |
CN110020176A (en) | A kind of resource recommendation method, electronic equipment and computer readable storage medium | |
CN107239993A (en) | A kind of matrix decomposition recommendation method and system based on expansion label | |
Wu et al. | Joint semi-supervised learning and re-ranking for vehicle re-identification | |
CN105869016A (en) | Method for estimating click through rate based on convolution neural network | |
CN113222653B (en) | Method, system, equipment and storage medium for expanding audience of programmed advertisement users | |
CN108804577B (en) | Method for estimating interest degree of information tag | |
CN110033097A (en) | The method and device of the incidence relation of user and article is determined based on multiple data fields | |
CN112380433A (en) | Recommendation meta-learning method for cold-start user | |
CN104778283A (en) | User occupation classification method and system based on microblog | |
US20220156519A1 (en) | Methods and systems for efficient batch active learning of a deep neural network | |
CN104572915B (en) | One kind is based on the enhanced customer incident relatedness computation method of content environment | |
CN109885745A (en) | User portrait method, device, readable storage medium and terminal device | |
Huang et al. | An Ad CTR prediction method based on feature learning of deep and shallow layers | |
Wang et al. | Learning with group noise | |
CN107169830A (en) | A kind of personalized recommendation method based on cluster PU matrix decompositions | |
Chang et al. | Fine-grained butterfly and moth classification using deep convolutional neural networks | |
CN103544500B (en) | Multi-user natural scene mark sequencing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |