CN105069129A - Self-adaptive multi-label prediction method - Google Patents


Info

Publication number
CN105069129A
Authority
CN
China
Legal status
Granted
Application number
CN201510501816.7A
Other languages
Chinese (zh)
Other versions
CN105069129B
Inventor
胡学钢
王博岩
李培培
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Application filed by Hefei University of Technology
Priority to CN201510501816.7A
Publication of CN105069129A
Application granted
Publication of CN105069129B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9562 Bookmark management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a self-adaptive multi-label prediction method, which is characterized by comprising the following steps: 1. acquiring an initialization sample set; 2. acquiring a leader sample, an outsider sample, and an elector sample in the initialization sample set; 3. acquiring a cluster to which an elector sample set belongs; 4. performing coarse classification on a prediction sample by using a support vector machine; and 5. performing multi-label prediction on the prediction sample. According to the present invention, a label can be accurately added to network information, and accuracy, universality, interpretability and transferability of multi-label prediction can be improved, so as to implement intelligent information classification and processing in a big data environment.

Description

Self-adaptive multi-label prediction method
Technical field
The invention belongs to the field of intelligent information classification and processing, and in particular relates to a self-adaptive multi-label prediction method that can be applied to fast clustering of multimedia information in a big data environment and to finding density peak points.
Background
With the rapid development of networks, the amount of information is growing geometrically. Microblogs, forums, WeChat, online video, online shopping, and social networks all rely on labels and categories to make search convenient for users. On the one hand, accurate and detailed labels let users quickly find what they need; on the other hand, merchants can use labels to segment users and recommend products suited to the tastes of each customer group, so that users do not have to browse large amounts of irrelevant information and valuable content is not buried in an ocean of information. Conversely, if merchants cannot handle information overload properly, they will steadily lose customers.
Current methods for labeling information mainly either decompose the multiple labels into independent single labels, or convert the multi-label problem into a ranking among labels. Converting to single labels completely ignores the correlations among the labels, so accuracy is low. Ranking labels not only requires a large amount of computation, but after the ranking is determined it is still necessary to decide whether a label is more similar to the label ranked before it or after it, so accuracy is likewise limited.
Compared with the present invention, current processing methods have the following shortcomings:
1. Most existing computer-learning methods for network information address single-label prediction, i.e., a recognition problem. Because the multiple labels of a piece of information are correlated, methods that decompose a multi-label problem into several single-label problems yield low label accuracy and cannot meet practical needs.
2. Current multi-label prediction techniques can usually handle only a given static data set. When new information is added, they often need to be retrained and re-tuned, and they cannot adjust their parameters automatically as the data change; their generalization is therefore weak and their universality poor.
3. Methods that convert multi-label prediction into order relations among labels require heavy computation, are hard to interpret, and do not predict accurately.
4. Most existing multi-label prediction techniques are designed to improve one particular evaluation metric while ignoring the others. This makes them poorly portable: they are suitable only for data sets that satisfy certain conditions.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a self-adaptive multi-label prediction method that labels network information accurately and improves the accuracy, universality, interpretability, and transferability of multi-label prediction, thereby enabling intelligent information classification and processing in a big data environment.
The present invention adopts the following technical solution to solve the technical problem:
The self-adaptive multi-label prediction method of the present invention is carried out as follows:
Step 1: obtain the initialization instance set D:
Step 1.1, build an original instance set $D' = \{inst'_1, inst'_2, \ldots, inst'_a, \ldots, inst'_{num'}\}$ from num' known objects, where $inst'_a$ is the original instance of the a-th known object, $1 \le a \le num'$, and $inst'_a = \{attr'_a; lab'_a\}$. Here $attr'_a$ is the attribute set describing the features of the a-th known object and $lab'_a$ is the label set describing its semantics, with $attr'_a = \{attr'_{a,1}, attr'_{a,2}, \ldots, attr'_{a,n}\}$, where $attr'_{a,n}$ is the n-th attribute of the a-th known object and n is the number of attributes; and $lab'_a = \{lab'_{a,1}, lab'_{a,2}, \ldots, lab'_{a,x}, \ldots, lab'_{a,m}\}$, where $lab'_{a,x}$ is the x-th label of the a-th known object, m is the number of labels, and $1 \le x \le m$. $lab'_{a,x} = 1$ means that the semantics of the a-th known object match the x-th label; $lab'_{a,x} = 0$ means that they do not;
Step 1.2, normalize each of the attribute sets $attr'_1, attr'_2, \ldots, attr'_a, \ldots, attr'_{num'}$ of the num' known objects in D' to obtain the normalized attribute sets $attr''_1, attr''_2, \ldots, attr''_a, \ldots, attr''_{num'}$. If all m label values corresponding to the normalized attribute set $attr''_a$ of the a-th known object are 0, delete the original instance of that object. This yields the initialization instance set $D = \{inst_1, inst_2, \ldots, inst_i, \ldots, inst_{num}\}$ of num instances, where $inst_i$ is the instance of the i-th known object after initialization and $inst_i = \{attr_i; lab_i\}$; $attr_i$ is the attribute set of the i-th instance after initialization, $lab_i$ is its label set, and $1 \le i \le num$;
Step 2: compute the clustering degree of each instance in the initialization instance set D, and thereby determine the leader instances, outsider instances, and voter instances in D:
Step 2.1, treat the m labels of each of the num instances in D as m-dimensional coordinates and compute the label Euclidean distance $d_{ik}$ between the i-th instance $inst_i$ and the k-th instance $inst_k$, where $1 \le k \le num$ and $k \ne i$;
Step 2.2, define the iteration counter γ and initialize γ = 1; denote the cluster to which $inst_i$ belongs by $clu_i$;
Step 2.3, use formula (1) to compute the cohesion $\rho_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration, thereby obtaining the cohesions $\rho^{(\gamma)} = \{\rho_1^{(\gamma)}, \rho_2^{(\gamma)}, \ldots, \rho_i^{(\gamma)}, \ldots, \rho_{num}^{(\gamma)}\}$ of the num instances; denote the largest cohesion by $\rho_{\max}^{(\gamma)}$:
$$\rho_i^{(\gamma)} = \sum_{k=1}^{num} f\left(d_{ik} - d_c^{(\gamma)}\right) \qquad (1)$$
In formula (1), $d_c^{(\gamma)}$ is the threshold of the γ-th iteration; $f(d_{ik} - d_c^{(\gamma)}) = 1$ when $d_{ik} \le d_c^{(\gamma)}$, and $f(d_{ik} - d_c^{(\gamma)}) = 0$ when $d_{ik} > d_c^{(\gamma)}$;
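Formula (1) counts, for each instance, how many other instances lie within the label-distance threshold $d_c$. The following Python sketch computes the label Euclidean distance of Step 2.1 and the cohesion ρ of formula (1) with plain lists. The function names are illustrative, and skipping k = i reflects the assumption that $d_{ik}$ is defined only for $k \ne i$.

```python
import math
from typing import List, Sequence


def label_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Label Euclidean distance d_ik of Step 2.1."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cohesion(labels: List[Sequence[float]], d_c: float) -> List[int]:
    """Cohesion rho_i of formula (1) for every instance (hypothetical helper)."""
    rho = []
    for i, lab_i in enumerate(labels):
        # f(d_ik - d_c) is 1 when d_ik <= d_c and 0 otherwise; k == i is skipped
        rho.append(sum(1 for k, lab_k in enumerate(labels)
                       if k != i and label_distance(lab_i, lab_k) <= d_c))
    return rho
```

For example, with label vectors [1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1] and d_c = 1.0, the cohesions are [2, 2, 2, 0]; lowering d_c to 0.5 leaves only the two identical vectors as neighbours of each other.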
Step 2.4, use formula (2) or formula (3) to compute the diversity $\delta_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration, thereby obtaining the diversities $\delta^{(\gamma)} = \{\delta_1^{(\gamma)}, \delta_2^{(\gamma)}, \ldots, \delta_i^{(\gamma)}, \ldots, \delta_{num}^{(\gamma)}\}$ of the num instances:
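$$\delta_i^{(\gamma)} = \sum_{k=1}^{num} \max(d_{ik}), \quad \text{when } \rho_i^{(\gamma)} = \rho_{\max}^{(\gamma)} \qquad (2)$$
$$\delta_i^{(\gamma)} = \ldots, \quad \text{when } \rho_i^{(\gamma)} \neq \rho_{\max}^{(\gamma)} \qquad (3)$$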
Step 2.5, normalize the diversities $\delta^{(\gamma)}$ of the num instances to obtain the normalized diversities $\delta^{\prime(\gamma)} = \{\delta_1^{\prime(\gamma)}, \delta_2^{\prime(\gamma)}, \ldots, \delta_i^{\prime(\gamma)}, \ldots, \delta_{num}^{\prime(\gamma)}\}$;
Step 2.6, use formula (4) to compute the clustering degree $sco_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration, thereby obtaining the clustering degrees $sco^{(\gamma)} = \{sco_1^{(\gamma)}, sco_2^{(\gamma)}, \ldots, sco_i^{(\gamma)}, \ldots, sco_{num}^{(\gamma)}\}$ of the num instances:
$$sco_i^{(\gamma)} = \rho_i^{(\gamma)} \times \delta_i^{\prime(\gamma)} \qquad (4)$$
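Formula (3) is not legible in this copy, so the sketch below fills it with an assumption: the usual density-peaks rule, under which an instance's diversity is its distance to the nearest instance of higher cohesion. Formula (2) is read here as the largest distance from the instance to any other, and the Step 2.5 normalization is assumed to divide by the largest diversity. The output is the clustering degree of formula (4); the function name is hypothetical.

```python
from typing import List, Sequence


def clustering_degree(dist: List[List[float]], rho: Sequence[float]) -> List[float]:
    """Steps 2.4-2.6 under stated assumptions; needs at least two instances."""
    n = len(rho)
    rho_max = max(rho)
    delta: List[float] = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        if rho[i] == rho_max:
            # formula (2), read here as the largest distance to any other instance
            delta.append(max(dist[i][k] for k in others))
        else:
            # stand-in for formula (3): distance to the nearest higher-cohesion
            # instance (one always exists because rho[i] < rho_max)
            delta.append(min(dist[i][k] for k in others if rho[k] > rho[i]))
    top = max(delta)
    # Step 2.5, assumed to be division by the largest diversity
    delta_norm = [d / top if top > 0 else 0.0 for d in delta]
    return [r * d for r, d in zip(rho, delta_norm)]  # formula (4)
```

With three instances at positions 0, 1 and 3 on a line (so the distance matrix is [[0, 1, 3], [1, 0, 2], [3, 2, 0]]) and cohesions [2, 1, 0], the diversities are [3, 1, 2], the normalized diversities [1, 1/3, 2/3], and the clustering degrees [2.0, 1/3, 0.0]. Multiplying the two normalized quantities rewards instances that are both dense and far from denser instances, which is what makes a few leader candidates stand out.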
Step 2.7, sort the clustering degrees $sco^{(\gamma)}$ of the num instances in descending order to obtain the sequence $sco^{\prime(\gamma)} = \{sco_1^{\prime(\gamma)}, sco_2^{\prime(\gamma)}, \ldots, sco_t^{\prime(\gamma)}, \ldots, sco_{num}^{\prime(\gamma)}\}$, and let the cohesions corresponding to $sco^{\prime(\gamma)}$ be $\rho^{\prime(\gamma)} = \{\rho_1^{\prime(\gamma)}, \rho_2^{\prime(\gamma)}, \ldots, \rho_t^{\prime(\gamma)}, \ldots, \rho_{num}^{\prime(\gamma)}\}$, where $\rho_t^{\prime(\gamma)}$ is the cohesion $\rho_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration when $sco_i^{(\gamma)} = sco_t^{\prime(\gamma)}$; $1 \le t \le num$;
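Step 2.7 only reorders values, but the cohesion sequence must stay aligned with the sorted clustering degrees. A small sketch (names illustrative) that sorts instance indices by score once and reads both sequences through the same order:

```python
from typing import List, Sequence, Tuple


def sort_by_score(sco: Sequence[float], rho: Sequence[float]
                  ) -> Tuple[List[float], List[float], List[int]]:
    """Step 2.7: descending clustering degrees plus the matching cohesions.

    rho_sorted[t] belongs to the same instance as sco_sorted[t]; the original
    instance indices are returned too. Python's sort is stable, so ties keep
    their original order.
    """
    order = sorted(range(len(sco)), key=lambda i: sco[i], reverse=True)
    return [sco[i] for i in order], [rho[i] for i in order], order
```

For scores [0.5, 2.0, 1.0] and cohesions [3, 5, 4], this returns [2.0, 1.0, 0.5], [5, 4, 3] and the order [1, 2, 0].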
Step 2.8, initialize t = 1;
Step 2.9, judge whether both … and … ≥ num × 3% hold. If they do, the threshold $d_c^{(\gamma)}$ of the γ-th iteration is a valid value; record t and go to Step 2.10. Otherwise, judge whether … holds; if it does, assign t + 1 to t and repeat Step 2.9; otherwise, modify the threshold $d_c^{(\gamma)}$, assign γ + 1 to γ, and return to Step 2.3;
Step 2.10, if the cohesion $\rho_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration satisfies …, then $inst_i$ is an outsider instance, and its cluster is set to $clu_i = -1$; otherwise, judge whether … holds: if it does, $inst_i$ is a leader instance and $clu_i = i$; otherwise, $inst_i$ is a voter instance;
Step 2.11, count the leader instances and the voter instances, and denote their numbers by N and M respectively;
Step 2.12, denote the set of the N leader instances by $D^{(l)} = \{inst_1^{(l)}, inst_2^{(l)}, \ldots, inst_\alpha^{(l)}, \ldots, inst_N^{(l)}\}$, $1 \le \alpha \le N$. The corresponding cohesions are $\rho^{(l)} = \{\rho_1^{(l)}, \ldots, \rho_\alpha^{(l)}, \ldots, \rho_N^{(l)}\}$, where $\rho_\alpha^{(l)}$ is the cohesion of the α-th leader instance $inst_\alpha^{(l)}$; the corresponding label sets are $lab^{(l)} = \{lab_1^{(l)}, lab_2^{(l)}, \ldots, lab_\alpha^{(l)}, \ldots, lab_N^{(l)}\}$, where $lab_\alpha^{(l)}$ is the label set of $inst_\alpha^{(l)}$; and the corresponding clusters are $clu^{(l)} = \{clu_1^{(l)}, \ldots, clu_\alpha^{(l)}, \ldots, clu_N^{(l)}\}$, where $clu_\alpha^{(l)}$ is the cluster of $inst_\alpha^{(l)}$;
Step 2.13, denote the set of the M voter instances by $D^{(v)} = \{inst_1^{(v)}, inst_2^{(v)}, \ldots, inst_\beta^{(v)}, \ldots, inst_M^{(v)}\}$, $1 \le \beta \le M$. The corresponding cohesions are $\rho^{(v)} = \{\rho_1^{(v)}, \ldots, \rho_\beta^{(v)}, \ldots, \rho_M^{(v)}\}$, where $\rho_\beta^{(v)}$ is the cohesion of the β-th voter instance $inst_\beta^{(v)}$; the corresponding label sets are $lab^{(v)} = \{lab_1^{(v)}, lab_2^{(v)}, \ldots, lab_\beta^{(v)}, \ldots, lab_M^{(v)}\}$, where $lab_\beta^{(v)}$ is the label set of $inst_\beta^{(v)}$; and the corresponding clusters are $clu^{(v)} = \{clu_1^{(v)}, \ldots, clu_\beta^{(v)}, \ldots, clu_M^{(v)}\}$, where $clu_\beta^{(v)}$ is the cluster of $inst_\beta^{(v)}$;
Step 3: obtain the clusters $clu^{(v)}$ of the M voter instances in $D^{(v)}$:
Step 3.1, define the iteration counter χ and initialize χ = 1; define the z-th transfer instance $inst_z$, z ≥ 0; and initialize α = 1, β = 1, z = 0;
Step 3.2, take the α-th leader instance $inst_\alpha^{(l)}$ from the leader set $D^{(l)}$ and compute the label Euclidean distance $d_{\beta_\chi\alpha}^{(v)(l)}$ between $inst_\alpha^{(l)}$ and the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration;
Step 3.3, if …, assign β + 1 to β and judge whether β ≤ M; if so, repeat Step 3.3, otherwise go to Step 3.5. If …, judge whether the cluster $clu_{\beta_\chi}^{(v)}$ of $inst_{\beta_\chi}^{(v)}$ is empty. If it is empty, go to Step 3.4; otherwise its value is the subscript of a leader instance that already exists in the χ-th iteration, denoted $\epsilon(\beta_\chi)$, and go to Step 3.11;
Step 3.4, assign the subscript $\alpha^{(l)}$ of $inst_\alpha^{(l)}$ to $clu_{\beta_\chi}^{(v)}$, assign z + 1 to z, and make the voter the z-th transfer instance of the χ-th iteration: the subscript $\beta_\chi$, label set, cohesion, and cluster of $inst_{\beta_\chi}^{(v)}$ are assigned, respectively, to the subscript, label set, cohesion, and cluster of that transfer instance. Then assign β + 1 to β and judge whether β ≤ M; if so, go to Step 3.3, otherwise go to Step 3.5;
Step 3.5, if z ≤ 0, go to Step 3.14. Otherwise, assign χ + 1 to χ, assign … to … in turn, and set β = 1; then compute the label Euclidean distance $d_{\beta_\chi z}^{(v)(\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the z-th transfer instance $inst_z^{(\chi)}$ of the χ-th iteration, and assign z − 1 to z;
Step 3.6, if …, assign β + 1 to β and judge whether β ≤ M; if so, repeat Step 3.6, otherwise go to Step 3.5. If …, judge whether the cluster $clu_{\beta_\chi}^{(v)}$ of $inst_{\beta_\chi}^{(v)}$ is empty. If it is empty, go to Step 3.7; otherwise its value is the subscript of a leader instance that already exists in the χ-th iteration, denoted $\epsilon(\beta_\chi)$, and go to Step 3.8;
Step 3.7, assign the subscript $z^{(\chi)}$ of the z-th transfer instance of the χ-th iteration to …, assign z + 1 to z, let …, and assign β + 1 to β; then judge whether β ≤ M; if so, repeat Step 3.6, otherwise go to Step 3.5;
Step 3.8, use formula (5) to compute the influence $gra_{\beta_\chi\epsilon}^{(v)(\beta_\chi)}$ between $inst_{\beta_\chi}^{(v)}$ and the leader instance $\epsilon(\beta_\chi)$ that already exists in the χ-th iteration:
$$gra_{\beta_\chi\epsilon}^{(v)(\beta_\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_{\epsilon(\beta_\chi)}}{d_{\beta_\chi\epsilon}^{(v)(\beta_\chi)}} \qquad (5)$$
Step 3.9, use formula (6) to compute the influence $gra_{\beta_\chi z}^{(v)(\chi)}$ between $inst_{\beta_\chi}^{(v)}$ and the z-th transfer instance $inst_z^{(\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi z}^{(v)(\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_z^{(\chi)}}{d_{\beta_\chi z}^{(v)(\chi)}} \qquad (6)$$
Step 3.10, if …, assign β + 1 to β and go to Step 3.6. Otherwise, let …, assign z + 1 to z, let …, assign β + 1 to β, and judge whether β ≤ M; if so, go to Step 3.6, otherwise go to Step 3.5;
Step 3.11, use formula (7) to compute the influence $gra_{\beta_\chi\epsilon}^{(v)(\beta_\chi)}$ between $inst_{\beta_\chi}^{(v)}$ and the leader instance $\epsilon(\beta_\chi)$ that already exists in the χ-th iteration:
$$gra_{\beta_\chi\epsilon}^{(v)(\beta_\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_{\epsilon(\beta_\chi)}}{d_{\beta_\chi\epsilon}^{(v)(\beta_\chi)}} \qquad (7)$$
Step 3.12, use formula (8) to compute the influence $gra_{\beta_\chi\alpha}^{(v)(l)}$ between $inst_{\beta_\chi}^{(v)}$ and the α-th leader instance $inst_\alpha^{(l)}$:
$$gra_{\beta_\chi\alpha}^{(v)(l)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\alpha^{(l)}}{d_{\beta_\chi\alpha}^{(v)(l)}} \qquad (8)$$
Step 3.13, if …, assign β + 1 to β and go to Step 3.3. Otherwise, assign the subscript $\alpha^{(l)}$ of $inst_\alpha^{(l)}$ to $clu_{\beta_\chi}^{(v)}$, assign z + 1 to z, let …, assign β + 1 to β, and judge whether β ≤ M; if so, go to Step 3.3, otherwise go to Step 3.5;
Step 3.14, assign α + 1 to α and judge whether α ≤ N; if so, set β = 1 and go to Step 3.2; otherwise go to Step 3.15;
Step 3.15, assign the clusters of the M voter instances obtained in the χ-th iteration, in order, to the clusters $clu^{(v)}$ of the voter set $D^{(v)}$;
Step 3.16, check whether any voter instance still has an empty cluster; if so, set the cluster of each such voter instance to −1;
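Several comparison conditions in Steps 3.3 to 3.13 are not legible in this copy, so the following is only a simplified, breadth-first reading of Step 3, not the patented procedure. Each leader spreads its index to voters within label distance d_c; each voter it reaches acts as a transfer instance that keeps spreading; and a voter already claimed by another leader switches only when the influence ρ_a·ρ_b/d of formulas (5) to (8) from the current source exceeds that of the leader it already belongs to. The names, the use of d_c as the reach threshold, the tie rule, and treating a zero distance as infinite influence are all assumptions.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Label Euclidean distance between two instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _influence(rho_a: float, rho_b: float, d: float) -> float:
    """rho_a * rho_b / d, the shape of formulas (5)-(8); d == 0 -> infinity."""
    return math.inf if d == 0 else rho_a * rho_b / d


def assign_voters(labels: List[Sequence[float]], rho: Sequence[float],
                  leaders: List[int], voters: List[int],
                  d_c: float) -> Dict[int, int]:
    """Hypothetical helper: map each voter index to a leader index or -1."""
    clu: Dict[int, Optional[int]] = {v: None for v in voters}
    for leader in leaders:                  # outer loop over leaders (Steps 3.2, 3.14)
        queue = deque([leader])             # sources: the leader, then transfer instances
        while queue:
            src = queue.popleft()
            for v in voters:
                if clu[v] == leader:
                    continue                # already in this leader's cluster
                d = _dist(labels[src], labels[v])
                if d > d_c:
                    continue                # out of reach of this source
                owner = clu[v]
                if owner is not None:       # claimed earlier: compare influences
                    old = _influence(rho[v], rho[owner], _dist(labels[v], labels[owner]))
                    if old >= _influence(rho[v], rho[src], d):
                        continue            # the existing leader keeps the voter
                clu[v] = leader
                queue.append(v)             # v now acts as a transfer instance
    # Step 3.16: voters that no leader reached get cluster -1
    return {v: (-1 if c is None else c) for v, c in clu.items()}
```

Because a voter is skipped once it belongs to the current leader, each voter enters the queue at most once per leader, so the loop always terminates.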
Step 4: roughly classify the prediction instances with a support vector machine:
Step 4.1, build the prediction instance set $P = \{instp_1, instp_2, \ldots, instp_j, \ldots, instp_{nump}\}$ of nump prediction instances, where $instp_j$ is the j-th prediction instance, $1 \le j \le nump$, and $instp_j = \{attrp_j; labp_j\}$; $attrp_j$ is the attribute set of $instp_j$ and $labp_j$ is its label set; denote the cluster of $instp_j$ by $clup_j$;
Step 4.2, use the num clusters $\{clu_1, clu_2, \ldots, clu_i, \ldots, clu_{num}\}$ of D as training labels, the attribute sets $\{attr_1, attr_2, \ldots, attr_i, \ldots, attr_{num}\}$ of the num known objects in D as training samples, and the nump attribute sets $\{attrp_1, attrp_2, \ldots, attrp_j, \ldots, attrp_{nump}\}$ of P as the samples to be predicted. Train a support vector machine to obtain nump predicted labels and assign them, respectively, to the nump clusters of P. This completes the rough classification of P;
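Step 4.2 is an ordinary supervised fit: the attribute sets of D are the features and the cluster ids are the class labels. A sketch using scikit-learn's `SVC` (requires scikit-learn; the library, the RBF kernel, and the default hyperparameters are choices made here, since the text does not name them). Outsider and unreached voter instances carry clu_i = −1, so in this reading they simply form one more class.

```python
from sklearn.svm import SVC


def rough_classify(train_attrs, train_clusters, predict_attrs):
    """Hypothetical helper: fit an SVM on (attr_i, clu_i) pairs from D and
    return a predicted cluster clup_j for each prediction instance in P."""
    model = SVC(kernel="rbf")  # kernel and C are assumptions, not from the patent
    model.fit(train_attrs, train_clusters)
    return [int(c) for c in model.predict(predict_attrs)]


train_attrs = [[0.10, 0.20], [0.15, 0.10], [0.20, 0.15],
               [0.90, 0.80], [0.85, 0.95], [0.80, 0.90]]
train_clusters = [0, 0, 0, 5, 5, 5]  # cluster ids are leader indices (clu_i = i)
print(rough_classify(train_attrs, train_clusters, [[0.12, 0.18], [0.88, 0.86]]))
```

The rough cluster only narrows the candidates; the label sets themselves are assigned in Step 5 by comparing influences inside that cluster.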
Step 5: perform multi-label prediction on the nump prediction instances:
Step 5.1, merge the num instances of D and the nump instances of P into the ψ-th updated instance set $D_{new}^{(\psi)} = \{inst_1, inst_2, \ldots, inst_i, \ldots, inst_{num}; instp_1, instp_2, \ldots, instp_j, \ldots, instp_{nump}\}$, also written $D_{new}^{(\psi)} = \{inst_1^{(\psi)}, inst_2^{(\psi)}, \ldots, inst_\Omega^{(\psi)}, \ldots, inst_{num+nump}^{(\psi)}\}$, where $inst_\Omega^{(\psi)}$ is the Ω-th instance of the ψ-th update and $1 \le \Omega \le num + nump$;
Step 5.2, treat the n attributes of each of the num + nump instances in $D_{new}^{(\psi)}$ as n-dimensional coordinates and compute the attribute Euclidean distance $d_{\Omega\xi}^{(\psi)}$ between the Ω-th instance $inst_\Omega^{(\psi)}$ and the ξ-th instance $inst_\xi^{(\psi)}$, where $1 \le \xi \le num + nump$ and $\xi \ne \Omega$;
Step 5.3, use formula (9) to compute the attribute cohesion $\Gamma_\Omega^{(\psi)}$ of $inst_\Omega^{(\psi)}$, thereby obtaining the attribute cohesions $\Gamma^{(\psi)} = \{\Gamma_1^{(\psi)}, \ldots, \Gamma_\Omega^{(\psi)}, \ldots, \Gamma_{num+nump}^{(\psi)}\}$ of the num + nump instances:
$$\Gamma_\Omega^{(\psi)} = \sum_{\xi=1}^{num+nump} f\left(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}\right) \qquad (9)$$
When $d_{\Omega\xi}^{(\psi)} \le d_c^{(\gamma)}$, $f(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}) = 1$; when $d_{\Omega\xi}^{(\psi)} > d_c^{(\gamma)}$, $f(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}) = 0$;
Step 5.4, initialize j = 1;
Step 5.5, if the cluster $clup_j$ of the j-th prediction instance $instp_j$ in P is the same as the cluster $clu_i$ of the i-th known instance $inst_i$ in D, use formula (10) to compute the influence $gra_{ij}$ between $inst_i$ and $instp_j$:
$$gra_{ij} = \frac{\Gamma_i \times \Gamma_j}{d_{ij}} \qquad (10)$$
In formula (10), $\Gamma_i$ is the attribute cohesion of the updated instance in $D_{new}^{(\psi)}$ that corresponds to the known instance $inst_i$, $\Gamma_j$ is the attribute cohesion of the updated instance that corresponds to the prediction instance $instp_j$, and $d_{ij}$ is the attribute Euclidean distance between $inst_i$ and $instp_j$;
Step 5.6, repeat Step 5.5 to obtain the influence between $instp_j$ and the other known instances of D, and record the maximum influence $gra_{\max}$;
Step 5.7, if $gra_{ij} = gra_{\max}$, set $labp_j = lab_i$; that is, every label in the label set $labp_j$ of $instp_j$ takes the value of the corresponding label in $lab_i$. This completes the multi-label prediction of the j-th prediction instance;
Step 5.8, assign j + 1 to j and judge whether j ≤ nump; if so, return to Step 5.5; otherwise, multi-label prediction for all nump prediction instances is complete;
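The label transfer of Steps 5.1 to 5.8 can be sketched as follows. The sketch assumes plain lists of attribute vectors, reuses a single threshold d_c for formula (9), treats a zero attribute distance as maximal influence, lets the first instance win ties, and returns None for a prediction instance whose rough cluster contains no known instance; none of these choices is specified in the text, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Attribute Euclidean distance (Step 5.2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_labels(known_attrs: List[Sequence[float]], known_labels: List[Sequence[int]],
                   known_clu: List[int], pred_attrs: List[Sequence[float]],
                   pred_clu: List[int], d_c: float) -> List[Optional[List[int]]]:
    """Give each prediction instance the label set of the most influential
    known instance in the same rough cluster."""
    merged = list(known_attrs) + list(pred_attrs)             # Step 5.1: D_new
    # Step 5.3, formula (9): attribute cohesion over the merged set
    gamma = [sum(1 for k, b in enumerate(merged) if k != w and _dist(a, b) <= d_c)
             for w, a in enumerate(merged)]
    num = len(known_attrs)
    result: List[Optional[List[int]]] = []
    for j, attrs_j in enumerate(pred_attrs):                  # Steps 5.4 and 5.8
        best, best_gra = None, -1.0
        for i, attrs_i in enumerate(known_attrs):
            if known_clu[i] != pred_clu[j]:
                continue                                      # Step 5.5: same cluster only
            d = _dist(attrs_i, attrs_j)
            # formula (10); zero distance -> infinite influence (assumption)
            gra = math.inf if d == 0 else gamma[i] * gamma[num + j] / d
            if gra > best_gra:                                # Step 5.6: keep the maximum
                best, best_gra = i, gra
        # Step 5.7: copy lab_i, or None when the cluster has no known instance
        result.append(list(known_labels[best]) if best is not None else None)
    return result
```

For example, known instances at (0.1, 0.1), (0.2, 0.2) and (0.9, 0.9) with label sets [1, 0], [1, 1], [0, 1] and clusters 0, 0, 2, together with prediction instances at (0.19, 0.19) in cluster 0 and (0.85, 0.9) in cluster 2, give [[1, 1], [0, 1]] for d_c = 0.2: the first prediction instance is far closer to (0.2, 0.2) than to (0.1, 0.1), and both candidates have the same attribute cohesion.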
The self-adaptive multi-label prediction method of the present invention is further characterized as follows:
Step 5 further comprises Step 5.9: assign the label sets of the prediction instance set P, whose multi-label prediction is complete, to the corresponding instances of the ψ-th updated instance set $D_{new}^{(\psi)}$, thereby obtaining the (ψ+1)-th updated instance set $D_{new}^{(\psi+1)}$, and use $D_{new}^{(\psi+1)}$ as the new initialization instance set for self-adaptive multi-label prediction.
When new prediction instances with the same object features and the same object semantics appear, multi-label prediction for them can be completed simply by starting again from Step 4.
In Step 2.9, the threshold $d_c^{(\gamma)}$ is modified as follows: if …, the threshold is decreased by $\tau_2$; otherwise it is increased by $\tau_2$, where $0.1 \le \tau_2 \le 0.5$ and $75\% \le \tau_1 < 100\%$.
Compared with the prior art, the beneficial effects of the present invention are as follows:
1. The invention first performs a rough classification and then a precise prediction. Through its self-adaptivity, the predicted labels keep evolving over multiple rounds of iteration, which yields more accurate predictions than existing multi-label prediction techniques; the method can therefore be put into practical use.
2. Because different initialization instance sets can be built from different known object features and semantics, the invention can be applied to most application environments of existing network platforms, from plain text to audio and even images, with good label prediction in each case; its universality is stronger than that of the prior art.
3. The invention uses the cohesion to measure how tightly instances gather around an instance and the diversity to measure how loosely the instance is coupled to others, and derives the clustering degree from both. Each parameter has a physical meaning and reflects the data classification principle of high cohesion and low coupling, so the method is easy to understand and explain. This gives the invention high prediction accuracy together with strong portability, so multi-label prediction can be performed under various conditions.
4. Through the cohesion, the invention can accurately find the leader instance of each product category. For microblogs, forums, and social networks, the method can find the key users with the greatest influence in each topic area; studying their behavior in detail makes it possible to predict likely trends in that area and to provide accurate recommendations to its users.
5. Because the invention computes the influence between instances, it can be used not only for multi-label prediction but also for analogy: for an instance with known labels, it can find instances whose multiple labels are very similar and recommend them to users, improving the user experience.
6. When determining the labels of a prediction instance, the invention adopts the label set of the most similar known instance. Emerging prediction instances can therefore be recommended to that known instance's customer group, so that a new product can be positioned in the market fairly accurately and its potential users found.
7. Because prediction instances whose multi-label prediction is complete are added to the initialization instance set, the existing training set is enriched and the accuracy of the next round of prediction improves. This gives the invention self-adaptive learning ability: newly added instances further improve the available data set, and as the number of instances with known labels grows, prediction accuracy improves further.
Embodiment
In this embodiment, the self-adaptive multi-label prediction method is carried out as follows:
Step 1: obtain the initialization instance set D:
Step 1.1, build an original instance set $D' = \{inst'_1, inst'_2, \ldots, inst'_a, \ldots, inst'_{num'}\}$ from num' known objects, where $inst'_a$ is the original instance of the a-th known object, $1 \le a \le num'$, and $inst'_a = \{attr'_a; lab'_a\}$. Here $attr'_a$ is the attribute set describing the features of the a-th known object and $lab'_a$ is the label set describing its semantics, with $attr'_a = \{attr'_{a,1}, attr'_{a,2}, \ldots, attr'_{a,n}\}$, where $attr'_{a,n}$ is the n-th attribute of the a-th known object and n is the number of attributes; and $lab'_a = \{lab'_{a,1}, lab'_{a,2}, \ldots, lab'_{a,x}, \ldots, lab'_{a,m}\}$, where $lab'_{a,x}$ is the x-th label of the a-th known object, m is the number of labels, and $1 \le x \le m$. $lab'_{a,x} = 1$ means that the semantics of the a-th known object match the x-th label; $lab'_{a,x} = 0$ means that they do not. For example, if the known objects are pictures, the features that need detailed description, such as color difference and size, form the attribute set, and accurate, detailed numbers serve as the attribute values; yes-or-no semantics such as "landscape picture" and "animal picture" form the label set, where 0 means the picture does not match the label and 1 means it does;
Step 1.2, normalize each of the attribute sets $attr'_1, attr'_2, \ldots, attr'_a, \ldots, attr'_{num'}$ of the num' known objects in D'. Taking the attribute set $attr'_a$ of the a-th known object as an example, first find the attribute with the largest value, $attr'_{a,\max}$, among $attr'_{a,1}, attr'_{a,2}, \ldots, attr'_{a,n}$, then divide each attribute in the set by $attr'_{a,\max}$ to obtain the normalized attribute set $attr''_a$. The normalized attribute sets $attr''_1, attr''_2, \ldots, attr''_{num'}$ of all num' known objects are obtained in the same way. If all m label values corresponding to $attr''_a$ are 0, delete the original instance of the a-th known object. This yields the initialization instance set $D = \{inst_1, inst_2, \ldots, inst_i, \ldots, inst_{num}\}$ of num instances, where $inst_i$ is the instance of the i-th known object after initialization and $inst_i = \{attr_i; lab_i\}$; $attr_i$ is the attribute set of the i-th instance after initialization, $lab_i$ is its label set, and $1 \le i \le num$. The data recorded for each instance are shown in Table 1:
Table 1: Data of the i-th instance inst_i in the initialization instance set D
Instance | attr_i,1 | … | attr_i,n | lab_i,1 | … | lab_i,m | ρ_i | δ_i | sco_i | clu_i
inst_i   |          |   |          |         |   |         |     |     |       |
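The normalization and filtering of Step 1.2, as described in this embodiment, can be sketched in Python with plain lists. The function name is illustrative, and keeping an attribute set unscaled when its largest value is not positive is an assumption, since the text does not cover that case.

```python
from typing import List, Sequence, Tuple


def initialize(attrs: List[Sequence[float]], labels: List[Sequence[int]]
               ) -> Tuple[List[List[float]], List[List[int]]]:
    """Step 1.2 per the embodiment (hypothetical helper name)."""
    out_attrs: List[List[float]] = []
    out_labels: List[List[int]] = []
    for a, lab in zip(attrs, labels):
        if not any(lab):
            continue                      # all m labels are 0: delete the instance
        a_max = max(a)                    # attr'_{a,max}: largest attribute of this object
        if a_max > 0:
            norm = [x / a_max for x in a]
        else:
            norm = [float(x) for x in a]  # assumption: leave unscaled if max <= 0
        out_attrs.append(norm)
        out_labels.append(list(lab))
    return out_attrs, out_labels
```

For attribute sets [2, 4, 8], [1, 1, 1], [3, 6, 3] with label sets [1, 0], [0, 0], [0, 1], the second object is dropped and the remaining attribute sets become [0.25, 0.5, 1.0] and [0.5, 1.0, 0.5]. Note that this scales each object by its own maximum rather than each attribute column by a column maximum, as the embodiment describes.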
Step 2: compute the clustering degree of each instance in the initialization instance set D, and thereby determine the leader instances, outsider instances, and voter instances in D:
Step 2.1, treat the m labels of each of the num instances in D as m-dimensional coordinates and compute the label Euclidean distance $d_{ik}$ between the i-th instance $inst_i$ and the k-th instance $inst_k$, where $1 \le k \le num$ and $k \ne i$. For example, to compute the label distance $d_{12}$ between the first and second instances: both have m labels with the same names, but their values are not necessarily the same, so their label sets are written $lab_1 = \{lab_{1,1}, lab_{1,2}, \ldots, lab_{1,m}\}$ and $lab_2 = \{lab_{2,1}, lab_{2,2}, \ldots, lab_{2,m}\}$, and the label Euclidean distance is $d_{12} = \sqrt{(lab_{1,1} - lab_{2,1})^2 + \cdots + (lab_{1,m} - lab_{2,m})^2}$;
Step 2.2, define the iteration counter γ and initialize γ = 1; denote the cluster to which $inst_i$ belongs by $clu_i$;
Step 2.3, use formula (1) to compute the cohesion $\rho_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration, thereby obtaining the cohesions $\rho^{(\gamma)} = \{\rho_1^{(\gamma)}, \rho_2^{(\gamma)}, \ldots, \rho_i^{(\gamma)}, \ldots, \rho_{num}^{(\gamma)}\}$ of the num instances; denote the largest cohesion by $\rho_{\max}^{(\gamma)}$:
$$\rho_i^{(\gamma)} = \sum_{k=1}^{num} f\left(d_{ik} - d_c^{(\gamma)}\right) \qquad (1)$$
In formula (1), $d_c^{(\gamma)}$ is the threshold of the γ-th iteration; $f(d_{ik} - d_c^{(\gamma)}) = 1$ when $d_{ik} \le d_c^{(\gamma)}$, and $f(d_{ik} - d_c^{(\gamma)}) = 0$ when $d_{ik} > d_c^{(\gamma)}$;
Step 2.4, use formula (2) or formula (3) to compute the diversity $\delta_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration, thereby obtaining the diversities $\delta^{(\gamma)} = \{\delta_1^{(\gamma)}, \delta_2^{(\gamma)}, \ldots, \delta_i^{(\gamma)}, \ldots, \delta_{num}^{(\gamma)}\}$ of the num instances:
$$\delta_i^{(\gamma)} = \sum_{k=1}^{num} \max(d_{ik}), \quad \text{when } \rho_i^{(\gamma)} = \rho_{\max}^{(\gamma)} \qquad (2)$$
$$\delta_i^{(\gamma)} = \ldots, \quad \text{when } \rho_i^{(\gamma)} \neq \rho_{\max}^{(\gamma)} \qquad (3)$$
Step 2.5, normalize the diversities $\delta^{(\gamma)}$ of the num instances to obtain the normalized diversities $\delta^{\prime(\gamma)} = \{\delta_1^{\prime(\gamma)}, \delta_2^{\prime(\gamma)}, \ldots, \delta_i^{\prime(\gamma)}, \ldots, \delta_{num}^{\prime(\gamma)}\}$. Steps 2.4 and 2.5 make the normalized diversities clearly separated: a few are close to 1 while most are below 0.5, which helps in selecting leader instances;
Step 2.6, use formula (4) to compute the clustering degree $sco_i^{(\gamma)}$ of $inst_i$ in the γ-th iteration, thereby obtaining the clustering degrees $sco^{(\gamma)} = \{sco_1^{(\gamma)}, sco_2^{(\gamma)}, \ldots, sco_i^{(\gamma)}, \ldots, sco_{num}^{(\gamma)}\}$ of the num instances:
sco i ( &gamma; ) = &rho; i ( &gamma; ) &times; &delta; i &prime; ( &gamma; ) - - - ( 4 )
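The patent does not state how the difference degrees are normalized in step 2.5, and the expression of formula (3) is not given here. The Python sketch below therefore starts from already computed difference degrees. It assumes normalization by the maximum, so the largest value becomes 1, and then applies formula (4). The helper names `normalize_by_max` and `clustering_scores` are hypothetical.

```python
from typing import List

def normalize_by_max(delta: List[float]) -> List[float]:
    """ASSUMED normalization for step 2.5: divide by the largest value so the top one becomes 1."""
    top = max(delta)
    return [x / top if top > 0 else 0.0 for x in delta]

def clustering_scores(rho: List[float], delta: List[float]) -> List[float]:
    """Formula (4): sco_i = rho_i * delta'_i, with delta' produced by normalize_by_max."""
    delta_norm = normalize_by_max(delta)
    return [r * dn for r, dn in zip(rho, delta_norm)]

rho = [5, 3, 4, 1]
delta = [8.0, 1.0, 2.0, 4.0]
scores = clustering_scores(rho, delta)
print(scores)  # [5.0, 0.375, 1.0, 0.5]
```

A min-max scaling would be an equally plausible reading of step 2.5. Dividing by the maximum is used here only because it keeps the highest-ranked instance at exactly 1.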
Step 2.7: Sort the clustering scores $sco^{(\gamma)}$ of the num instances in the γ-th iteration in descending order. This gives the clustering score sequence $sco^{\prime(\gamma)} = \{sco_1^{\prime(\gamma)}, sco_2^{\prime(\gamma)}, \ldots, sco_t^{\prime(\gamma)}, \ldots, sco_{num}^{\prime(\gamma)}\}$. Let the cohesions corresponding to $sco^{\prime(\gamma)}$ be $\rho^{\prime(\gamma)} = \{\rho_1^{\prime(\gamma)}, \rho_2^{\prime(\gamma)}, \ldots, \rho_t^{\prime(\gamma)}, \ldots, \rho_{num}^{\prime(\gamma)}\}$. Here $\rho_t^{\prime(\gamma)}$ is the cohesion of the i-th instance $inst_i$ in the γ-th iteration for which $sco_i^{(\gamma)} = sco_t^{\prime(\gamma)}$, with $1 \le t \le num$.
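Step 2.7 needs the cohesions to stay aligned with the sorted scores. The Python sketch below sorts indices rather than values, so that position t in both output lists refers to the same original instance i. The helper name `sort_by_score` is hypothetical.

```python
from typing import List, Tuple

def sort_by_score(sco: List[float], rho: List[int]) -> Tuple[List[float], List[int], List[int]]:
    """Step 2.7: sort scores in descending order and keep rho aligned with them.

    Returns (sco_sorted, rho_sorted, order) where order[t] is the original index i
    of the instance at position t, i.e. sco_sorted[t] == sco[order[t]].
    """
    order = sorted(range(len(sco)), key=lambda i: sco[i], reverse=True)
    return [sco[i] for i in order], [rho[i] for i in order], order

sco_sorted, rho_sorted, order = sort_by_score([5.0, 0.375, 1.0, 0.5], [5, 3, 4, 1])
print(sco_sorted, rho_sorted, order)  # [5.0, 1.0, 0.5, 0.375] [5, 4, 1, 3] [0, 2, 3, 1]
```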
Step 2.8: Initialize t = 1.
Step 2.9: Judge whether both conditions of this step hold; the second condition is of the form ≥ num × 3%. If they hold, the threshold $d_c^{(\gamma)}$ of the γ-th iteration is a valid value; record t and go to step 2.10. Otherwise, judge whether the further condition of this step holds. If it holds, assign t+1 to t and repeat step 2.9. If it does not hold, modify the threshold $d_c^{(\gamma)}$ by the following rule. If the condition involving $\tau_1$ holds, use $d_c^{(\gamma)} - \tau_2$ as the new threshold; otherwise, use $d_c^{(\gamma)} + \tau_2$. Here $0.1 \le \tau_2 \le 0.5$ and $75\% \le \tau_1 < 100\%$. Then assign γ+1 to γ and return to step 2.3. The constants 1.25 and 3% in the conditions of this step are not fixed. The present invention obtains better results when the number of instances is on the order of ten thousand and the number of labels is 20 or fewer. When the number of instances or labels changes, these constants may be adjusted as appropriate. The guiding principle is that only a small number of instances, whose clustering scores are much larger than those of the other instances, are selected as leader instances in the following steps.
Step 2.10: If the cohesion $\rho_i^{(\gamma)}$ of the i-th instance $inst_i$ in the γ-th iteration satisfies the outlier condition of this step, $inst_i$ is an outlier instance, and its cluster is set to $clu_i = -1$. Otherwise, judge whether the leader condition of this step holds. If it holds, $inst_i$ is a leader instance and $clu_i = i$; if it does not, $inst_i$ is a voter instance.
Step 2.11: Count the leader instances and the voter instances, and denote their numbers by N and M respectively.
Step 2.12: Denote the set of the N leader instances by $D^{(l)} = \{inst_1^{(l)}, inst_2^{(l)}, \ldots, inst_\alpha^{(l)}, \ldots, inst_N^{(l)}\}$, $1 \le \alpha \le N$. The corresponding quantities are defined as follows:
- The cohesions are $\rho^{(l)(\gamma)} = \{\rho_1^{(l)(\gamma)}, \ldots, \rho_\alpha^{(l)(\gamma)}, \ldots, \rho_N^{(l)(\gamma)}\}$, where $\rho_\alpha^{(l)(\gamma)}$ is the cohesion of the α-th leader instance $inst_\alpha^{(l)}$.
- The label sets are $lab^{(l)} = \{lab_1^{(l)}, lab_2^{(l)}, \ldots, lab_\alpha^{(l)}, \ldots, lab_N^{(l)}\}$, where $lab_\alpha^{(l)}$ is the label set of $inst_\alpha^{(l)}$.
- The clusters are $clu^{(l)} = \{clu_1^{(l)}, clu_2^{(l)}, \ldots, clu_\alpha^{(l)}, \ldots, clu_N^{(l)}\}$, where $clu_\alpha^{(l)}$ is the cluster of $inst_\alpha^{(l)}$.

Step 2.13: Denote the set of the M voter instances by $D^{(v)} = \{inst_1^{(v)}, inst_2^{(v)}, \ldots, inst_\beta^{(v)}, \ldots, inst_M^{(v)}\}$, $1 \le \beta \le M$. The corresponding quantities are defined as follows:
- The cohesions are $\rho^{(v)(\gamma)} = \{\rho_1^{(v)(\gamma)}, \ldots, \rho_\beta^{(v)(\gamma)}, \ldots, \rho_M^{(v)(\gamma)}\}$, where $\rho_\beta^{(v)(\gamma)}$ is the cohesion of the β-th voter instance $inst_\beta^{(v)}$.
- The label sets are $lab^{(v)} = \{lab_1^{(v)}, lab_2^{(v)}, \ldots, lab_\beta^{(v)}, \ldots, lab_M^{(v)}\}$, where $lab_\beta^{(v)}$ is the label set of $inst_\beta^{(v)}$.
- The clusters are $clu^{(v)} = \{clu_1^{(v)}, clu_2^{(v)}, \ldots, clu_\beta^{(v)}, \ldots, clu_M^{(v)}\}$, where $clu_\beta^{(v)}$ is the cluster of $inst_\beta^{(v)}$.
Step 3: Obtain the clusters $clu^{(v)}$ of the M voter instances in $D^{(v)}$:
Step 3.1: Define the iteration counter χ and initialize χ = 1. Define the z-th transfer instance $inst_z$, with z ≥ 0, and initialize α = 1, β = 1 and z = 0. The transfer instances $inst_z$ are stored in a structure similar to a conventional stack. For clarity of exposition, the iteration counter χ is introduced at the same time to distinguish transfer instances that share the same z. At this point, the clusters of all M voter instances in $D^{(v)}$ are empty.
Step 3.2: Take the α-th leader instance $inst_\alpha^{(l)}$ from the N leader instances in $D^{(l)}$. Obtain the label Euclidean distance $d_{\beta_\chi\alpha}^{(v)(l)}$ between $inst_\alpha^{(l)}$ and the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration.
Step 3.3: If the first condition of this step holds, assign β+1 to β and judge whether β ≤ M. If β ≤ M, repeat step 3.3; otherwise, go to step 3.5. If the second condition of this step holds, judge whether the cluster $clu_{\beta_\chi}^{(v)}$ of $inst_{\beta_\chi}^{(v)}$ is empty. If it is empty, go to step 3.4. Otherwise, the value of $clu_{\beta_\chi}^{(v)}$ is the subscript of a leader instance that already claimed this voter in the χ-th iteration. Denote that existing leader instance by $inst_\varepsilon^{(\beta_\chi)}$ and go to step 3.11. For example, if the existing leader instance of the χ-th iteration is $inst_9$, then ε = 9.
Step 3.4: Assign the subscript $\alpha^{(l)}$ of the α-th leader instance to $clu_{\beta_\chi}^{(v)}$, and assign z+1 to z. Let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$. That is, the subscript $\beta_\chi$, label set $lab_{\beta_\chi}^{(v)}$, cohesion $\rho_{\beta_\chi}^{(v)}$ and cluster $clu_{\beta_\chi}^{(v)}$ of the β-th voter instance of the χ-th iteration are assigned to the subscript, label set, cohesion and cluster of the z-th transfer instance of the χ-th iteration. Then assign β+1 to β and judge whether β ≤ M. If β ≤ M, go to step 3.3; otherwise, go to step 3.5. Writing one instance as equal to another does not mean that they are the same instance. It only means that their associated values are the same: the subscript, label set, cohesion and cluster of the instance on the right of the equals sign are assigned to those of the instance on the left.
Step 3.5: If z ≤ 0, go to step 3.14. Otherwise, assign χ+1 to χ, and carry the values of iteration χ-1 over to iteration χ in turn. Every other parameter associated with χ likewise inherits its value from iteration χ-1, so that the data stay continuous and consistent. Set β = 1. Obtain the label Euclidean distance $d_{\beta_\chi z}^{(v)(\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the z-th transfer instance $inst_z^{(\chi)}$ of the χ-th iteration, and assign z-1 to z.
Step 3.6: If the first condition of this step holds, assign β+1 to β and judge whether β ≤ M. If β ≤ M, repeat step 3.6; otherwise, go to step 3.5. If the second condition of this step holds, judge whether $clu_{\beta_\chi}^{(v)}$ is empty. If it is empty, go to step 3.7. Otherwise, its value is the subscript of a leader instance that already claimed this voter in the χ-th iteration. Denote that leader instance by $inst_\varepsilon^{(\beta_\chi)}$ and go to step 3.8.
Step 3.7: Assign the subscript $z^{(\chi)}$ of the z-th transfer instance of the χ-th iteration to $clu_{\beta_\chi}^{(v)}$, and assign z+1 to z. Let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$, and assign β+1 to β. Then judge whether β ≤ M. If β ≤ M, repeat step 3.6; otherwise, go to step 3.5.
Step 3.8: Use formula (5) to obtain the influence $gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the existing leader instance $inst_\varepsilon^{(\beta_\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\varepsilon^{(\beta_\chi)}}{d_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}} \tag{5}$$
Formula (5) extends to the influence between any two instances with the same semantics. Only two inputs are needed: either the cohesions of the two instances and the Euclidean distance between their labels, or the attribute cohesions of the two instances and the Euclidean distance between their attributes. Substituting these into formula (5) gives the influence between the two instances.
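Formulas (5) to (8) and (10) share one shape: the product of two cohesions divided by a distance. The generic Python helper below illustrates this. It is a sketch: the name `influence` is hypothetical, and returning infinity for a zero distance is an assumption, since the patent does not cover coincident instances.

```python
import math

def influence(rho_a: float, rho_b: float, dist: float) -> float:
    """gra = rho_a * rho_b / dist, the shared form of formulas (5)-(8) and (10).

    The source does not say what happens when dist == 0; returning infinity
    (coincident instances dominate) is an assumption of this sketch.
    """
    if dist < 0:
        raise ValueError("distance must be non-negative")
    return math.inf if dist == 0 else rho_a * rho_b / dist

# A voter with cohesion 3 compared against two candidate leaders.
gra_a = influence(3, 4, 2.0)   # leader A: cohesion 4, label distance 2.0
gra_b = influence(3, 6, 4.0)   # leader B: cohesion 6, label distance 4.0
print(gra_a, gra_b)  # 6.0 4.5
```

In this toy case, leader A exerts the larger influence even though leader B has the higher cohesion, because B is twice as far away.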
Step 3.9: Use formula (6) to obtain the influence $gra_{\beta_\chi z}^{(v)(\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the z-th transfer instance $inst_z^{(\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi z}^{(v)(\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_z^{(\chi)}}{d_{\beta_\chi z}^{(v)(\chi)}} \tag{6}$$
Step 3.10: If the condition of this step holds, assign β+1 to β and go to step 3.6. Otherwise, make the corresponding cluster assignment, assign z+1 to z, let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$, and assign β+1 to β. Then judge whether β ≤ M. If β ≤ M, go to step 3.6; otherwise, go to step 3.5.
Step 3.11: Use formula (7) to obtain the influence $gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the existing leader instance $inst_\varepsilon^{(\beta_\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\varepsilon^{(\beta_\chi)}}{d_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}} \tag{7}$$
Step 3.12: Use formula (8) to obtain the influence $gra_{\beta_\chi\alpha}^{(v)(l)}$ between $inst_{\beta_\chi}^{(v)}$ and the α-th leader instance $inst_\alpha^{(l)}$:
$$gra_{\beta_\chi\alpha}^{(v)(l)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\alpha^{(l)}}{d_{\beta_\chi\alpha}^{(v)(l)}} \tag{8}$$
Step 3.13: If the condition of this step holds, assign β+1 to β and go to step 3.3. Otherwise, assign the subscript $\alpha^{(l)}$ of the α-th leader instance to $clu_{\beta_\chi}^{(v)}$, assign z+1 to z, and let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$. Then judge whether β ≤ M. If β ≤ M, assign β+1 to β and go to step 3.3; otherwise, go to step 3.5.
Step 3.14: Assign α+1 to α and judge whether α ≤ N. If α ≤ N, set β = 1 and go to step 3.2; otherwise, go to step 3.15.
Step 3.15: Assign the clusters of the M voter instances in $D^{(v)}$ at the χ-th iteration, in turn, to the clusters $\{clu_1^{(v)}, clu_2^{(v)}, \ldots, clu_\beta^{(v)}, \ldots, clu_M^{(v)}\}$ of $D^{(v)}$.
Step 3.16: Judge whether any voter instance still has an empty cluster. If so, set the cluster of each such voter instance to -1. A voter instance's cluster can therefore take N+1 possible values: the clusters of the N leader instances, and -1.
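The stack-based propagation of step 3 can be pictured as a simplified Python sketch. Several of the step's conditions are not legible in this text, so the sketch makes two loud assumptions:
- A leader or transfer instance "reaches" a voter when their label distance is at most `d_c`.
- An already-claimed voter switches cluster only when the new influence (formula (6) or (8)) is strictly larger than its influence with its current leader (formula (5) or (7)).

Under these assumptions, the χ bookkeeping collapses into an ordinary stack per leader. The names `assign_voters` and `influence` are hypothetical.

```python
import math
from typing import Dict, List, Optional

def influence(rho_a: float, rho_b: float, dist: float) -> float:
    # Shared form of formulas (5)-(8); infinity for coincident instances is an assumption.
    return math.inf if dist == 0 else rho_a * rho_b / dist

def assign_voters(leaders: List[int], voters: List[int], rho: List[float],
                  d: List[List[float]], d_c: float) -> Dict[int, int]:
    """Simplified sketch of step 3 (cluster propagation from leaders to voters).

    ASSUMPTIONS, because the source's conditions are not legible:
      * a leader or transfer instance reaches a voter when their label distance is <= d_c;
      * an already-assigned voter switches only if the new influence (formula (6)/(8))
        is strictly larger than its influence with its current leader (formula (5)/(7)).
    Instances are indices into the label distance matrix d; a cluster is a leader index.
    """
    cluster: Dict[int, Optional[int]] = {v: None for v in voters}
    for leader in leaders:
        stack = [leader]                   # transfer instances (stack-like structure of step 3.1)
        while stack:
            source = stack.pop()
            for v in voters:
                dist = d[source][v]
                if v == source or dist > d_c or cluster[v] == leader:
                    continue
                current = cluster[v]
                if current is not None:    # voter already claimed by another leader
                    old = influence(rho[v], rho[current], d[v][current])
                    new = influence(rho[v], rho[source], dist)
                    if new <= old:
                        continue
                cluster[v] = leader        # join the cluster of the leader being propagated
                stack.append(v)            # v now acts as a transfer instance
    # Step 3.16: voters never reached get cluster -1.
    return {v: (c if c is not None else -1) for v, c in cluster.items()}

# 1-D toy data: label distance is |x_a - x_b|.
xs = [0.0, 1.0, 2.0, 9.0, 10.0, 20.0]
d = [[abs(a - b) for b in xs] for a in xs]
result = assign_voters(leaders=[0, 4], voters=[1, 2, 3, 5], rho=[3, 2, 2, 2, 3, 1], d=d, d_c=1.5)
print(result)  # {1: 0, 2: 0, 3: 4, 5: -1}
```

In the example, voter 2 is not within `d_c` of leader 0 directly. It is reached through voter 1, which acts as a transfer instance. This chaining is the behaviour the transfer stack of step 3 provides.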
Step 4: Coarsely classify the prediction instances with a support vector machine:
4.1: Build the prediction instance set $P = \{instp_1, instp_2, \ldots, instp_j, \ldots, instp_{nump}\}$ of nump prediction instances. Here $instp_j$ is the j-th prediction instance, $1 \le j \le nump$, and $instp_j = \{attrp_j; labp_j\}$, where $attrp_j$ is the attribute set and $labp_j$ is the label set of $instp_j$. Denote the cluster of $instp_j$ by $clup_j$. In the present invention, the prediction instances and the known instances must describe the same kind of object, that is, objects with the same features and semantics. For example, if the known instances are pictures, the prediction instances must also be pictures. Characteristics that describe the object in detail, such as color difference and size, form the attribute set. Yes/no semantics, such as "landscape picture" or "animal picture", form the label set. The two instance sets have attribute sets and label sets with the same names but different values; for clarity, the present invention distinguishes them with different symbols.
4.2: Use the num clusters $\{clu_1, clu_2, \ldots, clu_i, \ldots, clu_{num}\}$ of the initialization instance set D as training labels. Use the attribute sets $\{attr_1, attr_2, \ldots, attr_i, \ldots, attr_{num}\}$ of the num known objects in D as training samples. Use the nump attribute sets $\{attrp_1, attrp_2, \ldots, attrp_j, \ldots, attrp_{nump}\}$ of the prediction instance set P as prediction samples. Train with the support vector machine method to obtain nump predicted labels, and assign them respectively to the clusters of the nump prediction instances in P. This completes the coarse classification of P. A support vector machine method usually has three inputs (the training labels, the training samples and the prediction samples) and one output, the predicted labels.
Step 5: Perform multi-label prediction for the nump prediction instances:
Step 5.1: Merge the num instances of the initialization instance set D and the nump instances of the prediction instance set P into the ψ-th updated instance set
$$D_{new}^{(\psi)} = \{inst_1, inst_2, \ldots, inst_i, \ldots, inst_{num}; instp_1, instp_2, \ldots, instp_j, \ldots, instp_{nump}\},$$
written as
$$D_{new}^{(\psi)} = \{inst_1^{(\psi)}, inst_2^{(\psi)}, \ldots, inst_\Omega^{(\psi)}, \ldots, inst_{num+nump}^{(\psi)}\},$$
where $inst_\Omega^{(\psi)}$ is the Ω-th updated instance of the ψ-th update and $1 \le \Omega \le num+nump$. Here ψ is the update count. An update consists of two actions. First, the existing initialization instances and the prediction instances are merged into one instance set. Second, once multi-label prediction is complete, the label sets of P are assigned to the corresponding ψ-th updated instance set. ψ is initialized to 1, and after each update, ψ+1 is assigned to ψ.
Step 5.2: Take the n attributes of each of the num+nump updated instances in $D_{new}^{(\psi)}$ as n-dimensional coordinates. Obtain the attribute Euclidean distance $d_{\Omega\xi}^{(\psi)}$ between the Ω-th updated instance $inst_\Omega^{(\psi)}$ and the ξ-th updated instance $inst_\xi^{(\psi)}$, where $1 \le \xi \le num+nump$ and $\xi \ne \Omega$.
Step 5.3: Use formula (9) to obtain the attribute cohesion $\Gamma_\Omega^{(\psi)}$ of $inst_\Omega^{(\psi)}$. This gives the attribute cohesions $\Gamma^{(\psi)} = \{\Gamma_1^{(\psi)}, \Gamma_2^{(\psi)}, \ldots, \Gamma_\Omega^{(\psi)}, \ldots, \Gamma_{num+nump}^{(\psi)}\}$ of the num+nump updated instances of the ψ-th update:
$$\Gamma_\Omega^{(\psi)} = \sum_{\xi=1}^{num+nump} f\left(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}\right) \tag{9}$$
When $d_{\Omega\xi}^{(\psi)} \le d_c^{(\gamma)}$, $f(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}) = 1$; when $d_{\Omega\xi}^{(\psi)} > d_c^{(\gamma)}$, $f(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}) = 0$. The attribute cohesion formula is similar to the cohesion formula (1), except that the label Euclidean distance is replaced by the attribute Euclidean distance.
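Steps 5.1 to 5.3 can be sketched in Python as follows. The known and prediction instances are concatenated, known instances first, into one list standing in for $D_{new}^{(\psi)}$. Formula (9) is then applied with attribute distances. As with the cohesion sketch for formula (1), the ξ = Ω term is skipped. `attribute_cohesion` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence

def attribute_cohesion(attrs: List[Sequence[float]], d_c: float) -> List[int]:
    """Formula (9): Gamma_Omega counts the other updated instances within attribute distance d_c."""
    total = len(attrs)
    dist = [[math.dist(attrs[a], attrs[b]) for b in range(total)] for a in range(total)]
    return [sum(1 for b in range(total) if b != a and dist[a][b] <= d_c) for a in range(total)]

known = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8]]
predicted = [[0.12, 0.22]]
merged = known + predicted   # D_new^(psi): known instances first, then prediction instances
gamma = attribute_cohesion(merged, d_c=0.1)
print(gamma)  # [2, 2, 0, 2]
```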
Step 5.4: Initialize j = 1.
Step 5.5: If the cluster $clup_j$ of the j-th prediction instance $instp_j$ in the prediction instance set P is the same as the cluster $clu_i$ of the i-th known instance $inst_i$ in the initialization instance set D, use formula (10) to obtain the influence $gra_{ij}$ between $inst_i$ and $instp_j$:
$$gra_{ij} = \frac{\Gamma_i \times \Gamma_j}{d_{ij}} \tag{10}$$
In formula (10), $\Gamma_i$ is the attribute cohesion of the updated instance in $D_{new}^{(\psi)}$ that corresponds to the known instance $inst_i$. $\Gamma_j$ is the attribute cohesion of the updated instance in $D_{new}^{(\psi)}$ that corresponds to the prediction instance $instp_j$. $d_{ij}$ is the attribute Euclidean distance between $inst_i$ and $instp_j$.
Step 5.6: Repeat step 5.5 to obtain the influence between $instp_j$ and the other known instances in D, and record the maximum influence $gra_{\max}$.
Step 5.7: If $gra_{ij} = gra_{\max}$, let $labp_j = lab_i$. That is, each label in the label set $labp_j$ of the prediction instance equals the corresponding label in the label set $lab_i$ of the known instance. This gives the multi-label prediction for the j-th prediction instance.
Step 5.8: Assign j+1 to j and judge whether j ≤ nump. If j ≤ nump, return to step 5.5; otherwise, the multi-label prediction of the nump prediction instances is complete.
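Steps 5.4 to 5.8 amount to a loop over prediction instances. For each one, the label set of the most influential known instance in the same coarse cluster is copied, with influence given by formula (10). The Python sketch below takes the attribute cohesions of the merged set as input, with known instances first. Two choices in it are assumptions not covered by the source:
- A zero attribute distance yields infinite influence.
- `None` is returned when no known instance shares the predicted cluster.

`predict_labels` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence

def predict_labels(known_attrs: List[Sequence[float]], known_labels: List[List[int]],
                   known_clusters: List[int], pred_attrs: List[Sequence[float]],
                   pred_clusters: List[int], gamma: List[float]) -> List[Optional[List[int]]]:
    """Steps 5.4-5.8: copy the label set of the most influential same-cluster known instance.

    gamma holds the attribute cohesions of D_new (known instances first, then predictions).
    Returning None when no known instance shares the cluster is an assumption.
    """
    num = len(known_attrs)
    predictions: List[Optional[List[int]]] = []
    for j, (attrs_j, clu_j) in enumerate(zip(pred_attrs, pred_clusters)):
        gamma_j = gamma[num + j]                   # prediction instances follow the known ones
        best_i, gra_max = None, -math.inf
        for i in range(num):
            if known_clusters[i] != clu_j:         # step 5.5 compares same-cluster pairs only
                continue
            d_ij = math.dist(known_attrs[i], attrs_j)
            gra_ij = math.inf if d_ij == 0 else gamma[i] * gamma_j / d_ij   # formula (10)
            if gra_ij > gra_max:
                best_i, gra_max = i, gra_ij
        predictions.append(list(known_labels[best_i]) if best_i is not None else None)
    return predictions

known_attrs = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8]]
known_labels = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
known_clusters = [0, 0, 2]
pred_attrs = [[0.12, 0.22], [0.5, 0.5]]
pred_clusters = [0, 7]
gamma = [2, 2, 0, 2, 0]
result = predict_labels(known_attrs, known_labels, known_clusters, pred_attrs, pred_clusters, gamma)
print(result)  # [[1, 0, 1], None]
```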
Step 5.9: Assign the label sets of the prediction instance set P, whose multi-label prediction is complete, to the corresponding ψ-th updated instance set $D_{new}^{(\psi)}$. This gives the (ψ+1)-th updated instance set $D_{new}^{(\psi+1)}$, which is used as the new initialization instance set for adaptive multi-label prediction. This enriches the existing training set and improves the accuracy of the next round of prediction. When new prediction instances with the same object features and the same object semantics appear, their multi-label prediction can be completed simply by starting again from step 4.

Claims (4)

1. An adaptive multi-label prediction method, characterized in that it is carried out as follows:
Step 1: Obtain the initialization instance set D:
Step 1.1: Build the original instance set $D' = \{inst'_1, inst'_2, \ldots, inst'_a, \ldots, inst'_{num'}\}$ from num' known objects. Here $inst'_a$ is the original instance corresponding to the a-th known object, $1 \le a \le num'$, and $inst'_a = \{attr'_a; lab'_a\}$. $attr'_a$ is the attribute set describing the features of the a-th known object, and $lab'_a$ is the label set describing its semantics. The attribute set is $attr'_a = \{attr'_{a,1}, attr'_{a,2}, \ldots, attr'_{a,n}\}$, where $attr'_{a,n}$ is the n-th attribute of the a-th known object and n is the number of attributes of a known object. The label set is $lab'_a = \{lab'_{a,1}, lab'_{a,2}, \ldots, lab'_{a,x}, \ldots, lab'_{a,m}\}$, where $lab'_{a,x}$ is the x-th label of the a-th known object, m is the number of labels of a known object, and $1 \le x \le m$. $lab'_{a,x} = 1$ means that the semantics of the a-th known object match the x-th label, and $lab'_{a,x} = 0$ means that they do not.
Step 1.2: Normalize the attribute sets $\{attr'_1, attr'_2, \ldots, attr'_a, \ldots, attr'_{num'}\}$ of the num' known objects in D' to obtain the normalized attribute sets $\{attr''_1, attr''_2, \ldots, attr''_a, \ldots, attr''_{num'}\}$. When all m label values corresponding to the normalized attribute set $attr''_a$ of the a-th known object are 0, delete the original instance of that object. This gives the initialization instance set $D = \{inst_1, inst_2, \ldots, inst_i, \ldots, inst_{num}\}$ of num instances. Here $inst_i$ is the instance corresponding to the i-th known object after initialization, and $inst_i = \{attr_i; lab_i\}$, where $attr_i$ is the attribute set of the features of the i-th instance after initialization, $lab_i$ is the label set of its semantics, and $1 \le i \le num$.
Step 2: Compute the clustering score of each instance in the initialization instance set D, and thereby determine the leader instances, outlier instances and voter instances in D:
Step 2.1: Take the m labels of each of the num instances in the initialization instance set D as m-dimensional coordinates. Obtain the label Euclidean distance $d_{ik}$ between the i-th instance $inst_i$ and the k-th instance $inst_k$, where $1 \le k \le num$ and $k \ne i$.
Step 2.2: Define the iteration counter γ and initialize γ = 1. Denote the cluster to which the i-th instance $inst_i$ belongs by $clu_i$.
Step 2.3: Use formula (1) to obtain the cohesion $\rho_i^{(\gamma)}$ of the i-th instance $inst_i$ in the γ-th iteration. This gives the cohesions $\rho^{(\gamma)} = \{\rho_1^{(\gamma)}, \rho_2^{(\gamma)}, \ldots, \rho_i^{(\gamma)}, \ldots, \rho_{num}^{(\gamma)}\}$ of the num instances in the γ-th iteration. Denote the maximum cohesion by $\rho_{\max}^{(\gamma)}$:
$$\rho_i^{(\gamma)} = \sum_{k=1}^{num} f\left(d_{ik} - d_c^{(\gamma)}\right) \tag{1}$$
In formula (1), $d_c^{(\gamma)}$ is the threshold of the γ-th iteration. When $d_{ik} \le d_c^{(\gamma)}$, $f(d_{ik} - d_c^{(\gamma)}) = 1$; when $d_{ik} > d_c^{(\gamma)}$, $f(d_{ik} - d_c^{(\gamma)}) = 0$.
Step 2.4: Use formula (2) or formula (3) to obtain the difference degree $\delta_i^{(\gamma)}$ of the i-th instance $inst_i$ in the γ-th iteration. This gives the difference degrees $\delta^{(\gamma)} = \{\delta_1^{(\gamma)}, \delta_2^{(\gamma)}, \ldots, \delta_i^{(\gamma)}, \ldots, \delta_{num}^{(\gamma)}\}$ of the num instances in the γ-th iteration:
$$\delta_i^{(\gamma)} = \sum_{k=1}^{num} \max(d_{ik}), \quad \text{when } \rho_i^{(\gamma)} = \rho_{\max}^{(\gamma)} \tag{2}$$
Formula (3) is used when $\rho_i^{(\gamma)} \neq \rho_{\max}^{(\gamma)}$.
Step 2.5: Normalize the difference degrees $\delta^{(\gamma)}$ of the num instances in the γ-th iteration to obtain the normalized difference degrees $\delta^{\prime(\gamma)} = \{\delta_1^{\prime(\gamma)}, \delta_2^{\prime(\gamma)}, \ldots, \delta_i^{\prime(\gamma)}, \ldots, \delta_{num}^{\prime(\gamma)}\}$.
Step 2.6: Use formula (4) to obtain the clustering score $sco_i^{(\gamma)}$ of the i-th instance $inst_i$ in the γ-th iteration. This gives the clustering scores $sco^{(\gamma)} = \{sco_1^{(\gamma)}, sco_2^{(\gamma)}, \ldots, sco_i^{(\gamma)}, \ldots, sco_{num}^{(\gamma)}\}$ of the num instances in the γ-th iteration:
$$sco_i^{(\gamma)} = \rho_i^{(\gamma)} \times \delta_i^{\prime(\gamma)} \tag{4}$$
Step 2.7: Sort the clustering scores $sco^{(\gamma)}$ of the num instances in the γ-th iteration in descending order. This gives the clustering score sequence $sco^{\prime(\gamma)} = \{sco_1^{\prime(\gamma)}, sco_2^{\prime(\gamma)}, \ldots, sco_t^{\prime(\gamma)}, \ldots, sco_{num}^{\prime(\gamma)}\}$. Let the cohesions corresponding to $sco^{\prime(\gamma)}$ be $\rho^{\prime(\gamma)} = \{\rho_1^{\prime(\gamma)}, \rho_2^{\prime(\gamma)}, \ldots, \rho_t^{\prime(\gamma)}, \ldots, \rho_{num}^{\prime(\gamma)}\}$. Here $\rho_t^{\prime(\gamma)}$ is the cohesion of the i-th instance $inst_i$ in the γ-th iteration for which $sco_i^{(\gamma)} = sco_t^{\prime(\gamma)}$, with $1 \le t \le num$.
Step 2.8: Initialize t = 1.
Step 2.9: Judge whether the conditions of this step hold. If they hold, the threshold $d_c^{(\gamma)}$ of the γ-th iteration is a valid value; record t and go to step 2.10. Otherwise, judge whether the further condition of this step holds. If it holds, assign t+1 to t and repeat step 2.9. If it does not hold, modify the threshold $d_c^{(\gamma)}$, assign γ+1 to γ, and return to step 2.3.
Step 2.10: If the cohesion $\rho_i^{(\gamma)}$ of the i-th instance $inst_i$ in the γ-th iteration satisfies the outlier condition of this step, $inst_i$ is an outlier instance, and its cluster is set to $clu_i = -1$. Otherwise, judge whether the leader condition of this step holds. If it holds, $inst_i$ is a leader instance and $clu_i = i$; if it does not, $inst_i$ is a voter instance.
Step 2.11: Count the leader instances and the voter instances, and denote their numbers by N and M respectively.
Step 2.12: Denote the set of the N leader instances by $D^{(l)} = \{inst_1^{(l)}, inst_2^{(l)}, \ldots, inst_\alpha^{(l)}, \ldots, inst_N^{(l)}\}$, $1 \le \alpha \le N$. The corresponding quantities are defined as follows:
- The cohesions are $\rho^{(l)(\gamma)} = \{\rho_1^{(l)(\gamma)}, \ldots, \rho_\alpha^{(l)(\gamma)}, \ldots, \rho_N^{(l)(\gamma)}\}$, where $\rho_\alpha^{(l)(\gamma)}$ is the cohesion of the α-th leader instance $inst_\alpha^{(l)}$.
- The label sets are $lab^{(l)} = \{lab_1^{(l)}, lab_2^{(l)}, \ldots, lab_\alpha^{(l)}, \ldots, lab_N^{(l)}\}$, where $lab_\alpha^{(l)}$ is the label set of $inst_\alpha^{(l)}$.
- The clusters are $clu^{(l)} = \{clu_1^{(l)}, clu_2^{(l)}, \ldots, clu_\alpha^{(l)}, \ldots, clu_N^{(l)}\}$, where $clu_\alpha^{(l)}$ is the cluster of $inst_\alpha^{(l)}$.

Step 2.13: Denote the set of the M voter instances by $D^{(v)} = \{inst_1^{(v)}, inst_2^{(v)}, \ldots, inst_\beta^{(v)}, \ldots, inst_M^{(v)}\}$, $1 \le \beta \le M$. The corresponding quantities are defined as follows:
- The cohesions are $\rho^{(v)(\gamma)} = \{\rho_1^{(v)(\gamma)}, \rho_2^{(v)(\gamma)}, \ldots, \rho_\beta^{(v)(\gamma)}, \ldots, \rho_M^{(v)(\gamma)}\}$, where $\rho_\beta^{(v)(\gamma)}$ is the cohesion of the β-th voter instance $inst_\beta^{(v)}$.
- The label sets are $lab^{(v)} = \{lab_1^{(v)}, lab_2^{(v)}, \ldots, lab_\beta^{(v)}, \ldots, lab_M^{(v)}\}$, where $lab_\beta^{(v)}$ is the label set of $inst_\beta^{(v)}$.
- The clusters are $clu^{(v)} = \{clu_1^{(v)}, clu_2^{(v)}, \ldots, clu_\beta^{(v)}, \ldots, clu_M^{(v)}\}$, where $clu_\beta^{(v)}$ is the cluster of $inst_\beta^{(v)}$.
Step 3: Obtain the clusters $clu^{(v)}$ of the M voter instances in $D^{(v)}$:
Step 3.1: Define the iteration counter χ and initialize χ = 1. Define the z-th transfer instance $inst_z$, with z ≥ 0, and initialize α = 1, β = 1 and z = 0.
Step 3.2: Take the α-th leader instance $inst_\alpha^{(l)}$ from the N leader instances in $D^{(l)}$. Obtain the label Euclidean distance $d_{\beta_\chi\alpha}^{(v)(l)}$ between $inst_\alpha^{(l)}$ and the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration.
Step 3.3: If the first condition of this step holds, assign β+1 to β and judge whether β ≤ M. If β ≤ M, repeat step 3.3; otherwise, go to step 3.5. If the second condition of this step holds, judge whether the cluster $clu_{\beta_\chi}^{(v)}$ of $inst_{\beta_\chi}^{(v)}$ is empty. If it is empty, go to step 3.4. Otherwise, its value is the subscript of a leader instance that already claimed this voter in the χ-th iteration. Denote that leader instance by $inst_\varepsilon^{(\beta_\chi)}$ and go to step 3.11.
Step 3.4: Assign the subscript $\alpha^{(l)}$ of the α-th leader instance to $clu_{\beta_\chi}^{(v)}$, and assign z+1 to z. Let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$. That is, the subscript $\beta_\chi$, label set $lab_{\beta_\chi}^{(v)}$, cohesion $\rho_{\beta_\chi}^{(v)}$ and cluster $clu_{\beta_\chi}^{(v)}$ of the β-th voter instance of the χ-th iteration are assigned to the subscript, label set, cohesion and cluster of the z-th transfer instance of the χ-th iteration. Then assign β+1 to β and judge whether β ≤ M. If β ≤ M, go to step 3.3; otherwise, go to step 3.5.
Step 3.5: If z ≤ 0, go to step 3.14. Otherwise, assign χ+1 to χ, carry the values of iteration χ-1 over to iteration χ in turn, and set β = 1. Obtain the label Euclidean distance $d_{\beta_\chi z}^{(v)(\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the z-th transfer instance $inst_z^{(\chi)}$ of the χ-th iteration, and assign z-1 to z.
Step 3.6: If the first condition of this step holds, assign β+1 to β and judge whether β ≤ M. If β ≤ M, repeat step 3.6; otherwise, go to step 3.5. If the second condition of this step holds, judge whether $clu_{\beta_\chi}^{(v)}$ is empty. If it is empty, go to step 3.7. Otherwise, its value is the subscript of a leader instance that already claimed this voter in the χ-th iteration. Denote that leader instance by $inst_\varepsilon^{(\beta_\chi)}$ and go to step 3.8.
Step 3.7: Assign the subscript $z^{(\chi)}$ of the z-th transfer instance of the χ-th iteration to $clu_{\beta_\chi}^{(v)}$, and assign z+1 to z. Let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$, and assign β+1 to β. Then judge whether β ≤ M. If β ≤ M, repeat step 3.6; otherwise, go to step 3.5.
Step 3.8: Use formula (5) to obtain the influence $gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}$ between $inst_{\beta_\chi}^{(v)}$ and the existing leader instance $inst_\varepsilon^{(\beta_\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\varepsilon^{(\beta_\chi)}}{d_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}} \tag{5}$$
Step 3.9: Use formula (6) to obtain the influence $gra_{\beta_\chi z}^{(v)(\chi)}$ between $inst_{\beta_\chi}^{(v)}$ and the z-th transfer instance $inst_z^{(\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi z}^{(v)(\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_z^{(\chi)}}{d_{\beta_\chi z}^{(v)(\chi)}} \tag{6}$$
Step 3.10: If the condition of this step holds, assign β+1 to β and go to step 3.6. Otherwise, make the corresponding cluster assignment, assign z+1 to z, let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$, and assign β+1 to β. Then judge whether β ≤ M. If β ≤ M, go to step 3.6; otherwise, go to step 3.5.
Step 3.11: Use formula (7) to obtain the influence $gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}$ between the β-th voter instance $inst_{\beta_\chi}^{(v)}$ of the χ-th iteration and the existing leader instance $inst_\varepsilon^{(\beta_\chi)}$ of the χ-th iteration:
$$gra_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\varepsilon^{(\beta_\chi)}}{d_{\beta_\chi\varepsilon}^{(v)(\beta_\chi)}} \tag{7}$$
Step 3.12: Use formula (8) to obtain the influence $gra_{\beta_\chi\alpha}^{(v)(l)}$ between $inst_{\beta_\chi}^{(v)}$ and the α-th leader instance $inst_\alpha^{(l)}$:
$$gra_{\beta_\chi\alpha}^{(v)(l)} = \frac{\rho_{\beta_\chi}^{(v)} \times \rho_\alpha^{(l)}}{d_{\beta_\chi\alpha}^{(v)(l)}} \tag{8}$$
Step 3.13: If the condition of this step holds, assign β+1 to β and go to step 3.3. Otherwise, assign the subscript $\alpha^{(l)}$ of the α-th leader instance to $clu_{\beta_\chi}^{(v)}$, assign z+1 to z, let $inst_z^{(\chi)} = inst_{\beta_\chi}^{(v)}$, and assign β+1 to β. Then judge whether β ≤ M. If β ≤ M, go to step 3.3; otherwise, go to step 3.5.
Step 3.14: Assign α+1 to α and judge whether α ≤ N. If α ≤ N, set β = 1 and go to step 3.2; otherwise, go to step 3.15.
Step 3.15: Assign the clusters of the M voter instances in $D^{(v)}$ at the χ-th iteration, in turn, to the clusters $\{clu_1^{(v)}, clu_2^{(v)}, \ldots, clu_\beta^{(v)}, \ldots, clu_M^{(v)}\}$ of $D^{(v)}$.
Step 3.16: Judge whether any voter instance still has an empty cluster. If so, set the cluster of each such voter instance to -1.
Step 4: Coarsely classify the prediction instances with a support vector machine:
4.1: Build the prediction instance set $P = \{instp_1, instp_2, \ldots, instp_j, \ldots, instp_{nump}\}$ of nump prediction instances. Here $instp_j$ is the j-th prediction instance, $1 \le j \le nump$, and $instp_j = \{attrp_j; labp_j\}$, where $attrp_j$ is the attribute set and $labp_j$ is the label set of $instp_j$. Denote the cluster of $instp_j$ by $clup_j$.
4.2: Use the num clusters $\{clu_1, clu_2, \ldots, clu_i, \ldots, clu_{num}\}$ of the initialization instance set D as training labels. Use the attribute sets $\{attr_1, attr_2, \ldots, attr_i, \ldots, attr_{num}\}$ of the num known objects in D as training samples. Use the nump attribute sets $\{attrp_1, attrp_2, \ldots, attrp_j, \ldots, attrp_{nump}\}$ of the prediction instance set P as prediction samples. Train with the support vector machine method to obtain nump predicted labels, and assign them respectively to the clusters of the nump prediction instances in P. This completes the coarse classification of P.
Step 5: Perform multi-label prediction for the nump prediction instances:
Step 5.1: Merge the num instances of the initialization instance set D and the nump instances of the prediction instance set P into the ψ-th updated instance set
$$D_{new}^{(\psi)} = \{inst_1, inst_2, \ldots, inst_i, \ldots, inst_{num}; instp_1, instp_2, \ldots, instp_j, \ldots, instp_{nump}\},$$
written as
$$D_{new}^{(\psi)} = \{inst_1^{(\psi)}, inst_2^{(\psi)}, \ldots, inst_\Omega^{(\psi)}, \ldots, inst_{num+nump}^{(\psi)}\},$$
where $inst_\Omega^{(\psi)}$ is the Ω-th updated instance of the ψ-th update and $1 \le \Omega \le num+nump$.
Step 5.2: Take the n attributes of each of the num+nump updated instances in $D_{new}^{(\psi)}$ as n-dimensional coordinates. Obtain the attribute Euclidean distance $d_{\Omega\xi}^{(\psi)}$ between the Ω-th updated instance $inst_\Omega^{(\psi)}$ and the ξ-th updated instance $inst_\xi^{(\psi)}$, where $1 \le \xi \le num+nump$ and $\xi \ne \Omega$.
Step 5.3: Use formula (9) to obtain the attribute cohesion $\Gamma_\Omega^{(\psi)}$ of $inst_\Omega^{(\psi)}$. This gives the attribute cohesions $\Gamma^{(\psi)} = \{\Gamma_1^{(\psi)}, \Gamma_2^{(\psi)}, \ldots, \Gamma_\Omega^{(\psi)}, \ldots, \Gamma_{num+nump}^{(\psi)}\}$ of the num+nump updated instances of the ψ-th update:
$$\Gamma_\Omega^{(\psi)} = \sum_{\xi=1}^{num+nump} f\left(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}\right) \tag{9}$$
When $d_{\Omega\xi}^{(\psi)} \le d_c^{(\gamma)}$, $f(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}) = 1$; when $d_{\Omega\xi}^{(\psi)} > d_c^{(\gamma)}$, $f(d_{\Omega\xi}^{(\psi)} - d_c^{(\gamma)}) = 0$.
Step 5.4, initialization j=1;
If a jth prediction example instp in the described prediction example set P of step 5.5 jaffiliated cluster be clup jwith i-th known example inst in described initialization example set D iaffiliated cluster be clu iidentical; Formula (10) is then utilized to obtain i-th known example inst iexample instp is predicted with jth jinfluence power gra ij:
gra i j = &Gamma; i &times; &Gamma; j d i j - - - ( 10 )
In formula (10), Γ irepresent known example inst iexample set is upgraded at the ψ time the corresponding attribute degree of polymerization upgrading example, Γ jrepresent prediction example instp jexample set is upgraded at the ψ time the corresponding attribute degree of polymerization upgrading example, d ijrepresent described i-th known example inst iexample instp is predicted with jth jthe Euclidean distance of attribute;
Step 5.6: Repeat step 5.5 for the other known examples of the initialization example set D, thereby obtaining the influence of each of them on the j-th prediction example $instp_j$, and record the maximum influence $gra_{max}$;
Step 5.7: If $gra_{ij} = gra_{max}$, set $labp_j = lab_i$; that is, the label set $labp_j$ of the j-th prediction example in P is made identical, label by label, to the label set $lab_i$ of the i-th known example in D, which yields the multi-label prediction for the j-th prediction example;
Step 5.8: Assign j+1 to j and check whether $j \le nump$ holds; if it does, return to step 5.5; otherwise, multi-label prediction for all nump prediction examples is complete;
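Steps 5.1 to 5.8 can be sketched in Python as follows. This is a minimal, hedged illustration rather than the patented implementation: the function names (`euclidean`, `aggregation_degrees`, `predict_labels`) and the list-based data layout are hypothetical, and the cluster assignments ($clu_i$, $clup_j$) and the cutoff $d_c(\gamma)$ are assumed to come from the earlier steps, which are not shown here. The sketch also makes these assumptions: the self-pair ξ = Ω is skipped in formula (9), because step 5.2 defines distances only for ξ ≠ Ω; a known example at zero distance is given infinite influence, since formula (10) is undefined there; ties keep the first maximum; and a prediction example with no same-cluster known example receives an empty label set.

```python
import math
from typing import List, Optional, Sequence, Set

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Attribute Euclidean distance over the n attributes (step 5.2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aggregation_degrees(points: List[Point], d_c: float) -> List[int]:
    """Formula (9): count the other examples within the cutoff d_c.

    f = 1 when the distance is <= d_c and 0 otherwise.
    Assumption: the self-pair (xi == Omega) is skipped.
    """
    return [
        sum(1 for xi, q in enumerate(points)
            if xi != omega and euclidean(p, q) <= d_c)
        for omega, p in enumerate(points)
    ]


def predict_labels(known_x: List[Point], known_labels: List[Set[str]],
                   known_clusters: List[int], pred_x: List[Point],
                   pred_clusters: List[int], d_c: float) -> List[Set[str]]:
    """Hypothetical sketch of steps 5.1-5.8."""
    num = len(known_x)
    # Step 5.1: D_new(psi) = known examples followed by prediction examples.
    updated = list(known_x) + list(pred_x)
    # Steps 5.2-5.3: attribute aggregation degree of every updated example.
    gammas = aggregation_degrees(updated, d_c)

    predicted: List[Set[str]] = []
    for j, xj in enumerate(pred_x):  # steps 5.4 and 5.8: j runs over P
        best_gra = -math.inf
        best_i: Optional[int] = None
        for i, xi in enumerate(known_x):  # step 5.6: scan the known examples
            if known_clusters[i] != pred_clusters[j]:
                continue  # step 5.5 applies only within the same cluster
            d_ij = euclidean(xi, xj)
            # Formula (10); assumption: zero distance means maximal influence.
            gra = math.inf if d_ij == 0 else gammas[i] * gammas[num + j] / d_ij
            if gra > best_gra:  # strict '>' keeps the first maximum on ties
                best_gra, best_i = gra, i
        # Step 5.7: labp_j = lab_i for the maximum-influence known example.
        predicted.append(set(known_labels[best_i]) if best_i is not None else set())
    return predicted


if __name__ == "__main__":
    known_x = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
    known_labels = [{"a", "b"}, {"a"}, {"c"}]
    known_clusters = [0, 0, 1]
    pred_x = [(0.4, 0.1), (5.2, 5.0)]
    pred_clusters = [0, 1]
    print(predict_labels(known_x, known_labels, known_clusters,
                         pred_x, pred_clusters, d_c=1.0))  # prints [{'a'}, {'c'}]
```

In this toy run, the aggregation degrees of the five updated examples are 2, 2, 1, 2 and 1. The first prediction example belongs to cluster 0 and is closest to the known example labelled {"a"}, which gives it the larger influence (about 28.3 versus 9.7), so it inherits that label set.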
2. The adaptive multi-label prediction method according to claim 1, characterized in that step 5 further comprises step 5.9: assigning the label sets of the prediction example set P, for which multi-label prediction has been completed, to the corresponding updated examples in the ψ-th updated example set $D_{new}(\psi)$, thereby obtaining the (ψ+1)-th updated example set $D_{new}(\psi+1)$, and using $D_{new}(\psi+1)$ as the new initialization example set for adaptive multi-label prediction.
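Step 5.9 amounts to writing the predicted label sets back into $D_{new}(\psi)$ so that the result can serve as the next initialization example set. The Python sketch below is illustrative only. It assumes a hypothetical `Example` record holding an attribute tuple and a label set, and a hypothetical helper `next_example_set`. It also assumes that the first `num` entries of the list are the known examples of D and the remaining entries are the prediction examples of P, in the order given in step 5.1.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical record: n attribute values plus a (possibly empty) label set."""
    attrs: Tuple[float, ...]
    labels: FrozenSet[str] = frozenset()


def next_example_set(updated: List[Example], num: int,
                     predicted: List[FrozenSet[str]]) -> List[Example]:
    """Step 5.9 sketch: build D_new(psi + 1) from D_new(psi).

    updated[:num] are the known examples of D (their labels are kept);
    updated[num:] are the prediction examples of P, which receive the
    label sets produced by steps 5.5-5.8.
    """
    nump = len(updated) - num
    if len(predicted) != nump:
        raise ValueError("expected one predicted label set per prediction example")
    known = updated[:num]
    relabelled = [Example(ex.attrs, frozenset(labels))
                  for ex, labels in zip(updated[num:], predicted)]
    # The merged list (num + nump labelled examples) becomes the new
    # initialization example set for the next round.
    return known + relabelled


if __name__ == "__main__":
    d_psi = [Example((0.0, 0.0), frozenset({"a", "b"})),
             Example((5.0, 5.0), frozenset({"c"})),
             Example((0.1, 0.0))]  # prediction example, not yet labelled
    d_next = next_example_set(d_psi, num=2, predicted=[frozenset({"a", "b"})])
    print(len(d_next), sorted(d_next[2].labels))  # prints 3 ['a', 'b']
```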
3. The adaptive multi-label prediction method according to claim 1 or 2, characterized in that when new prediction examples having the same object features and the same object semantics appear, multi-label prediction for the new prediction examples can be completed by executing the method from step 4 onward only.
4. The adaptive multi-label prediction method according to claim 1, characterized in that, in step 2.9, the rule for modifying the threshold is: if …, then … minus $\tau_2$ is assigned to …; otherwise, … plus $\tau_2$ is assigned to …; where $0.1 \le \tau_2 \le 0.5$ and $75\% \le \tau_1 < 100\%$.
CN201510501816.7A 2015-06-24 2015-08-14 Adaptive multi-label prediction method Active CN105069129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510501816.7A CN105069129B (en) 2015-06-24 2015-08-14 Adaptive multi-label prediction method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201510355030.9A CN104915436A (en) 2015-06-24 2015-06-24 Adaptive multi-label prediction method
CN2015103550309 2015-06-24
CN201510501816.7A CN105069129B (en) 2015-06-24 2015-08-14 Adaptive multi-label prediction method

Publications (2)

Publication Number Publication Date
CN105069129A true CN105069129A (en) 2015-11-18
CN105069129B CN105069129B (en) 2018-05-18

Family

ID=54084499

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201510355030.9A Withdrawn CN104915436A (en) 2015-06-24 2015-06-24 Adaptive multi-label prediction method
CN201510501816.7A Active CN105069129B (en) 2015-06-24 2015-08-14 Adaptive multi-label prediction method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201510355030.9A Withdrawn CN104915436A (en) 2015-06-24 2015-06-24 Adaptive multi-label prediction method

Country Status (1)

Country Link
CN (2) CN104915436A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629358A (en) * 2017-03-23 2018-10-09 北京嘀嘀无限科技发展有限公司 Object type prediction method and device
CN110162692A (en) * 2018-12-10 2019-08-23 腾讯科技(深圳)有限公司 User tag determination method, apparatus, computer equipment and storage medium
US11379758B2 (en) 2019-12-06 2022-07-05 International Business Machines Corporation Automatic multilabel classification using machine learning

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN106909540A (en) * 2015-12-23 2017-06-30 神州数码信息系统有限公司 Smart city citizen preference discovery technique based on collaborative learning
CN106971713B (en) * 2017-01-18 2020-01-07 北京华控智加科技有限公司 Speaker marking method and system based on density peak value clustering and variational Bayes
CN108647711B (en) * 2018-05-08 2021-04-20 重庆邮电大学 Multi-label classification method of image based on gravity model
CN110547806B (en) * 2019-09-11 2022-05-31 湖北工业大学 Gesture action online recognition method and system based on surface electromyographic signals

Citations (6)

Publication number Priority date Publication date Assignee Title
US20090164416A1 (en) * 2007-12-10 2009-06-25 Aumni Data Inc. Adaptive data classification for data mining
CN102004801A (en) * 2010-12-30 2011-04-06 焦点科技股份有限公司 Information classification method
CN102364498A (en) * 2011-10-17 2012-02-29 江苏大学 Multi-label-based image recognition method
CN102945371A (en) * 2012-10-18 2013-02-27 浙江大学 Classifying method based on multi-label flexible support vector machine
CN103077228A (en) * 2013-01-02 2013-05-01 北京科技大学 Set characteristic vector-based quick clustering method and device
CN103927394A (en) * 2014-05-04 2014-07-16 苏州大学 Multi-label active learning classification method and system based on SVM

Non-Patent Citations (2)

Title
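XIN LI et al.: "Active Learning with Multi-Label SVM Classification", Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence *
LI Peipei (李培培): "Research on Concept Drift Detection and Classification Methods in Data Streams" (数据流中概念漂移检测与分类方法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series *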

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN108629358A (en) * 2017-03-23 2018-10-09 北京嘀嘀无限科技发展有限公司 Object type prediction method and device
CN108629358B (en) * 2017-03-23 2020-12-25 北京嘀嘀无限科技发展有限公司 Object class prediction method and device
CN110162692A (en) * 2018-12-10 2019-08-23 腾讯科技(深圳)有限公司 User tag determination method, apparatus, computer equipment and storage medium
CN110162692B (en) * 2018-12-10 2021-05-25 腾讯科技(深圳)有限公司 User label determination method and device, computer equipment and storage medium
US11379758B2 (en) 2019-12-06 2022-07-05 International Business Machines Corporation Automatic multilabel classification using machine learning

Also Published As

Publication number Publication date
CN104915436A (en) 2015-09-16
CN105069129B (en) 2018-05-18

Similar Documents

Publication Publication Date Title
CN105069129A (en) Self-adaptive multi-label prediction method
CN106651519B (en) Personalized recommendation method and system based on label information
CN108287864B (en) Interest group dividing method, device, medium and computing equipment
CN103744981B (en) System for automatic classification analysis for website based on website content
CN104090890B (en) Keyword similarity acquisition method, device and server
CN105022754B (en) Object classification method and device based on social network
CN110532479A (en) Information recommendation method, device and equipment
CN111798273A (en) Training method of purchase probability prediction model of product and purchase probability prediction method
CN111428138A (en) Course recommendation method, system, equipment and storage medium
CN110674312B (en) Method, device and medium for constructing knowledge graph and electronic equipment
CN108876470B (en) Tag user expansion method, computer device, and storage medium
CN107291755B (en) Terminal pushing method and device
CN104794500A (en) Tri-training semi-supervised learning method and device
CN105205501A (en) Multi-classifier combined weak annotation image object detection method
CN107577786B (en) Matrix factorization recommendation method based on joint clustering
CN110647683A (en) Information recommendation method and device
CN106294500A (en) Content item pushing method, apparatus and system
CN112380433A (en) Recommendation meta-learning method for cold-start user
CN104778283A (en) User occupation classification method and system based on microblog
CN109447273A (en) Model training method, advertisement recommended method, relevant apparatus, equipment and medium
CN105574480B (en) Information processing method, device and terminal
CN112148994A (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN110674854B (en) Image classification model training method, image classification method, device and equipment
CN104572915B (en) Content-environment-enhanced customer event relatedness computation method
CN110213660B (en) Program distribution method, system, computer device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant