CN102567392A - Control method for interest subject excavation based on time window - Google Patents


Info

Publication number
CN102567392A
CN102567392A CN201010613845XA CN201010613845A CN102567392A CN 102567392 A CN102567392 A CN 102567392A CN 201010613845X A CN201010613845X A CN 201010613845XA CN 201010613845 A CN201010613845 A CN 201010613845A CN 102567392 A CN102567392 A CN 102567392A
Authority
CN
China
Prior art keywords
user
mark records
window
mark
records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201010613845XA
Other languages
Chinese (zh)
Inventor
林欣
滕跃
肖洁
何克勤
张波
贺樑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN201010613845XA priority Critical patent/CN102567392A/en
Publication of CN102567392A publication Critical patent/CN102567392A/en
Pending legal-status Critical Current

Abstract

The invention provides a control method for mining interest topics based on a time window. The method comprises the steps of: a. determining a user's tag records and the social tag records; b. determining standard tag records according to the user's tag records and the social tag records; c. generating user interest topic trees according to the standard tag records; d. establishing windows and determining the correspondence between the windows and the topic trees; and e. calculating the weight of each topic tree according to the correspondence. The invention also provides a corresponding control device. The invention does not require comparison and calculation against a large number of users. The interest topic trees are built with a prior probability formula, so only the tag records of each of the user's items need to be scanned. In a collaborative tagging system, each user's tag records have their own characteristics and accurately reflect the user's individuality, so all users gain a social dimension without losing personalization.

Description

A control method for time-window-based interest topic mining
Technical field
The present invention relates to the technical field of recommendation services based on user behavior, and specifically to an algorithm that mines a user's points of interest from the user's historical behavior in order to make recommendations to that user.
Background art
The Web has become an important channel through which people obtain information. As the amount of Web information keeps growing, the so-called "information overload" problem forces people to spend a great deal of time searching and browsing for the information they need. Search engines such as Google and Baidu are the most common tools for helping people retrieve information. Information retrieval technology meets some of people's needs, but because it is general-purpose it still cannot satisfy queries that come from different backgrounds, serve different purposes, and arise at different times. In e-commerce in particular, on sites such as Amazon and Taobao, letting users find the products they need quickly and conveniently is key to improving reputation and profit, and the user experience is fundamental to retaining long-term users. Personalized recommendation technology was proposed to address this problem: it provides different services to different users to meet their different needs. The concept of personalization and the related research emerged accordingly.
Commercial and academic research on recommender systems has attracted a great many researchers, and many classic approaches have been proposed, such as collaborative filtering and content-based filtering. Collaborative filtering relies on users' explicit ratings of items; its advantage is that it can filter on concepts that are hard to express, and its drawback is that users must actively rate items. Content-based filtering relies mainly on the descriptive information users give about items, which it analyzes with text-processing techniques; its advantage is that it can recommend more precisely, and its drawback is that it places high demands on text processing, since a correlation model usually has to be built by extracting keywords, and its time complexity is relatively high. With the development of Web 2.0, users have changed from recipients of network information into creators of Web content. Users freely tag the items they are interested in, and the resulting tags can be accessed at any time, from any place, and on any machine, unlike traditional bookmarks, which can only be browsed locally. This is why collaborative tagging systems emerged and developed. How to make recommendations in social networks and in collaborative tagging systems has become two new focuses of current recommender-system research. Recommender systems rest on two main assumptions: 1. similar users have similar preferences, so the records of similar users can be used to make recommendations for the current user; 2. a user is likely to remain interested in what that user liked before.
Methods based on collaborative tagging can reflect a user's interest behavior. In a collaborative tagging system, users tag items, can easily find the items they are interested in through their own tags, and can browse other users' related items. Because a user's tags reflect the user's interests, and a user may have several points of interest, mining the user's points of interest is highly valuable for recommendation. Analyzing an individual user's tag records together with society's tag records for the same items reveals the user's interests with good real-time performance and low resource overhead. Although some of a user's tags may be imprecise, the social network can be used to improve the quality of the user's tags and thus the precision and accuracy of recommendation.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a control method for time-window-based interest topic mining and a corresponding control device.
According to one aspect of the present invention, a control method for time-window-based interest topic mining is provided, comprising the steps of: a. determining the user's tag records and the social tag records; b. determining standard tag records according to the user's tag records and the social tag records; c. generating user interest topic trees according to the standard tag records; d. establishing windows and determining the correspondence between the windows and the topic trees; e. calculating the weight of each topic tree according to the correspondence.
According to another aspect of the present invention, a control device for time-window-based interest topic mining is also provided, comprising: a first determining device for determining the user's tag records and the social tag records; a second determining device for determining standard tag records according to the user's tag records and the social tag records; a first generating device for generating user interest topic trees according to the standard tag records; a first processing device for establishing windows and determining the correspondence between the windows and the topic trees; and a first calculating device for calculating the weight of each topic tree according to the correspondence.
The object of the present invention is to provide a time-window-based interest topic mining algorithm. The method analyzes the user's historical tag records together with social tags and builds hierarchy trees from the user's tags, each tree representing one of the user's points of interest. Time window weights are then introduced to rank the points of interest, so that the ranked points of interest accurately reflect, in real time, the diversity and bias of the user's interests.
The object of the invention is achieved as follows:
A time-window-based interest topic mining algorithm. The algorithm analyzes the user's historical tag records and the social tag records. Tags that the user uses frequently in the user's personal historical tag records serve as the roots of the user's basic interest hierarchy trees, and the social historical tag records are used to improve the quality of the user's tags. The user interest trees are then built from the roots of the interest hierarchy trees and their relations to the other tags, and finally recommendations are formed by using the interest hierarchy trees to retrieve resources in the collaborative tagging system. The specific steps are:
Step 1: extract the user's tag records and the social tag records through web page analysis;
Step 2: measure the quality value of the user's tag records;
Step 3: process the user's records according to the quality value of the user's tag records. If the quality value of the user's tag records is higher than the quality value of the social tag records, the user's personal tag records are used as the standard tag records; otherwise the social tag records are used as the standard tag records. Finally, the standard tag records are taken as the user's tag records;
Step 4: count the usage frequency of the user's keywords and sort them;
Step 5: build the user interest topic trees with the prior probability formula;
Step 6: divide the time during which the user has used the recommender system into windows;
Step 7: calculate the weight of each of the divided windows;
Step 8: traverse the user interest topic trees and map them into the windows, forming the correspondence between interest topic trees and windows;
Step 9: calculate the weights of the user interest topic trees and rank the trees;
Step 10: take the TOP-N user interest topic trees and retrieve and recommend the related resources.
The present invention is further characterized in that, in step 3 and step 7, prior probability is used to build the hierarchy trees, and the user's personal tags are combined with social tags.
Compared with the background art, the present invention has the following advantages:
(1) Ease of implementation: no comparison against a large number of users is needed. Only the historical tag records of each crawled user need to be analyzed individually, unlike traditional collaborative filtering, which must first search for many neighbor users before it can predict preferences. Nor is text and semantic processing required, as it is for content-based feature extraction.
(2) Simplicity: the algorithm builds the interest topic trees from the frequency of each of the user's tags using the prior probability formula, and only needs to scan the tag records of each of the user's items. The social tag records can likewise be obtained from each item's JSON file. For users whose tag quality is low, social tags are used as a substitute, which helps improve the precision of the recommendations produced when searching.
(3) Combination of personalization and socialization: in a collaborative tagging system, each user's tag records have their own characteristics and accurately reflect the user's individuality, but because tagging is free-form, a user's tags may have quality defects. Social tags, by contrast, reflect what most people agree on and so have a degree of credibility. For some users, replacing personal tags with social tags improves recommendation accuracy to some extent. All users thus gain a social dimension without losing personalization.
Description of drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a flowchart of the control method for time-window-based interest topic mining according to a first embodiment of the present invention;
Fig. 2 shows a flowchart of the control method for time-window-based interest topic mining according to a second embodiment of the present invention;
Fig. 3 shows a structural diagram of the control device for time-window-based interest topic mining according to a third embodiment of the present invention;
Fig. 4 shows a schematic diagram of the principle of the control method for time-window-based interest topic mining according to one embodiment of the present invention;
Fig. 5 shows a schematic diagram of the principle of the control method for time-window-based interest topic mining according to another embodiment of the present invention;
Fig. 6 shows a schematic diagram of the control method for mining user interest topic trees according to one embodiment of the present invention;
Fig. 7 shows, according to one embodiment of the present invention, the control method for dividing and mining topics based on time windows. Its main purpose is to rank the mined user interest topic trees and find the topic trees that reflect the user's most recent real interests, so that the final recommendations are more accurate. First, the time during which the user has used the system is divided into periods, which splits the user's time into many small time windows. The windows are numbered from 1 in increasing order according to the following principle: the user's most recent tag records fall in windows with small numbers, and topic trees built from older tag records fall in windows with larger numbers. The support sets of the mined interest topic trees are then mapped into these windows, and the weight of each topic tree in the windows is calculated, which provides the basis for the subsequent selection of the TOP-N topic trees.
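The window numbering described for Fig. 7 can be sketched as follows. This is only an illustration under stated assumptions: timestamps are taken to be day numbers, windows have a fixed length, and the function name `assign_windows` and the sample days are hypothetical rather than taken from the patent. The window weight calculation itself is not shown.

```python
def assign_windows(timestamps, window_len):
    """Map each timestamp (a day number) to a window number.

    Window 1 holds the most recent records; older records get larger numbers.
    Fixed-length windows are an assumption made for this sketch.
    """
    newest = max(timestamps)
    return {t: (newest - t) // window_len + 1 for t in timestamps}


# Hypothetical tagging days for one user.
days = [1, 2, 3, 4, 5, 7, 8, 9]
windows = assign_windows(days, window_len=3)
for day in days:
    print(f"day {day} -> window {windows[day]}")
```

With a 3-day window, days 7 to 9 fall in window 1, days 4 to 5 in window 2, and days 1 to 3 in window 3, so the most recent records carry the smallest window number.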
Embodiment
The invention discloses a time-window-based interest topic mining algorithm. The invention operates in an actual recommender-system environment: the historical tag records of the user's items and society's tag records of those items are crawled and analyzed to obtain the user's interest topic trees, which are mapped into time windows and ranked accordingly; resources are then retrieved for the interest topic trees. The analysis is simple and needs no complex algorithm; it reflects the user's interests in real time and gives more accurate predictions; and it combines personalization with socialization, so that the predicted interests are closer to the user's true interests.
With reference to Fig. 4 and Fig. 5, the historical tag records of the user's items and society's tag records of those items are crawled and analyzed to obtain the user's interest topic trees, which are mapped into time windows and ranked accordingly.
The method for building the user interest topic trees considers the usage frequency of the user's keywords and introduces the notion of a quality value for the user's tag records. The user's tag records are processed accordingly to improve their quality, and the prior probability formula is then used to determine the hierarchical level to which each keyword belongs, which yields the user interest topic trees.
Dividing time into windows and mapping the user interest topic trees into them takes into account the relative importance of each topic tree to the user. According to the collaborative filtering principle, a user is likely to remain interested in items the user was interested in before. At the same time, the user's interests are diverse, so the weights of the most recent interests need to be raised. Retrieval and recommendation are based on the final ordering obtained by adjusting the rank of each user interest topic tree, which improves the accuracy of recommendation.
Fig. 1 shows a flowchart of the control method for time-window-based interest topic mining according to the first embodiment of the present invention. Specifically, in this embodiment, step S210 is executed first to determine the user's tag records and the social tag records. Step S211 then determines the standard tag records according to the user's tag records and the social tag records. Step S212 generates the user interest topic trees according to the standard tag records. Step S213 establishes windows and determines the correspondence between the windows and the topic trees. Step S214 calculates the weight of each topic tree according to the correspondence. Step S215 ranks the topic trees according to the weights. Finally, step S216 recommends to the user the content corresponding to the several topic trees ranked highest.
In a variant of this embodiment, steps S215 and S216 may be omitted.
Fig. 2 shows a flowchart of the control method for time-window-based interest topic mining according to the second embodiment of the present invention. Those skilled in the art will understand that this embodiment can be regarded as one implementation of the embodiment shown in Fig. 1. Specifically, step S220 is executed first to extract the user's tag records and the social tag records from web pages. Step S221 then judges whether the quality value of the user's tag records is greater than the quality value of the social tag records. If the result is affirmative, that is, the quality value of the user's tag records is greater than that of the social tag records, the method proceeds to step S2221, in which the user's tag records are determined as the standard tag records. If the result is negative, that is, the quality value of the user's tag records is not greater than that of the social tag records, the method proceeds to step S2222, in which the social tag records are determined as the standard tag records. Step S223 then determines keywords according to the standard tag records. Step S224 counts the usage frequency of the keywords and generates a frequency ranking. Step S225 builds the user interest trees according to the frequency ranking. Step S226 divides the time during which the user has used the system into windows. Step S227 calculates the weight of each window, taking all of the divided windows into account. Step S228 traverses the user interest trees and maps them into the windows. Finally, step S229 calculates the weight of each topic tree according to the correspondence.
Those skilled in the art can understand step S220 as an implementation of step S210 in Fig. 1; steps S221, S2221 and S2222 as an implementation of step S211 in Fig. 1; steps S223, S224 and S225 as an implementation of step S212 in Fig. 1; and steps S226, S227 and S228 as an implementation of step S213 in Fig. 1.
In a preferred example of this embodiment, step S220 comprises the step "determine the quality value of the user's tag records" and the step "set the quality value of the standard tag records and determine the standard tag records".
In another preferred example of this embodiment, step S225 comprises the step "build the user interest trees with the prior probability formula".
Fig. 3 shows a structural diagram of the control device for time-window-based interest topic mining according to the third embodiment of the present invention. Specifically, in this embodiment the control device 4 comprises: a first determining device 41 for determining the user's tag records and the social tag records; a second determining device 42 for determining the standard tag records according to the user's tag records and the social tag records; a first generating device 43 for generating the user interest topic trees according to the standard tag records; a first processing device 44 for establishing windows and determining the correspondence between the windows and the topic trees; and a first calculating device 45 for calculating the weight of each topic tree according to the correspondence. Preferably, the control device further comprises a first sorting device 46 for ranking the topic trees according to the weights, and a first recommending device 47 for recommending to the user the content corresponding to the several topic trees ranked highest.
Preferably, the first determining device 41 comprises a first extracting device for extracting the user's tag records and the social tag records from web pages.
Preferably, the second determining device 42 comprises: a first judging device 421 for judging whether the quality value of the user's tag records is greater than the quality value of the social tag records; a third determining device 422 for determining the user's tag records as the standard tag records when the result of the first judging device is affirmative; and a fourth determining device 423 for determining the social tag records as the standard tag records when the result of the first judging device is negative.
Preferably, the first generating device 43 comprises: a fifth determining device 431 for determining keywords according to the standard tag records; a second processing device 432 for counting the usage frequency of the keywords and generating a frequency ranking; and a first building device 433 for building the user interest trees according to the frequency ranking.
Preferably, the first processing device 44 comprises: a third processing device 441 for dividing the time during which the user has used the system into windows; a second calculating device 442 for calculating the weight of each window, taking all of the divided windows into account; and a fourth processing device 443 for traversing the user interest trees and mapping them into the windows.
In a preferred example of this embodiment, the first extracting device 411 comprises a sixth determining device for determining the quality value of the user's tag records, and a first obtaining device for obtaining the user's tag records and the social tag records from web pages.
In another preferred example of this embodiment, the first building device 433 comprises a second building device for building the user interest trees with the prior probability formula.
Further, Fig. 4 shows a schematic diagram of the control method for time-window-based interest topic mining according to one embodiment of the present invention. First, the tag records of all user resources and the corresponding social tag records are obtained from web pages to build the system resource database. The user's personal data are then loaded and analyzed, and the quality value of the user's tag records is calculated to determine the standard tag records: if the quality value of the user's tag records is higher than the quality value of the social tag records, the user's personal tag records serve as the standard tag records; otherwise the social tag records do. The standard tag records are then taken as the user's tag records. Next, the user interest topic trees are built from the tag records so determined, which gives a model of the user. Finally, the interest topic trees are mapped into time windows, the weight of each window is calculated, the user interest topic trees are ranked, the highest-ranked interest topic trees are used to retrieve resources from the system resource database, and the TOP-N retrieval results are recommended to the user.
Further, Fig. 5 shows a schematic diagram of the control method for time-window-based interest topic mining according to another embodiment of the present invention. Those skilled in the art will understand that the embodiment shown in Fig. 5 and the embodiment shown in Fig. 2 can be implemented in similar ways. For example, steps 1 to 9 shown in Fig. 5 can be implemented with reference to the steps shown in Fig. 2 and are not described again here. The embodiment shown in Fig. 5 further comprises step 10, "calculate the weight of each interest topic tree and rank the trees"; then step 11, "take the TOP-N user interest topic trees and retrieve the related resources"; and finally step 12, "recommend resources".
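The selection of the standard tag records described above can be sketched as follows. The sketch assumes that the quality value of one resource is the size of the intersection of the user's tags and the social tags divided by the size of their union, as defined later in this description, and that the user's overall quality is the mean over resources. The function names, the dictionary layout, and the threshold value 0.5 are hypothetical, and every resource is assumed to have a social tag record.

```python
def resource_quality(user_tags, social_tags):
    """Overlap ratio |user ∩ social| / |user ∪ social| for one resource."""
    user, social = set(user_tags), set(social_tags)
    union = user | social
    return len(user & social) / len(union) if union else 0.0


def choose_standard_records(user_records, social_records, alpha):
    """Return (user quality value, standard tag records).

    user_records / social_records: dict mapping resource id -> list of tags.
    alpha: quality value of the social tag records (an assumed threshold).
    """
    scores = [resource_quality(tags, social_records[res])
              for res, tags in user_records.items()]
    quality = sum(scores) / len(scores) if scores else 0.0
    if quality > alpha:
        return quality, dict(user_records)
    # Otherwise fall back to the social tags of the same resources.
    return quality, {res: social_records[res] for res in user_records}


user_records = {"item": ["tag1", "tag2", "tag3", "tag4"]}
social_records = {"item": ["tag1", "tag3", "tag5", "tag6"]}
quality, standard = choose_standard_records(user_records, social_records, alpha=0.5)
print(round(quality, 3), standard)
```

In this example the user shares 2 of 6 distinct tags with the social record, so the quality value is 2/6 and, with the assumed threshold of 0.5, the social tags become the standard tag records.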
It will be apparent to those skilled in the art that the above-mentioned step 11 can be implemented as follows. The quality value of a user's tag records is calculated from the user's tag records for each resource and the social tag records of the corresponding resource. Specifically, for a given resource, the number of tags in the intersection of the user's tag record and the social tag record is divided by the number of tags in their union, and the result is the tag record quality of that resource. The tag record quality of every resource of the user is calculated in the same way, and the mean over all of the user's resources is taken as the user's tag record quality value. This value is then compared with the quality value of the social tag records to determine the standard tag records. For example, suppose a user's tags for a resource are Item_User (tag_1, tag_2, tag_3, tag_4) and the social tag record of the same resource is Item_Social (tag_1, tag_3, tag_5, tag_6). The user's tag quality value for this resource is then:
$$\frac{|Item\_User \cap Item\_Social|}{|Item\_User \cup Item\_Social|}$$
Here the user's tag quality value is 2/6: the intersection contains the 2 shared tags (tag_1, tag_3), and the union contains all 6 tags (tag_1, tag_2, tag_3, tag_4, tag_5, tag_6). Let the threshold for the quality value of the social tag records be α. The mean quality over all of the user's resources is compared with α to determine the standard tag records: if the user's tag quality over all resources is higher than the social tag record quality value, the user's tag records are used as the standard tag records; otherwise the social tag records of all of the user's corresponding resources are used as the standard tag records. Finally, the standard tag records so determined are taken as the tag records of the user's corresponding resources.
Further, the step "build the user interest topic trees with the prior probability formula according to frequency" can be implemented as follows. For example, a user U has the group of tag records shown in Table 1 below, which gives an example of user tagging:
post      tags
post 1    java, xml, jdom
post 2    java, xml, dom4j
post 3    java, classloader, jvm
post 4    java, classloader
post 5    linux, shell
post 6    linux, ubuntu
Here, post denotes a resource the user has tagged, and tags denotes the tags the user recorded for that resource. Applying the notion of support from association rules to all of the user's tag records (the tags column) and counting the frequency of each tag gives Table 2, the tag support list. Support is the number of times a tag appears in all of the user's tag records, and the support set SupportSet lists the resources in which the user's tag appears:
Tags          Support   SupportSet
java          4         post 1, post 2, post 3, post 4
classloader   2         post 3, post 4
xml           2         post 1, post 2
linux         2         post 5, post 6
dom4j         1         post 2
shell         1         post 5
jdom          1         post 1
jvm           1         post 3
ubuntu        1         post 6
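The support list above can be reproduced from the example tag records with a short sketch. The function name `support_sets` and the alphabetical tie-break between tags of equal support are assumptions made for illustration; the description only specifies sorting by frequency.

```python
from collections import defaultdict

# The example tag records of user U (resource -> tags).
posts = {
    "post 1": ["java", "xml", "jdom"],
    "post 2": ["java", "xml", "dom4j"],
    "post 3": ["java", "classloader", "jvm"],
    "post 4": ["java", "classloader"],
    "post 5": ["linux", "shell"],
    "post 6": ["linux", "ubuntu"],
}


def support_sets(posts):
    """Return a dict mapping each tag to the set of posts it appears in."""
    sets = defaultdict(set)
    for post, tags in posts.items():
        for tag in tags:
            sets[tag].add(post)
    return dict(sets)


sets = support_sets(posts)
# Support is the size of the support set; sort by descending support.
ranked = sorted(sets, key=lambda tag: (-len(sets[tag]), tag))
for tag in ranked:
    print(tag, len(sets[tag]), sorted(sets[tag]))
```

The tag java comes first with support 4, followed by the three tags with support 2 and then the tags with support 1.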
First, the tag with the highest frequency is chosen as the root of a tree, because its support is the highest. The tags classloader, xml and linux, however, all have support 2, so how can a computer tell whether a tag is a branch node or a root node? This has to be decided statistically from the relations between tags. A tag that co-occurs very often with a known root is treated as a child of that root. For example, java co-occurs 2 times each with classloader and with xml, and the support of each of those tags is 2. In other words, whenever classloader or xml occurs it co-occurs with java, so classloader and xml can be judged to be child nodes of the known root node java. The tag linux, by contrast, is a high-frequency tag unrelated to the known root node, so it can serve independently as the root node of another interest topic tree. To decide the node type in cases such as classloader and linux, we use the prior probability formula. Let Support(t_i) and SupportSet(t_i) denote the support and the support set of tag t_i, and set a threshold α. If
$$p(t_1 \mid t_2) = \frac{|SupportSet(t_1) \cap SupportSet(t_2)|}{Support(t_2)} \ge \alpha \qquad (1)$$
then t_2 can be judged to be subordinate to t_1. In the example above, p(java|xml) and p(java|classloader) are both 1, so xml and classloader become child nodes of java. The supports of classloader and xml are equal, and the above method shows that both are child nodes of java. To determine the relation between them, the following prior probability formula is used:
$$p(t_1, t_2 \mid t_3) = \frac{|SupportSet(t_1) \cap SupportSet(t_2) \cap SupportSet(t_3)|}{Support(t_3)} \ge \alpha \qquad (2)$$
If formula (2) is satisfied, t_3 can be judged to be a child node of t_2; otherwise t_3 and t_2 are sibling nodes and both are child nodes of t_1. Only a 3-layer interest topic tree structure is considered here. The above steps are then iterated, in order of tag frequency, under the root nodes of the topic trees already built, until every tag has been visited. For the example above, this builds all of the user's interest topic trees.
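The iteration described above can be sketched as follows, using the support sets of the example. This is one possible reading of formulas (1) and (2), not the patent's definitive procedure: the threshold α = 0.8, the alphabetical tie-break, the choice of the first matching root or second-layer node, and the nested-dictionary tree layout are all assumptions made for illustration.

```python
# Support sets of the example (tag -> posts in which the tag appears).
SUPPORT_SETS = {
    "java": {"post 1", "post 2", "post 3", "post 4"},
    "classloader": {"post 3", "post 4"},
    "xml": {"post 1", "post 2"},
    "linux": {"post 5", "post 6"},
    "dom4j": {"post 2"},
    "shell": {"post 5"},
    "jdom": {"post 1"},
    "jvm": {"post 3"},
    "ubuntu": {"post 6"},
}


def build_interest_trees(support_sets, alpha):
    """Build 3-layer trees as {root: {second-layer tag: [third-layer tags]}}."""

    def p(tag, *ancestors):
        # Formulas (1) and (2): share of the tag's posts that also
        # contain every candidate ancestor.
        common = set(support_sets[tag])
        for ancestor in ancestors:
            common &= support_sets[ancestor]
        return len(common) / len(support_sets[tag])

    trees = {}
    # Visit tags by descending support; ties are broken alphabetically.
    order = sorted(support_sets, key=lambda t: (-len(support_sets[t]), t))
    for tag in order:
        root = next((r for r in trees if p(tag, r) >= alpha), None)
        if root is None:
            trees[tag] = {}              # unrelated to any root: new tree
            continue
        parent = next((c for c in trees[root] if p(tag, root, c) >= alpha), None)
        if parent is None:
            trees[root][tag] = []        # sibling of the existing children
        else:
            trees[root][parent].append(tag)  # third-layer node
    return trees


trees = build_interest_trees(SUPPORT_SETS, alpha=0.8)
print(trees)
```

Under these assumptions the example yields two trees: java with children classloader (holding jvm) and xml (holding dom4j and jdom), and linux with children shell and ubuntu.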
Specifically, those skilled in the art can understand the above process and related content with reference to Fig. 6. The embodiment shown in Fig. 6 illustrates the control method provided by the invention, which mainly performs statistics and analysis on the tag records obtained from the user. The occurrences of all of the user's tag records are first counted, and the records are then sorted by frequency. The user interest topic tree is built iteratively: the most frequent tag record becomes the first interest topic tree, and the prior probability formula is used to judge whether each subsequent tag is a child node of this interest topic or a child of one of its child nodes. The loop continues until all tag records have been scanned, at which point the user interest topic tree is complete. Those skilled in the art can understand the content of Fig. 6 in combination with the above embodiment, so it is not repeated here.
Further, the step of mapping the user interest topic tree to windows can be implemented as follows. The tagging time is recorded whenever the user tags a resource, so a time column is added to the tag support list above. The following table of a user's tag records with timestamps is used to illustrate how the user's interest topics are mapped to windows:
SupportSet | tags | timestamp
post 1, post 3 | java, xml, jdom | 1, 3
post 5, post 6 | java, xml, dom4j | 5, 7
post 2 | java, classloader, jvm | 2
post 4 | java, classloader | 4
post 10, post 7 | linux, shell | 3, 7
post 8, post 9 | linux, ubuntu | 8, 9
Timestamps are mapped as follows. The time period runs from the user's first tag record to the user's most recent tag record, and it is cut into intervals. For example, if the user's tagging period above is 10 days and an interval of 3 days is chosen, the period is cut into 3 segments, numbered 1 to 3 in order, so the user has 3 time windows. The window containing the user's most recent tags gets the smallest number, window 1, and the window numbers increase in turn with time. Using the tree-building method described above (applying the prior probability formula in order of frequency), the interest topic trees are built for this user. The root node of a topic tree together with one of its second-layer child nodes is taken as a predicted topic, which gives the 4 predicted topics shown in Fig. 7. The different patterns in Fig. 7 represent different content, as follows:
(java, xml), (java, classloader), (linux, shell) and (linux, ubuntu) are each drawn with a distinct shape in Fig. 7 (the inline legend images BSA00000403532200131 to BSA00000403532200134 are not reproduced here).
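The cutting of the tagging period into numbered windows, and the mapping of the four predicted topics onto them, might be sketched as follows. The function names are hypothetical, and two points are assumptions that the text does not fix: the latest timestamp is used as the current time, and a timestamp is assigned to a window by integer division of its distance from that time. The timestamps are those of the example table.

```python
from collections import defaultdict

# Timestamps per predicted topic, read from the example table.
TOPIC_TIMES = {
    ("java", "xml"): [1, 3, 5, 7],
    ("java", "classloader"): [2, 4],
    ("linux", "shell"): [3, 7],
    ("linux", "ubuntu"): [8, 9],
}

def window_index(timestamp, now, win_size):
    """1-based window number; window 1 is the window closest to `now`."""
    return (now - timestamp) // win_size + 1

def map_topics_to_windows(topic_times, win_size, now=None):
    """Return {window number: set of topics tagged inside that window}."""
    if now is None:
        # Assumption: the most recent tag marks the current time.
        now = max(t for times in topic_times.values() for t in times)
    windows = defaultdict(set)
    for topic, times in topic_times.items():
        for t in times:
            windows[window_index(t, now, win_size)].add(topic)
    return dict(windows)

windows = map_topics_to_windows(TOPIC_TIMES, win_size=3)
for index in sorted(windows):
    print(index, sorted(windows[index]))
```

Under these assumptions three windows result, and (java, classloader) first appears in window 2 while the other three topics already appear in window 1, which agrees with the walk-through given later in the description.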
The example in the figure shows the topics mapped to time windows. The topics are then ranked by computing the weight of each topic in the windows. The main calculations are as follows:
(1) Global weight of a topic tree: the share of the tree's root node among the root nodes of all topic trees:
$$globalWeight(root_i) = \frac{|SupportSet(root_i)|}{\sum_{root_j \in Topic(u)} |SupportSet(root_j)|}$$
This measures the user's overall preference for a topic tree. In the example, the global weights of the java and linux topic trees are 0.6 and 0.4 respectively, because java appears 6 times and linux 4 times in the user's tag records.
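As a quick check of the 0.6 and 0.4 figures, the global weight can be computed in a few lines. The function name and the dictionary layout are hypothetical; the post identifiers follow the example table.

```python
def global_weights(root_support_sets):
    """Share of each root's support among the supports of all roots."""
    total = sum(len(posts) for posts in root_support_sets.values())
    return {root: len(posts) / total
            for root, posts in root_support_sets.items()}

# Support sets of the two root tags in the example (post ids).
ROOTS = {
    "java": {1, 2, 3, 4, 5, 6},
    "linux": {7, 8, 9, 10},
}

print(global_weights(ROOTS))
```

The weights sum to 1 by construction, so they can be read as the user's overall split of attention between topic trees.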
(2) Window weight: each window is given a different weight. The closer a window is to the current time, the higher its weight; the further away it is, the lower its weight. Let index be the number of the window, so that the window nearest the current time is number 1, and so on. The window weight is then defined as:
$$WinWeight(index) = e^{-\gamma \cdot WinSize \cdot (index - 1) / (now - earliest)}$$
Here WinSize is the size of a single window and is adjustable in the algorithm, γ is an adjustment coefficient, and now and earliest are the current time and the time of the user's earliest record. In the example, with γ = 2, the weight of window 1 is 1, the weight of window 2 is 0.52, and the weight of window 3 is 0.26.
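The exponential decay of the window weight can be sketched as below. The function name is hypothetical, and the span now - earliest = 9 is an assumption: the text does not state the span behind its quoted values, so the results here are only approximately comparable with the 0.52 and 0.26 given above.

```python
import math

def win_weight(index, win_size, gamma, now, earliest):
    """Weight of window `index`; window 1 (the most recent) has weight 1."""
    return math.exp(-gamma * win_size * (index - 1) / (now - earliest))

# gamma = 2 and a 3-day window as in the example; the span of 9 is assumed.
for index in (1, 2, 3):
    print(index, round(win_weight(index, win_size=3, gamma=2, now=9, earliest=0), 2))
```

A larger γ makes older windows fade faster, while a larger span (now - earliest) flattens the decay, so a long tagging history is not penalised too heavily.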
(3) Weight of a second-layer child node relative to its root node: once the weight of the root node is determined, the importance of a second-layer node to the root node is computed as the probability that the second-layer node appears given that the root node appears in the window, that is:
$$L2Weight(l2_j) = p(root_i\_l2_j \mid root_i) = \frac{|SupportSet(root_i) \cap SupportSet(l2_j)|}{|SupportSet(root_i)|} \tag{4-6}$$
Here root_i denotes the root node of an interest topic tree, and root_i_l2_j denotes a second-layer child node under that tree. With these weights defined, the weight calculation for interest topic trees proceeds as follows. Iteration starts from window 1, the window with the smallest index. If a root node root appears in the window, its weight is computed in that window, and a score is computed with formula (5) for each second-layer child node of root found there. The calculation for the interest topic tree of root in the current window then stops, and the process moves on to the next root node in the current window. When all root nodes in the window have been handled, the weight calculation continues in the next window, until the weights of all interest topic trees have been computed. In the example above, the prediction scores of (linux, ubuntu), (linux, shell) and (java, xml) are generated in window 1, after which recommendation for them stops; the score of (java, classloader) is generated in window 2, at which point the recommendation process ends. The final prediction score is computed as follows:
$$score(root_i, l2_j) = globalWeight(root_i) \cdot WinWeight(index) \cdot L2Weight(l2_j) \tag{5}$$
After the weights of all of the user's interest topic trees have been computed, the trees are sorted by weight. The top-N interest topic trees are then used to retrieve resources from the system resource library, and the retrieved results are recommended to the user, which completes the recommendation for this user.
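Putting the three weights together, the scoring of formula (5) and the top-N selection could look like the sketch below. The function name and data layout are hypothetical. The window in which each topic is first met and the window weights 1, 0.52 and 0.26 are taken from the example; computing the second-layer weight over the full support sets rather than per window is an assumption, and ties are broken alphabetically for determinism.

```python
def rank_topics(root_sets, child_sets, first_window, win_weights, top_n):
    """Score every (root, child) topic with formula (5) and keep the best."""
    total = sum(len(posts) for posts in root_sets.values())
    scored = []
    for (root, child), child_posts in child_sets.items():
        global_w = len(root_sets[root]) / total
        window_w = win_weights[first_window[(root, child)]]
        l2_w = len(root_sets[root] & child_posts) / len(root_sets[root])
        scored.append(((root, child), global_w * window_w * l2_w))
    # Highest score first; equal scores are ordered by topic name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]

ROOT_SETS = {"java": {1, 2, 3, 4, 5, 6}, "linux": {7, 8, 9, 10}}
CHILD_SETS = {
    ("java", "xml"): {1, 3, 5, 6},
    ("java", "classloader"): {2, 4},
    ("linux", "shell"): {7, 10},
    ("linux", "ubuntu"): {8, 9},
}
FIRST_WINDOW = {
    ("java", "xml"): 1, ("java", "classloader"): 2,
    ("linux", "shell"): 1, ("linux", "ubuntu"): 1,
}
WIN_WEIGHTS = {1: 1.0, 2: 0.52, 3: 0.26}

for topic, value in rank_topics(ROOT_SETS, CHILD_SETS, FIRST_WINDOW, WIN_WEIGHTS, 3):
    print(topic, round(value, 3))
```

The selected topics would then serve as keyword queries against the resource library. Because the three factors are multiplied, a topic needs a popular root, recent activity and a strong tie to its root to rank highly.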
Further, those skilled in the art will understand that the embodiments shown in Fig. 1 to Fig. 5 can be further explained by the above process and the following example. First, the tag records of all user resources and the corresponding social tag records are obtained from web pages, and a system resource database is built. The user's personal data is then loaded and analyzed, and the quality value of the user's tag records is computed in order to determine the standard tag records. This quality value is computed from the user's tag records for each resource and the social tag records of the same resource, as follows. For a given resource, the number of tags in the intersection of the user's tag records and the social tag records is divided by the number of tags in their union, which gives the tag record quality for that resource. The same is done for all of the user's resources, and the mean over all of these resources is taken as the quality value of the user's tag records. Comparing this value with the quality value of the social tag records determines the standard tag records. For example, suppose a user tags a resource with Item_User(tag_1, tag_2, tag_3, tag_4), and the social tag record of this resource is Item_Social(tag_1, tag_3, tag_5, tag_6). The user's tag quality value for this resource is then:

$$\frac{|Item\_User \cap Item\_Social|}{|Item\_User \cup Item\_Social|}$$

Here the tag quality value is 2/6: the intersection contains the 2 shared tags tag_1 and tag_3, and the union contains all 6 tags tag_1, tag_2, tag_3, tag_4, tag_5 and tag_6. The threshold for the quality value of the social tag records is α. The mean quality over all of the user's resources is compared with α to determine the standard tag records: if the user's tag quality value over all resources is higher than the social tag record quality value, the user's tag records are used as the standard tag records; otherwise the social tag records of all of the user's corresponding resources are used as the standard tag records. The standard tag records determined in this way then serve as the tag records of the user's corresponding resources.

Second, the user interest topic tree is built from the confirmed tag records (using the method of step 11 above, which applies the prior probability formula in order of frequency), which gives a model of the user. Finally, the interest topic trees are mapped to time windows (by the method in step 11 above that introduces the time nodes of the tag records and divides them by window size), and the weight of each window is computed (by the weight calculation in step 11 above that follows the mapping of the interest topic trees to windows). The user's interest topic trees are sorted by the computed weights in descending order, and the top-ranked interest topic trees are used to retrieve resources from the system resource database (a keyword search in the system resource library using the root node and second-layer child nodes of each selected tree). The TOP-N retrieved results are recommended to the user.
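The quality comparison that selects the standard tag records can be sketched as follows. The function names and the dictionary layout (resource id to tag list) are hypothetical, and treating a resource with no social record as having an empty social tag set is an assumption of the sketch.

```python
def tag_quality(user_tags, social_tags):
    """Intersection size divided by union size of the two tag sets."""
    user, social = set(user_tags), set(social_tags)
    union = user | social
    return len(user & social) / len(union) if union else 0.0

def choose_standard_records(user_records, social_records, alpha):
    """Keep the user's tags if their mean quality exceeds alpha,
    otherwise fall back to the social tags of the same resources."""
    qualities = [tag_quality(tags, social_records.get(resource, ()))
                 for resource, tags in user_records.items()]
    mean_quality = sum(qualities) / len(qualities) if qualities else 0.0
    if mean_quality > alpha:
        return dict(user_records)
    return {resource: social_records.get(resource, ())
            for resource in user_records}

item_user = ["tag1", "tag2", "tag3", "tag4"]
item_social = ["tag1", "tag3", "tag5", "tag6"]
print(tag_quality(item_user, item_social))  # 2 shared tags out of 6 distinct tags
```

With the example above the quality is 2/6, so any threshold α above one third would make the social tag records the standard for this user.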
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these specific embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (15)

1. A control method for interest topic mining based on time windows, characterized in that it comprises the following steps:
a. determining user tag records and social tag records;
b. determining standard tag records according to the user tag records and the social tag records;
c. generating a user interest topic tree according to the standard tag records;
d. establishing windows and determining the correspondence between the windows and the topic tree;
e. calculating the weight of the topic tree according to the correspondence.
2. The control method according to claim 1, characterized in that step a comprises the following step:
a1. extracting the user tag records and the social tag records from web pages.
3. The control method according to claim 1 or 2, characterized in that step b comprises the following steps:
b1. judging whether the quality value of the user tag records is greater than the quality value of the social tag records;
b2. if the result of step b1 is affirmative, determining the user tag records as the standard tag records;
b2'. if the result of step b1 is negative, determining the social tag records as the standard tag records.
4. The control method according to any one of claims 1 to 3, characterized in that step c comprises the following steps:
c1. determining keywords according to the standard tag records;
c2. counting the usage frequency of the keywords to generate a frequency ranking result;
c3. building the user interest tree according to the frequency ranking result.
5. The control method according to claim 4, characterized in that step c3 comprises the following step:
c31. building the user interest tree by applying the prior probability formula.
6. The control method according to any one of claims 1 to 5, characterized in that step d comprises the following steps:
d1. dividing windows according to the length of the user's usage time;
d2. calculating the weight of each window, taking all of the divided windows into account;
d3. traversing the user interest tree and mapping it into each window.
7. The control method according to any one of claims 1 to 6, characterized in that it further comprises the following steps:
f. sorting the topic trees according to the weights;
g. recommending to the user the content corresponding to a number of top-ranked topic trees.
8. The control method according to any one of claims 2 to 7, characterized in that step a1 comprises the following steps:
a11. obtaining individual tag records from web pages;
a12. obtaining from web pages the social tag records of the resources corresponding to the user.
9. A control device for interest topic mining based on time windows, characterized in that it comprises the following devices:
a first determining device, used for determining user tag records and social tag records;
a second determining device, used for determining standard tag records according to the user tag records and the social tag records;
a first generating device, used for generating a user interest topic tree according to the standard tag records;
a first processing device, used for establishing windows and determining the correspondence between the windows and the topic tree;
a first calculating device, used for calculating the weight of the topic tree according to the correspondence.
10. The control device according to claim 8, characterized in that the first determining device comprises the following device:
a first extracting device, used for extracting the user tag records and the social tag records from web pages.
11. The control device according to claim 8 or 9, characterized in that the second determining device comprises the following devices:
a first judging device, used for judging whether the quality value of the user tag records is greater than the quality value of the social tag records;
a third determining device, used for determining the user tag records as the standard tag records when the result of the first judging device is affirmative;
a fourth determining device, used for determining the social tag records as the standard tag records when the result of the first judging device is negative.
12. The control device according to any one of claims 9 to 11, characterized in that the first generating device comprises the following devices:
a fifth determining device, used for determining keywords according to the standard tag records;
a second processing device, used for counting the usage frequency of the keywords to generate a frequency ranking result;
a first building device, used for building the user interest tree according to the frequency ranking result.
13. The control device according to claim 12, characterized in that the first building device comprises the following device:
a second building device, used for building the user interest tree by applying the prior probability formula.
14. The control device according to any one of claims 9 to 13, characterized in that the first processing device comprises the following devices:
a third processing device, used for dividing windows according to the length of the user's usage time;
a second calculating device, used for calculating the weight of each window, taking all of the divided windows into account;
a fourth processing device, used for traversing the user interest tree and mapping it into each window.
15. The control device according to any one of claims 9 to 14, characterized in that it further comprises the following devices:
a first sorting device, used for sorting the topic trees according to the weights;
a first recommending device, used for recommending to the user the content corresponding to a number of top-ranked topic trees.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010613845XA CN102567392A (en) 2010-12-24 2010-12-24 Control method for interest subject excavation based on time window

Publications (1)

Publication Number Publication Date
CN102567392A true CN102567392A (en) 2012-07-11

Family

ID=46412829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010613845XA Pending CN102567392A (en) 2010-12-24 2010-12-24 Control method for interest subject excavation based on time window

Country Status (1)

Country Link
CN (1) CN102567392A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751448A (en) * 2009-07-22 2010-06-23 中国科学院自动化研究所 Commendation method of personalized resource information based on scene information
CN101853470A (en) * 2010-05-28 2010-10-06 浙江大学 Collaborative filtering method based on socialized label

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张坚: "Web挖掘个性化模型研究", 《计算机与信息技术》, no. 1, 31 December 2006 (2006-12-31) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902538A (en) * 2012-12-25 2014-07-02 中国银联股份有限公司 Information recommendation device and method based on decision-making tree
CN103902538B (en) * 2012-12-25 2017-03-15 中国银联股份有限公司 Information recommending apparatus and method based on decision tree
CN104035998A (en) * 2014-06-13 2014-09-10 中国船舶重工集团公司第七二二研究所 Service need satisfaction and extension method based on social tagging
CN104967555A (en) * 2015-05-19 2015-10-07 小米科技有限责任公司 Method and device for updating network community information issuing time and server
CN106445969B (en) * 2015-08-11 2019-03-05 北京字节跳动科技有限公司 A kind of overall situation interest explores recommended method and device
CN106445969A (en) * 2015-08-11 2017-02-22 北京字节跳动科技有限公司 Global interest exploration and recommendation method and device
CN105787055A (en) * 2016-02-26 2016-07-20 合网络技术(北京)有限公司 Information recommendation method and device
CN105787055B (en) * 2016-02-26 2020-04-21 合一网络技术(北京)有限公司 Information recommendation method and device
WO2017198039A1 (en) * 2016-05-16 2017-11-23 中兴通讯股份有限公司 Tag recommendation method and device
CN107391509A (en) * 2016-05-16 2017-11-24 中兴通讯股份有限公司 Label recommendation method and device
CN107391509B (en) * 2016-05-16 2023-06-02 中兴通讯股份有限公司 Label recommending method and device
CN107133370A (en) * 2017-06-19 2017-09-05 南京邮电大学 A kind of label recommendation method based on correlation rule
CN109783628A (en) * 2019-01-16 2019-05-21 福州大学 The keyword search KSAARM algorithm of binding time window and association rule mining
CN109783628B (en) * 2019-01-16 2022-06-21 福州大学 Method for searching KSAARM by combining time window and association rule mining

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120711