CN102915335B - Information correlation method based on user operation records and resource content - Google Patents

Publication number
CN102915335B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210345320.1A
Other languages
Chinese (zh)
Other versions
CN102915335A (en
Inventor
杨智强
殷钊
王衡
汪国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an information correlation method based on user operation records and resource content. From the user's operation history on a personal computer and the content of the resources involved in those operations, the method first automatically mines a task model based on the operation records and a topic model based on the resource content. It then combines the task model and the topic model to measure the association between pieces of information. Finally, while the user is working with a resource, it finds the other resources most relevant to the current resource and recommends them to the user. The user does not need to perform any additional operation at any point in the process. The invention aims to reduce the time the user spends looking for files, to preserve the continuity of the user's task as far as possible, and to ease the burden of task switching.

Description

Information correlation method based on user operation records and resource content
Technical field
The present invention relates to an information correlation method, based on user operation records and resource content, for use in an operating system environment, and belongs to the field of computer software technology.
Background art
The amount of personal information held by users keeps growing. Excess information causes wasted time, delayed decisions, an inability to concentrate on the main task, and stress; see Waddington, P. (1997). Dying for Information? A report on the effects of information overload in the UK and worldwide. Reuters, 1997. Knowledge workers, such as professors, lawyers, and engineers, feel information overload most acutely, because their daily work involves many different tasks, each of which requires finding and processing a large amount of information. A problem inevitably follows: whenever a task is interrupted or switched, a great deal of effort goes into retrieving the information or resources relevant to the current task.
Information is growing explosively, yet current operating systems do not provide the information management that users need, so the difficulty individual users have in obtaining and managing their personal information has become pronounced. Personal information management is the research field concerned with helping people solve this problem. Providing users with good personal information management runs into a number of psychological challenges, which can be summarized in two points: first, classifying items (such as files) is cognitively difficult; second, the details that users remember about an item usually cannot be used for retrieval. Existing research has proposed many solutions that address these two challenges.
Providing users with better ways of organizing and presenting information is an important research direction in personal information management. The project folders implemented by Ofer Bergman et al. (see Ofer Bergman, Ruth Beyth-Marom, Rafi Nachmias, The Project Fragmentation Problem in Personal Information Management, CHI 2006 Proceedings, 2006) store all of a user's information on the same subject (including documents, e-mail, bookmarked pages, and so on) in the same folder, so that the user can store and retrieve information on one subject under one directory.
Besides better organization and presentation, more powerful information retrieval is also an important means of personal information management. Dumais et al. implemented the Stuff I've Seen (SIS) system; for the implementation, see Dumais, S.T., Cutrell, E., Cadiz, J.J., Jancke, G., Sarin, R. and Robbins, D.C. (2003). Stuff I've Seen: A system for personal information retrieval and re-use. In Proc. SIGIR 2003, 72-79. The design of SIS has two key aspects. The first is to provide a unified index over information held in different organizational structures, so that it can be retrieved in a unified way. The second is to support retrieval by contextual information that users remember easily, such as the time a page was browsed or the author of a file.
Approaches based on organizing and presenting information require the user to classify resources in advance, and so do not fundamentally free the user from a heavy interaction burden. Information retrieval reduces the cost of finding resources to some extent, but frequent searching interrupts the user's task and makes it hard to stay focused on the current work.
An information correlation method based on user operation records and resource content can provide computer users with a real-time and accurate resource recommendation service, and thereby solves the problems described above.
Summary of the invention
The object of the invention is to propose a resource information correlation method. The invention is intended mainly for personal computers. Based on the user's past operation records and the content of the resources accessed, it recommends relevant information to the user before the user searches for a resource, saving the time the user would spend searching.
To achieve this object, the technical solution of the invention is an information correlation method based on user operation records and resource content, comprising the following steps:
1) monitor multiple operation events of the user on the computer, obtain the resource content and the operation records, and store them in a local or remote database;
2) convert the operation records into vectors of a specific format and build a task model based on the operation records;
2-1) cut the operation records into a time-slice sequence and convert them to vectors;
2-2) build the task model with a latent Dirichlet allocation (LDA) model, using the operation events as data and the time slice as the unit;
3) build a topic model based on the resource content;
3-1) convert the content of each resource into a word-frequency vector, according to the word set extracted from the resource content and the vocabulary;
3-2) build the topic model by applying the LDA model to the word-frequency vectors;
4) compute the degree of association between the current resource and each other resource under the topic model and under the task model, complete the information association, and return the resources with the highest degree of association to the user.
The operation events comprise: opening a resource, closing a resource, and switching from one resource to another. The resource content comprises documents and web pages.
The attributes collected for document-related operation events comprise the time, the event type, the name of the resource, and the path of the resource; the attributes collected for web-page-related operation events comprise the time, the event type, the page title, and the page URL.
The time-slice sequence is cut as follows:
i) count all resources in the operation records and form a vocabulary from them, in which each resource has a number;
ii) define the sample vector $A_j = \{a_1, a_2, \ldots, a_n, \ldots, a_N\}$ to represent the state of all resources at the j-th sample, where $a \in \{0, 1\}$, n is the number of the resource that an operation event refers to, N is the total number of resources, and j is the index of the sample;
iii) sample the time slice with period c to obtain the cut time-slice sequence K [formula not preserved in this text], where m is the total number of sample vectors, i is the sample index, t is the length of the time slice, and c is the sampling period.
The resource content is extracted by removing punctuation, performing Chinese word segmentation, removing stop words, and compiling the vocabulary to obtain word-frequency vectors; these operations convert the content of each resource into a word-frequency vector.
Preferably, the task model yields the task distribution of a given time slice, the resource distribution of a given task, and the task distribution conditioned on a given resource.
Preferably, the topic model yields the topic distribution of a given resource and the word distribution of a given topic.
Preferably, the degree of association is computed as follows: the Kullback-Leibler distance is used to measure how similar the current resource and each other resource are in their probability distributions under the task model and under the topic model, and the two distances are weighted to give a total distance.
Preferably, the parameters of the task model and the topic model are estimated by Gibbs sampling.
Preferably, the user's computer runs a Windows or Android system.
The beneficial effects of the invention are as follows:
The invention proposes a resource association and recommendation method. From the user's operation history on a personal computer and the content of the resources involved, it automatically mines a task model based on the operation history and a topic model based on the resource content. While the user is working with a resource, it automatically recommends other resources that may be related to it, without requiring any additional operation. The invention aims to reduce the time the user spends looking for files, to preserve the continuity of the user's task as far as possible, and to ease the burden of task switching.
Brief description of the drawings
Fig. 1 is the system architecture diagram of an embodiment of the information correlation method based on user operation records and resource content;
Fig. 2 is the flowchart of the information acquisition module in the embodiment;
Fig. 3 is the flowchart of the information management module in the embodiment;
Fig. 4 is an example of cutting operation events into time slices;
Fig. 5 is the flowchart of resource content preprocessing;
Fig. 6 shows the correspondence between the task model of the invention and the latent Dirichlet allocation (LDA) model;
Fig. 7 is the flowchart of the recommendation module;
Fig. 8 is an example of the recommendation system interface.
Detailed description
Principle of the invention
From the user's operation history on a personal computer and the content of the resources involved in those operations, the invention automatically mines a task model based on the operation records and a topic model based on the resource content. It then combines the task model and the topic model to measure the association between pieces of information, and, while the user is working with a resource, finds the other resources most relevant to the current resource and presents them to the user. The user does not need to perform any additional operation at any point in the process.
a) In an operating system environment, monitor the user's operation events on two or more target resources and obtain the content of the resources.
b) Cut the history of the user's operation records into a time-slice sequence and convert it, then use the latent Dirichlet allocation (LDA) model to build the task model based on the operation records. For the algorithm, see [Blei2002] Blei, D.M., Ng, A.Y., & Jordan, M.I. (2002). Latent Dirichlet allocation. In Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA, 2002.
c) Extract the content of the resources involved in the user's operation records, preprocess it, and convert it to word-frequency vectors, then use the LDA model to build the topic model based on the resource content.
d) Measure the relevance of resources using the task model based on the operation records and the topic model based on the resource content.
The user operation events in step a comprise: opening a resource, closing a resource, and switching from one resource to another.
Step a monitors the two or more target resources as follows: monitor the user's operation events; convert these events (the three kinds listed above) into interaction data; convert the content of the resources involved in the operation events into content data; use the interaction data to select the operations the user performed on the two or more target resources. The interaction data and content data may be stored together in a local or remote database as a backup.
In summary, the method of the invention can be implemented as follows:
1) The information acquisition module monitors the user's operation events on the computer, obtains the content of the resources involved, and sends the operation events and resource content to the information management module.
2) The information management module receives the operation events and resource content from the information acquisition module, records them in a database, and responds to data query requests from the data preprocessing module.
3) The data preprocessing module filters and converts the operation data and resource content returned by a query, then passes the operation data and resource content, now in a specific format, to the data mining module.
4) The data mining module learns from the operation data with a predetermined algorithm to generate the task model, and learns from the resource content with a predetermined algorithm to generate the topic model. It then passes both models to the recommendation module.
5) After receiving the two models, the recommendation module selects, according to the specified recommendation strategy, the top N resources most relevant to the current resource and recommends them.
The resources in the operation events include, but are not limited to, the following classes: documents and web pages.
The information acquisition module works as follows:
1) Collect the user's operation events on the computer and the associated content; different operation events require different attributes. The attributes collected for document-related operation events comprise the time, the event type, the name of the resource, and the path of the resource; the attributes collected for web-page-related operation events comprise the time, the event type, the page title, and the page URL.
2) Send the collected operation events and resource content to the information management module.
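The attributes listed above can be pictured as a single record type. The following Python sketch is illustrative only: the names `EventType`, `OperationEvent`, and `is_web_page`, the use of seconds for the timestamp, and the rule for telling web pages from documents by URL scheme are all assumptions, not part of the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    """The three kinds of operation event named in the text."""
    OPEN = "open"
    CLOSE = "close"
    SWITCH = "switch"


@dataclass(frozen=True)
class OperationEvent:
    """One collected operation event (hypothetical record layout)."""
    timestamp: float       # time of the event, assumed to be in seconds
    event_type: EventType  # open, close, or switch
    title: str             # resource name, or page title for a web page
    location: str          # file path for a document, URL for a web page

    @property
    def is_web_page(self) -> bool:
        # Assumption: web resources are recognised by their URL scheme.
        return self.location.startswith(("http://", "https://"))


events = [
    OperationEvent(0.0, EventType.OPEN, "report.docx", "C:/docs/report.docx"),
    OperationEvent(12.5, EventType.SWITCH, "LDA overview", "https://example.org/lda"),
]
```

Using one record type for both resource classes keeps the later time-slice cutting independent of whether a resource is a document or a web page.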
The information management module works as follows:
1) Receive the operation events and resource content sent by the information acquisition module and save them in the database.
2) Respond to data query requests from the data preprocessing module by sending it the operation events and resource content for the time period specified in the request.
The preprocessing module has two parts: one preprocesses the history of operation events, and the other preprocesses the resource content.
Operation events are preprocessed as follows:
The history of the user's operation events is cut into time slices, and the operation events in each time slice are represented as word frequencies. First, all resources involved in the operation data are counted; together they form a vocabulary in which each resource has a unique number. An operation event is represented by the number of the resource it operates on. Two parameters are defined: the time-slice length t, and the sampling period c, meaning that one sample is taken every c seconds; both t and c are in seconds. The sample vector is defined as $A_j = \{a_1, a_2, \ldots, a_n, \ldots, a_N\}$, where $a \in \{0, 1\}$, n is the number of the resource an operation event refers to, N is the total number of resources, and j denotes the j-th sample; $A_j$ reflects the state of all resources at the j-th sample. Sampling a time slice of length t with period c yields m vectors A [the expression for m is not preserved in this text]. Finally, K is defined [formula not preserved in this text] as the time-slice sequence, which describes the running state of each resource within the time slice; i denotes the i-th sample and m the number of samples in the time slice. In this way, with the time slice as the unit and the operation events within it as data, the time-slice sequence is converted into a word-frequency vector.
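The sampling procedure can be sketched in Python as follows. Because the formulas for m and K are not preserved in this text, the sketch makes three labelled assumptions: m = t / c samples per time slice, K is the element-wise sum of the sample vectors in a slice, and a resource's state is 1 while it is open. The function name `time_slice_vectors` and the event tuple layout are hypothetical.

```python
def time_slice_vectors(events, resources, slice_len, period, total_time):
    """Cut an event history into time slices of resource counts.

    events: list of (time_in_seconds, kind, resource), sorted by time,
            with kind either "open" or "close".
    Returns one count vector per time slice (assumed form of K).
    """
    index = {r: n for n, r in enumerate(resources)}  # resource vocabulary
    samples_per_slice = slice_len // period          # m, assumed to be t / c
    n_slices = total_time // slice_len
    open_now = set()  # resources whose state is currently 1
    pos = 0           # next event not yet applied
    slices = []
    for s in range(n_slices):
        k = [0] * len(resources)
        for i in range(samples_per_slice):
            now = s * slice_len + i * period
            # Apply every event that happened up to this sample instant.
            while pos < len(events) and events[pos][0] <= now:
                _, kind, res = events[pos]
                if kind == "open":
                    open_now.add(res)
                else:
                    open_now.discard(res)
                pos += 1
            # Adding the 0/1 sample vector A_j to the slice total.
            for res in open_now:
                k[index[res]] += 1
        slices.append(k)
    return slices


resources = ["a.doc", "b.pdf"]
events = [(0, "open", "a.doc"), (4, "open", "b.pdf"), (6, "close", "a.doc")]
slices = time_slice_vectors(events, resources, slice_len=10, period=2, total_time=20)
```

Under these assumptions each slice vector plays the role of a word-frequency vector: a resource that stays open across more samples in a slice receives a larger count.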
Resource content is preprocessed as follows:
1) Remove punctuation
Punctuation carries no meaning of its own and interferes with content analysis. It can be removed with a filter, with each punctuation mark replaced by a space.
2) Chinese word segmentation
A Chinese word segmentation program is used to segment the resource content into words.
3) Remove stop words
Stop words are words that occur frequently in text but carry little meaning, mainly adverbs, function words, and modal particles (the original Chinese examples are not preserved in this translation); see Hua Bolin. Stop-word processing techniques in knowledge extraction. New Technology of Library and Information Service, 2007, No. 8, 48-51. These words interfere with data mining and must be deleted. Deletion uses a stop-word list: each word that appears in the list is deleted, and its original position is replaced with a space.
4) Compile the vocabulary and obtain word-frequency vectors
All distinct words across all texts are collected to form a vocabulary in which each word has a unique number. After segmentation, the content of each resource is a set of words. For each resource, its content is converted into a word-frequency vector according to its word set and the vocabulary.
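The four preprocessing steps can be sketched in Python. This is a simplified illustration, not the patent's implementation: whitespace splitting stands in for a Chinese word segmentation program, and the stop-word list `STOP_WORDS` and the function names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical stop-word list


def tokenize(text):
    """Steps 1 to 3: strip punctuation, segment, drop stop words."""
    # Replace every punctuation mark with a space.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Assumption: whitespace splitting stands in for Chinese segmentation.
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Give every distinct word a unique number, in order of first use."""
    vocab = {}
    for tokens in token_lists:
        for word in tokens:
            vocab.setdefault(word, len(vocab))
    return vocab


def word_frequency_vector(tokens, vocab):
    """Step 4: count each vocabulary word in one resource."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for word, count in counts.items():
        vector[vocab[word]] = count
    return vector


docs = ["The model of topics, and the model of tasks.", "A task model!"]
tokens = [tokenize(d) for d in docs]
vocab = build_vocabulary(tokens)
vectors = [word_frequency_vector(t, vocab) for t in tokens]
```

For Chinese text, the `split()` call would be replaced by a real segmenter; the rest of the pipeline is unchanged.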
The data mining module works as follows:
Once the data has been vectorized, learning with the LDA model yields probability distributions. The task model yields the task distribution of a given time slice and the resource distribution of a given task, and also the task distribution conditioned on a given resource. The topic model yields the topic distribution of a given resource and the word distribution of a given topic.
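As a rough illustration of how such distributions might be learned, the Python sketch below implements a small collapsed Gibbs sampler for LDA using only the standard library. It is a generic textbook-style sampler under assumed hyperparameters, not the patent's implementation; the function name `lda_gibbs` and the parameters `alpha`, `beta`, `iters`, and `seed` are hypothetical. For the task model, each inner list would be one time slice given as resource numbers; for the topic model, one resource given as word numbers.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of lists of word ids.
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this occurrence.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi


time_slices = [[0, 0, 1, 1, 0], [2, 3, 3, 2], [0, 1, 1], [3, 2, 2]]
theta, phi = lda_gibbs(time_slices, n_topics=2, vocab_size=4, iters=50)
```

Here `theta` corresponds to the task distribution of each time slice and `phi` to the resource distribution of each task. The smoothing terms keep every probability strictly positive, which matters for the KL distance used later.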
The recommendation module works as follows:
Related resources are recommended for the current resource on the basis of the degree of association between resources; the basic approach is to rank resources by that degree. The recommendation module obtains the task model and the topic model from the data mining module. The two models measure the degree of association between resources from two different angles, operations and content. The recommendation module weights the two degrees of association into a total degree of association, then ranks and recommends resources accordingly.
In the task model, time-slice sequence, task, and resource correspond to document, topic, and word in the LDA model, so measuring the association between resources is equivalent to measuring the association between words. The association between two words can be measured by how similar they are across topics: for words w_1 and w_2, it can be measured through the conditional topic distributions P(Z|w_1) and P(Z|w_2).
In the topic model, measuring the association between resources means measuring the association between documents. In the LDA model, a document is reduced from a very high-dimensional word vector to a much lower-dimensional topic vector, so the similarity between two documents can be computed from their topic distributions.
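The text does not say how the conditional topic distribution P(Z|w) is obtained. One plausible route, shown here purely as an assumption, is Bayes' rule applied to the per-topic word distributions: P(z|w) is proportional to P(w|z) P(z). The function name `topic_given_word` and the choice of topic prior are hypothetical.

```python
def topic_given_word(phi, topic_prior, word):
    """Conditional topic distribution P(Z | w) by Bayes' rule.

    phi[k][w] is P(w | topic k); topic_prior[k] is P(topic k).
    Both are assumed inputs, for example from a fitted LDA model.
    """
    joint = [phi[k][word] * topic_prior[k] for k in range(len(phi))]
    total = sum(joint)
    return [j / total for j in joint]


# Two topics (tasks) over a three-word (three-resource) vocabulary.
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
prior = [0.5, 0.5]
p_z_given_w0 = topic_given_word(phi, prior, word=0)
```

In the task model, `word` would be a resource number, so the result is the task distribution conditioned on that resource.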
The standard way to measure the difference between two distributions is to compute their KL distance; see Steyvers, M., & Griffiths, T. (2007). Probabilistic Topic Models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, editors, Latent Semantic Analysis: A Road to Meaning. Laurence Erlbaum, In Press. The Kullback-Leibler distance, also called the KL divergence (Kullback-Leibler divergence), measures the difference between two probability distributions over the same event space. For two probability distributions p and q, the KL distance is computed as:
$$D(p, q) = \sum_{i=1}^{T} p_i \log_2 \frac{p_i}{q_i}$$
where $p_i$ and $q_i$ are the i-th components of the probability distribution vectors p and q, and T is the dimension of p and q.
The formula shows that D(p, q) = 0 when every component of p equals the corresponding component of q, that is, when the two distributions are identical. The KL distance is asymmetric: in general, D(p, q) ≠ D(q, p). For recommendation, a symmetric distance based on the KL distance is used:
$$KL(p, q) = \frac{1}{2}\left[D(p, q) + D(q, p)\right]$$
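The two formulas translate directly into Python. This is a minimal sketch; the function names are illustrative, and it assumes both inputs are probability vectors of equal length in which q has no zero component wherever p is positive, as holds for smoothed LDA estimates.

```python
import math


def kl_divergence(p, q):
    """D(p, q) = sum_i p_i * log2(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """KL(p, q) = 1/2 * [D(p, q) + D(q, p)]."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


p = [0.5, 0.5]
q = [0.25, 0.75]
distance = symmetric_kl(p, q)
```

The symmetric form gives the same value whichever resource is treated as the current one, which is convenient when ranking.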
For any two resources, the task model and the topic model each give a KL distance measuring the difference between their probability distributions. Let the distance from the task model be KL_1 and the distance from the topic model be KL_2. The total distance L is:
$$L = \alpha \cdot KL_1 + \beta \cdot KL_2$$
where α and β are preset parameters, which can be set empirically or according to user preference.
The smaller the resulting distance L, the more similar a resource is to the current resource, the higher its degree of association, and the more it should be recommended to the user.
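The weighting and ranking step can be sketched in Python. The function name `rank_resources`, the dictionary inputs, the default weights of 0.5, and the `top_n` cutoff are assumptions made for illustration; the distances are taken as already computed against the current resource.

```python
def rank_resources(task_dist, topic_dist, alpha=0.5, beta=0.5, top_n=3):
    """Rank candidate resources by total distance L = alpha*KL_1 + beta*KL_2.

    task_dist and topic_dist map each candidate resource to its KL distance
    from the current resource under the task model and the topic model.
    Returns the top_n resources with the smallest total distance.
    """
    total = {r: alpha * task_dist[r] + beta * topic_dist[r] for r in task_dist}
    return sorted(total, key=total.get)[:top_n]


task_dist = {"a.doc": 0.2, "b.pdf": 0.9, "c.html": 0.4}
topic_dist = {"a.doc": 0.6, "b.pdf": 0.1, "c.html": 0.3}
recommended = rank_resources(task_dist, topic_dist, top_n=2)
```

Raising `alpha` relative to `beta` favors resources used together in time; raising `beta` favors resources with similar content.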
The invention is described further below through an embodiment, with reference to the drawings.
The method of the invention is implemented by the embodiment system shown in Fig. 1.
As shown in Fig. 1, the system of this embodiment mainly comprises: an information acquisition module, which monitors the user's operation events, obtains the content of the resources involved, and sends both to the information management module; an information management module, which receives operation events and resource content and responds to data query requests; a data preprocessing module, which converts operation events and resource content to a specific format and passes them to the data mining module; a data mining module, which learns from the operation data and the resource content with a predetermined algorithm to produce the task model and the topic model respectively; and a recommendation module, which provides a list of the resources most relevant to the current resource according to the specified recommendation strategy.
The internal workflow of each module is described below.
The information acquisition module (Fig. 2) runs in the background and collects the user's operation events on the computer in real time, namely opening a resource, closing a resource, and switching from one resource to another, and obtains the content of the resources involved. The attributes collected for document-related operation events comprise the time, the event type, the name of the resource, and the path of the resource; the attributes collected for web-page-related operation events comprise the time, the event type, the page title, and the page URL. The collected operation events and resource content are sent to the information management module in real time.
The information management module (Fig. 3) receives operation events and resource content from the information acquisition module in real time. First, it converts the operation events into operation data and records the operation data and resource content in the database, where the data is also available to other applications. Then it responds to data query requests from the data preprocessing module by returning the operation data and resource content for the time period specified in the request.
The data preprocessing module handles two kinds of preprocessing. First, it preprocesses the operation data returned by a query using the method for preprocessing operation events described above: the history of the user's operation events is cut into time slices, and the operation events in each time slice are represented as word frequencies, ultimately as a word-frequency vector. Fig. 4 shows an example of cutting operation events into time slices.
Second, it preprocesses the resource content using the method for preprocessing resource content described above (Fig. 5): four steps, namely removing punctuation, Chinese word segmentation, removing stop words, and compiling the vocabulary to obtain word-frequency vectors, convert the content of each resource into a word-frequency vector.
The resulting word-frequency vectors of the operation data and of the resource content are passed to the data mining module.
The data mining module uses the latent Dirichlet allocation (LDA) model to learn from the vectorized data and obtains probability distributions, which the recommendation module uses to compute the degree of association between two resources.
In the task model, each task to be mined corresponds to a topic in the LDA model, the vector describing each time slice corresponds to a document in the LDA model, and each resource corresponds to a word in the LDA model (Fig. 6). The LDA model is learned by estimating its parameters with Gibbs sampling, which yields the task distribution of a given time slice and the resource distribution of a given task, as well as the task distribution conditioned on a given resource.
In the topic model, the vector describing each resource's content corresponds to a document in the LDA model, and each word obtained by segmenting the resource content corresponds to a word in the LDA model. As with the task model, the LDA model is learned by estimating its parameters with Gibbs sampling, which yields the topic distribution of a given resource and the word distribution of a given topic.
The recommendation module (Fig. 7) refreshes the recommendation interface in real time and recommends resources related to the current resource, on the basis of the degree of association between resources. Because the task model and the topic model measure that association from two different angles, operations and content, the recommendation strategy weights the degrees of association from the two models into a total degree of association, then ranks and recommends accordingly.
In the task model, a resource corresponds to a word in the LDA model, so the association between resources is the association between words in the LDA model. The association between two words can be measured by the similarity of their conditional topic distributions. For words w_1 and w_2 with conditional topic distributions P(Z|w_1) and P(Z|w_2), the KL distance between P(Z|w_1) and P(Z|w_2) measures the difference between them, and hence the degree of association between w_1 and w_2, that is, between the corresponding resources.
In the topic model, a resource corresponds to a document in the LDA model, so measuring the association between resources means computing the similarity between two documents. For documents d_1 and d_2, the topic model gives their topic distributions P(Z|d_1) and P(Z|d_2); the KL distance between these distributions likewise measures the difference between them, and hence the degree of association between the resources corresponding to d_1 and d_2.
The KL distance is computed as described above. While the user is working with a resource, a KL distance measuring the difference from the current resource's probability distribution is computed for every other resource in the user's operation history, once in the task model and once in the topic model. Let the distance from the task model be KL_1 and the distance from the topic model be KL_2. The total distance combining the two models is then computed by weighting: L = α * KL_1 + β * KL_2, where α and β are preset parameters.
The smaller the total distance L, the higher the degree of association between a resource and the current resource. Finally, the recommendation interface is refreshed in real time and shows the recommendation list in descending order of association (that is, ascending order of total distance L). Fig. 8 shows an example of the recommendation interface; the user can double-click a resource in the interface to open it.

Claims (7)

1. An information correlation method based on user operation records and resource content, comprising the steps of:
1) monitoring multiple operation events of the user on a computer, acquiring resource content and operation records, and storing them in a local or remote database;
2) converting the operation records into vectors of a specific format and building a task model based on the operation records;
2-1) performing time-slice sequence segmentation and vector conversion on the operation records;
2-2) building the task model with a Latent Dirichlet Allocation (LDA) model, using the operation events as data and the time slice as the unit;
3) building a topic model based on resource content from the resource content;
3-1) converting the content of each resource into a word-frequency vector according to the word set and vocabulary extracted from the resource content;
3-2) representing the word-frequency vectors with a Latent Dirichlet Allocation model to build the topic model;
4) computing the degree of association between the current resource and the other resources in the topic model and in the task model respectively, completing the information association, and returning the most closely associated resources to the user;
4-1) in the task model, obtaining the task distribution probability of a given time slice, the resource distribution probability of a given task, and the task distribution probability given a resource;
4-2) in the topic model, obtaining the topic distribution probability of a given resource and the word distribution probability of a given topic;
4-3) computing the degree of association as follows: computing, by the Kullback-Leibler distance, the similarity between the probability distributions of the current resource and each other resource in the topic model and in the task model, and weighting the results to obtain a total distance, a smaller total distance indicating a higher degree of association;
4-4) displaying the resource recommendation list in descending order of association.
2. The information correlation method based on user operation records and resource content as claimed in claim 1, characterized in that the operation events comprise: opening a resource, closing a resource, and switching from one resource to another; and the resource content comprises documents and web pages.
3. The information correlation method based on user operation records and resource content as claimed in claim 2, characterized in that the attributes to be collected for operation events relating to documents comprise the time, the event type, the resource name, and the resource path, and the attributes to be collected for operation events relating to web pages comprise the time, the event type, the web page title, and the web page URL.
4. The information correlation method based on user operation records and resource content as claimed in claim 1, characterized in that the time-slice sequence segmentation method is:
i) counting all resources in the operation records and assigning each resource a number, so that the resources form a vocabulary;
ii) defining a sample vector $A_j = \{a_1, a_2, \ldots, a_n, \ldots, a_N\}$ to represent the state of all resources at the j-th sampling, where $a_n \in \{0, 1\}$, n is the number of the resource corresponding to an operation event, N is the total number of resources, and j is the sampling index;
iii) sampling the time slices with period c to obtain the segmented time-slice sequence [formula not reproduced in the source text], where [symbol not reproduced in the source text] is the total number of vectors, i is the sample count, t is the length of a time slice, and c is the sampling period.
5. The information correlation method based on user operation records and resource content as claimed in claim 1, characterized in that extracting the resource content comprises: removing punctuation marks, performing Chinese word segmentation, removing stop words, building the vocabulary, and obtaining word-frequency vectors, whereby the content of each resource is converted into a word-frequency vector.
6. The information correlation method based on user operation records and resource content as claimed in claim 1, characterized in that, in learning the task model and the topic model, parameters are estimated by Gibbs sampling.
7. The information correlation method based on user operation records and resource content as claimed in claim 1, characterized in that the user's computer runs a Windows or Android system.
CN201210345320.1A 2012-09-17 2012-09-17 Based on the information correlation method of user operation records and resource content Active CN102915335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210345320.1A CN102915335B (en) 2012-09-17 2012-09-17 Based on the information correlation method of user operation records and resource content

Publications (2)

Publication Number Publication Date
CN102915335A CN102915335A (en) 2013-02-06
CN102915335B true CN102915335B (en) 2016-04-27

Family

ID=47613702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210345320.1A Active CN102915335B (en) 2012-09-17 2012-09-17 Based on the information correlation method of user operation records and resource content

Country Status (1)

Country Link
CN (1) CN102915335B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150374B (en) * 2013-03-11 2017-02-08 中国科学院信息工程研究所 Method and system for identifying abnormal microblog users
CN103412880B (en) * 2013-07-17 2017-02-22 百度在线网络技术(北京)有限公司 Method and device for determining implicit associated information between multimedia resources
CN104572707A (en) * 2013-10-18 2015-04-29 北京卓易讯畅科技有限公司 Preferable object information providing method and device
CN103793465B (en) * 2013-12-20 2018-06-22 武汉理工大学 Mass users behavior real-time analysis method and system based on cloud computing
CN103631970B (en) * 2013-12-20 2017-08-18 百度在线网络技术(北京)有限公司 The method and apparatus for excavating attribute and entity associated relation
US10210171B2 (en) * 2014-06-18 2019-02-19 Microsoft Technology Licensing, Llc Scalable eventual consistency system using logical document journaling
CN105376649B (en) * 2015-11-24 2018-09-14 江苏有线技术研究院有限公司 Realize the blind operating method of the set-top box of accurate combined recommendation and system
CN105825415A (en) * 2016-03-15 2016-08-03 广东省科技基础条件平台中心 S&T (Science and Technology) resource supply and demand matching method
CN106469206A (en) * 2016-08-31 2017-03-01 广州酷狗计算机科技有限公司 The method and apparatus of pushed information
CN106570157B (en) * 2016-11-03 2020-04-17 北京金山安全软件有限公司 Picture pushing method and device and electronic equipment
CN106777304B (en) * 2016-12-30 2020-03-20 中国民航信息网络股份有限公司 Theme pushing method and device
CN107391546B (en) * 2017-06-01 2020-07-07 浙江唯见科技有限公司 Method and system for full association of VR resources
CN108563648B (en) * 2017-11-29 2021-06-25 腾讯科技(上海)有限公司 Data display method and device, storage medium and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968798A (en) * 2010-09-10 2011-02-09 中国科学技术大学 Community recommendation method based on on-line soft constraint LDA algorithm
CN102004774A (en) * 2010-11-16 2011-04-06 清华大学 Personalized user tag modeling and recommendation method based on unified probability model
CN102495872A (en) * 2011-11-30 2012-06-13 中国科学技术大学 Method and device for conducting personalized news recommendation to mobile device users

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
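"An LDA-Based Online Topic Evolution Mining Model" (《一种基于LDA的在线主题演化挖掘模型》); Cui Kai (崔凯) et al.; Computer Science (《计算机科学》); 2010-11-15; p. 157, Fig. 2; p. 157, left column, last paragraph, to right column, first paragraph; p. 158, left column, second paragraph; p. 158, left column, Section 2.3 "Incremental Gibbs Algorithm"; p. 158, right column, Section 3.1 "Similarity Measure Between Topics" *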

Also Published As

Publication number Publication date
CN102915335A (en) 2013-02-06

Similar Documents

Publication Publication Date Title
CN102915335B (en) Based on the information correlation method of user operation records and resource content
CN103177090B (en) A kind of topic detection method and device based on big data
CN103226578B (en) Towards the website identification of medical domain and the method for webpage disaggregated classification
CN103294815B (en) Based on key class and there are a search engine device and method of various presentation modes
CN105069102A (en) Information push method and apparatus
CN101751458A (en) Network public sentiment monitoring system and method
CN103544255A (en) Text semantic relativity based network public opinion information analysis method
CN104951539A (en) Internet data center harmful information monitoring system
CN104182389A (en) Semantic-based big data analysis business intelligence service system
CN103678564A (en) Internet product research system based on data mining
CN105302810A (en) Information search method and apparatus
CN102737021B (en) Search engine and realization method thereof
CN101393555A (en) Rubbish blog detecting method
CN105389341A (en) Text clustering and analysis method for repeating caller work orders of customer service calls
CN103838754A (en) Information searching device and method
CN106649498A (en) Network public opinion analysis system based on crawler and text clustering analysis
Zhang Application of data mining technology in digital library.
CN103995828B (en) A kind of cloud storage daily record data analysis method
CN106844588A (en) A kind of analysis method and system of the user behavior data based on web crawlers
CN103761246A (en) Link network based user domain identifying method and device
Li et al. Netnews bursty hot topic detection based on bursty features
CN103823868A (en) Event recognition method and event relation extraction method oriented to on-line encyclopedia
CN113449077A (en) News popularity calculation method, equipment and storage medium
CN104965894A (en) Data analysis system for IDC hazardous information monitoring platform
Guan et al. Research and design of internet public opinion analysis system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant