CN102915335A - Information associating method based on user operation record and resource content - Google Patents


Info

Publication number
CN102915335A
CN102915335A CN2012103453201A CN201210345320A CN102915335A CN 102915335 A CN102915335 A CN 102915335A CN 2012103453201 A CN2012103453201 A CN 2012103453201A CN 201210345320 A CN201210345320 A CN 201210345320A CN 102915335 A CN102915335 A CN 102915335A
Authority
CN
China
Prior art keywords
resource
user
model
resources
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012103453201A
Other languages
Chinese (zh)
Other versions
CN102915335B (en)
Inventor
杨智强
殷钊
王衡
汪国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201210345320.1A priority Critical patent/CN102915335B/en
Publication of CN102915335A publication Critical patent/CN102915335A/en
Application granted granted Critical
Publication of CN102915335B publication Critical patent/CN102915335B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an information association method based on user operation records and resource content. The method comprises the following steps: first, from the user's operation history on a personal computer and the content of the resources involved in those operations, automatically mining a task model based on the operation records and a topic model based on the resource content; next, combining the task model and the topic model to measure the degree of association between resources; and finally, when the user uses a resource, finding the other resources most relevant to the current resource and recommending them to the user, without requiring any extra operation from the user at any point in the process. Because both models are mined automatically, other resources related to a resource are recommended whenever the user uses it, with no additional effort on the user's part. The invention aims to save the time the user spends looking for files, to preserve the continuity of the user's tasks as far as possible, and to reduce the burden on the user of switching between tasks.

Description

Information association method based on user operation record and resource content
Technical Field
The invention relates to an information association method based on user operation records and resource contents in an operating system environment, and belongs to the technical field of computer software.
Background
Today, users hold ever larger amounts of personal information, and this excess causes problems such as wasted time, delayed decisions, inability to concentrate on the main task, and stress (see Waddington, P. (1997)). Knowledge workers such as professors, lawyers, and engineers feel information overload most keenly, because they must carry out many different tasks in their daily work and must search for and process large amounts of information while doing so. As a result, whenever a task is interrupted or switched, considerable effort goes into recovering the information or resources related to the current task.
The avalanche-like growth of information, together with the fact that current operating systems provide users with no information management facilities, has made it a prominent problem that individual users cannot effectively acquire and manage their personal information. Personal information management is the research area concerned with helping people solve this problem. Providing users with good personal information management runs into many psychological challenges, which come down to two points: first, classifying items (e.g., documents) is cognitively very difficult; second, the details of an item that the user can remember are often not usable for retrieval. Current research has proposed many solutions aimed at these two challenges.
Providing better information organization and presentation for users is an important research direction in personal information management. The project folder implemented by Ofer Bergman et al. stores all of a user's information on the same subject (including documents, e-mails, saved web pages, etc.) in a single folder, so that the user can store and retrieve information on one subject from one directory (see Ofer Bergman, Ruth Beyth-Marom, Rafi Nachmias, The project fragmentation problem in personal information management, CHI 2006).
In addition to better organization and presentation, more powerful information retrieval is also an important means of personal information management. Dumais et al. implemented the Stuff I've Seen (SIS) system (see Dumais, S.T., Cutrell, E.S., Cadiz, J.J., Jancke, G.S., Sarin, R. and Robbins, D.C. (2003). Stuff I've Seen: A system for personal information retrieval and re-use. In Proc. SIGIR 2003, 72-79). The design of SIS has two key aspects. One is to provide uniform tagging of information held in different organizational structures, so that all of it can be retrieved in a uniform way. The other is to let the user search with contextual information that is easier to remember, such as the time of browsing or the author of a document.
Approaches based on information organization and presentation require the user to classify resources in advance, and so do not fundamentally relieve the user of a heavy interaction burden. Approaches based on information retrieval reduce the cost of finding resources to some extent, but frequent retrieval interrupts the user's task and prevents the user from concentrating on the current work.
The information association method based on the user operation record and the resource content can provide a real-time and accurate resource recommendation service for computer users, and solve the existing problems.
Disclosure of Invention
The invention aims to provide a resource information correlation method. The method is mainly applied to the personal computer, and relevant information is recommended to the user before the user searches for the resources according to the past operation records of the user and the accessed resource content, so that the time overhead of searching for the information is saved for the user.
In order to achieve the purpose, the technical scheme of the invention is as follows: the information association method based on the user operation record and the resource content comprises the following steps:
1) monitoring a plurality of operation events of a user in a computer, acquiring resource content and operation records and storing the resource content and the operation records in a local or remote database;
2) converting the operation records into specific format vectors, and establishing a task model based on the operation records;
2-1) carrying out time slice sequence segmentation and vector conversion on the operation records;
2-2) establishing a task model according to a latent Dirichlet allocation (LDA) model, taking the operation events as data and the time slice as the unit;
3) establishing a topic model based on the resource content according to the resource content;
3-1) converting the content of each resource into word frequency vector representation according to the word set and the vocabulary table extracted from the resource content;
3-2) modeling the word frequency vectors with a latent Dirichlet allocation model to establish the topic model;
4) calculating the degree of association between the current resource and each other resource in the topic model and in the task model respectively, thereby completing the information association, and returning the resources with the highest degree of association to the user.
The operation events include: opening a resource, closing a resource, and switching from one resource to another. The resources include documents and web pages.
For an operation event on a document, the attributes to be collected are the time, the event type, the title of the resource, and the path of the resource; for an operation event on a web page, they are the time, the event type, the title of the web page, and the web page URL.
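As an illustration of the attributes listed above, one operation event could be held in a record like the following. This is a minimal sketch, not the patented implementation: the class names, field names, and the `resource_key` helper are assumptions made for the example, and the sample paths, URLs, and timestamps are invented.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    OPEN = "open"
    CLOSE = "close"
    SWITCH = "switch"  # switching from one resource to another


class ResourceKind(Enum):
    DOCUMENT = "document"
    WEB_PAGE = "web_page"


@dataclass(frozen=True)
class OperationEvent:
    """One monitored event (hypothetical layout of the collected attributes)."""
    timestamp: float       # time of the event, in seconds
    event_type: EventType  # open, close, or switch
    kind: ResourceKind     # document or web page
    title: str             # resource title or web page title
    locator: str           # file path for a document, URL for a web page

    def resource_key(self) -> str:
        """Identify the resource independently of the event that touched it."""
        return f"{self.kind.value}:{self.locator}"


doc_event = OperationEvent(1000.0, EventType.OPEN, ResourceKind.DOCUMENT,
                           "report.docx", "/home/user/report.docx")
web_event = OperationEvent(1060.0, EventType.SWITCH, ResourceKind.WEB_PAGE,
                           "LDA overview", "http://example.com/lda")
print(doc_event.resource_key())
print(web_event.resource_key())
```

Using one `locator` field for both the path and the URL keeps the two event kinds in a single record type; a real collector might prefer separate types.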
The time slice sequence segmentation method comprises the following steps:
i) counting all resources in the operation records and forming them into a vocabulary table in which each resource has a unique number;
ii) defining a sampling vector $A_j = \{a_1, a_2, \ldots, a_n, \ldots, a_N\}$ that represents the states of all resources at the j-th sampling, where $a_n \in \{0, 1\}$, n is the number of the resource corresponding to the operation event, N is the total number of resources, and j denotes the j-th sampling;
iii) sampling each time slice with period c to obtain the time slice sequence
$$K = \sum_{i=1}^{m} A_i$$
where
$$m = \frac{t}{c}$$
is the total number of vectors, i is the index of the sampling, t is the length of the time slice, and c is the sampling period.
The extraction of the resource content comprises: removing punctuation marks, performing Chinese word segmentation, removing stop words, and counting the vocabulary table to obtain word frequency vectors; these operations convert the content of each resource into a word frequency vector.
Preferably, the task model yields the task distribution probability of a given time slice, the resource distribution probability of a given task, and the task distribution probability given the occurrence of a certain resource.
Preferably, in the topic model, a topic distribution probability of a given resource and a word distribution probability of a given topic are obtained.
Preferably, the degree of association is calculated as follows: the similarity between the probability distributions of the current resource and each other resource is calculated in the task model and in the topic model using the Kullback-Leibler (KL) distance, and the two results are weighted to obtain the total distance.
Preferably, parameter estimation in the learning of the task model and the topic model is carried out through Gibbs sampling.
Preferably, the user computer is installed with a Windows or Android system.
The positive effects of the invention are as follows:
The invention provides a resource association and recommendation method. From the user's operation history on a personal computer and the content of the resources involved in those operations, a task model based on the operation history and a topic model based on the resource content are mined automatically, so that other resources likely to be related to a resource can be recommended automatically whenever the user uses it, without any additional operation. The invention aims to save the time the user spends looking for files, to preserve the continuity of the user's tasks as far as possible, and to reduce the burden on the user of switching between tasks.
Drawings
FIG. 1 is a block diagram of a system architecture in an embodiment of an information association method based on user operation records and resource contents according to the present invention;
FIG. 2 is a flow diagram of an information collection module in an embodiment of the invention;
FIG. 3 is a flow diagram of an information management module in an embodiment of the invention;
FIG. 4 is an example of time slicing of operation events in the information association method based on user operation records and resource contents according to the present invention;
FIG. 5 is a flow chart of resource content preprocessing in the information association method based on user operation records and resource contents according to the present invention;
FIG. 6 is a diagram of the correspondence between the task model of the present invention and the latent Dirichlet allocation (LDA) model;
FIG. 7 is a recommendation module flow diagram of the present invention;
FIG. 8 is an example recommendation system interface of the present invention.
Detailed Description
Principle of the invention
From the user's operation history on a personal computer and the content of the resources involved in those operations, a task model based on the operation records and a topic model based on the resource content are mined automatically. The two models are then combined to measure the degree of association between resources, and when the user uses a resource, the other resources most relevant to the current resource are found and presented to the user. The user does not need to perform any additional operation at any point in the process.
a) In the operating system environment, the operating events of a user on two or more target resources are monitored, and the content of the resources is obtained.
b) The user's historical operation records are segmented into a time slice sequence and converted, and a task model based on the operation records is then established using the algorithm of the latent Dirichlet allocation model, for which see [Blei 2002] Blei, D.M., Ng, A.Y., & Jordan, M.I. (2002).
c) Content extraction, preprocessing, and word frequency vectorization are performed on the resources involved in the user's operation records, and a topic model based on the resource content is then established using a latent Dirichlet allocation model.
d) And measuring the relevance of the resources according to the task model based on the operation records and the topic model based on the resource contents.
In step a, the user operation events comprise: opening a resource, closing a resource, and switching from one resource to another.
Step a monitors the two or more target resources as follows: the user's operation events are monitored; the three kinds of events above are converted into interaction data; the content of the resources involved in the operation events is converted into content data; the interaction data is screened for the user's operations on the two or more target resources; and the interaction data and the content data may be stored together in a local or remote database for later use.
Overall, the process of the invention can be carried out in the following manner:
1) The information acquisition module monitors the user's operation events on the computer, acquires the content of the resources involved in those events, and sends the operation events and the resource content to the information management module.
2) The information management module receives the operation events and the resource content from the information acquisition module, records them in the database, and responds to data query requests from the data preprocessing module.
3) The data preprocessing module filters and converts the operation data and resource content obtained by query, and then passes them in a specific format to the data mining module.
4) The data mining module learns the operation data with a preset algorithm to generate a task model, and learns the resource content with a preset algorithm to generate a topic model. The generated task model and topic model are then passed to the recommendation module.
5) After obtaining the two models, the recommendation module selects the top N resources most relevant to the current resource according to the specified recommendation strategy and recommends them.
Resources in an operational event include, but are not limited to, the following types of resources: documents, web pages.
The working method of the information acquisition module comprises the following steps:
1) Collect the user's operation events on the computer and the related content; different attributes are collected for different kinds of operation events. For an operation event on a document, the attributes collected are the time, the event type, the title of the resource, and the path of the resource; for an operation event on a web page, they are the time, the event type, the web page title, and the web page URL.
2) Send the collected operation events and resource contents to the information management module.
The working method of the information management module comprises the following steps:
1) Receive the operation events and resource contents sent by the information acquisition module, and store them in the database.
2) Respond to data query requests from the data preprocessing module by sending it the operation events and resource contents of the time period specified in the request.
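The two duties of the information management module, storing what it receives and answering queries for a time period, can be sketched with Python's standard `sqlite3` module. The table layout and the function names `open_store`, `record_event`, and `query_period` are assumptions made for this example; the source only states that a local or remote database is used.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) tables for operation events and resource contents."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " ts REAL NOT NULL, event_type TEXT NOT NULL,"
        " title TEXT NOT NULL, locator TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contents ("
        " locator TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def record_event(conn, ts, event_type, title, locator, body=None):
    """Store one operation event and, if supplied, the resource content."""
    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                 (ts, event_type, title, locator))
    if body is not None:
        # Keep only the latest content seen for each resource.
        conn.execute("INSERT OR REPLACE INTO contents VALUES (?, ?)",
                     (locator, body))
    conn.commit()


def query_period(conn, start_ts, end_ts):
    """Return the events in [start_ts, end_ts) and the contents of the resources they touch."""
    events = conn.execute(
        "SELECT ts, event_type, title, locator FROM events"
        " WHERE ts >= ? AND ts < ? ORDER BY ts", (start_ts, end_ts)).fetchall()
    contents = {}
    for locator in sorted({row[3] for row in events}):
        row = conn.execute("SELECT body FROM contents WHERE locator = ?",
                           (locator,)).fetchone()
        if row is not None:
            contents[locator] = row[0]
    return events, contents


conn = open_store()
record_event(conn, 10.0, "open", "a.txt", "/docs/a.txt", "alpha text")
record_event(conn, 20.0, "switch", "b.txt", "/docs/b.txt", "beta text")
record_event(conn, 99.0, "close", "a.txt", "/docs/a.txt")
events, contents = query_period(conn, 0.0, 50.0)
print(len(events), sorted(contents))
```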
The preprocessing module can be divided into two parts, one part is responsible for preprocessing historical data of the operation events, and the other part is responsible for preprocessing resource contents.
The working method for preprocessing the operation event comprises the following steps:
The historical data of the user's operation events is segmented into time slices, and the operation events in each time slice are converted into a word frequency representation. First, all resources involved in the operation data are counted; together they form a vocabulary table in which each resource has a unique number. An operation event is represented by the number of the resource it operates on. Two parameters are defined: the length t of a time slice and the sampling period c, meaning that one sample is taken every period c; both t and c are in seconds. A sampling vector $A_j = \{a_1, a_2, \ldots, a_n, \ldots, a_N\}$ is also defined, where $a_n \in \{0, 1\}$, n is the number of the resource corresponding to the operation event, N is the total number of resources, j denotes the j-th sampling, and $A_j$ reflects the state of all resources at the j-th sampling. Sampling a time slice of length t with period c yields m vectors A, where
$$m = \frac{t}{c}$$
Finally, we define
$$K = \sum_{i=1}^{m} A_i$$
where K is the time slice sequence representing the operating state of each resource within a time slice, i denotes the i-th sampling, and m denotes the number of samplings within a time slice. In this way, with the time slice as the unit and the operation events within the time slice as data, the time slice sequence is converted into a word frequency vector.
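The sampling procedure can be sketched as follows. The sketch makes several assumptions that the text does not fix: the operation history has already been reduced to intervals `(start, end, n)` during which resource n is in use, a resource's state is 1 exactly when a sample instant falls inside one of its intervals, m is taken as the integer quotient `t // c`, and K is read as the element-wise sum of the m sample vectors of a slice. All function names are hypothetical.

```python
def sample_vector(intervals, when, num_resources):
    """A_j: a_n = 1 if resource n is in use at time `when`, else 0."""
    vector = [0] * num_resources
    for start, end, n in intervals:
        if start <= when < end:
            vector[n] = 1
    return vector


def slice_to_counts(intervals, slice_start, t, c, num_resources):
    """K for one time slice: element-wise sum of its m = t // c sample vectors."""
    m = t // c
    counts = [0] * num_resources
    for i in range(m):
        a = sample_vector(intervals, slice_start + i * c, num_resources)
        counts = [k + x for k, x in zip(counts, a)]
    return counts


def history_to_slices(intervals, history_start, history_end, t, c, num_resources):
    """Cut the history into consecutive slices of length t and convert each one."""
    return [slice_to_counts(intervals, s, t, c, num_resources)
            for s in range(history_start, history_end, t)]


# Resource 0 is used during [0, 25), resource 1 during [25, 60), resource 2 during [40, 60).
intervals = [(0, 25, 0), (25, 60, 1), (40, 60, 2)]
slices = history_to_slices(intervals, 0, 60, t=30, c=10, num_resources=3)
print(slices)  # [[3, 0, 0], [0, 3, 2]]
```

Each resulting vector plays the role of a document's word counts: the longer a resource is in use within a slice, the more often its number "occurs" in that slice.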
The working method for preprocessing the resource content comprises the following steps:
1) removing punctuation marks
Punctuation marks carry no practical meaning and can harm content analysis. They are removed with a filter, and each removed mark is replaced by a space.
2) Chinese word segmentation
The resource content is segmented into words using a Chinese word segmentation program.
3) Removing stop words
Stop words are words that appear frequently in an article but carry no practical meaning, mainly adverbs, function words, and modal particles, such as "still", "of" and "o" (see Hua Bolin, Stop word processing technology in knowledge extraction, Modern Book Information Technology, 2007, no. 8, 48-51). These words interfere with data mining and must be deleted. Deletion follows a stop word list: every word that appears in the list is deleted, and its original position is replaced by a space.
4) Counting the vocabulary to obtain word frequency vector
All different words of all articles are counted to form a vocabulary, and each word has a unique number in the vocabulary. After word segmentation, the content of each resource is a set of words. For each resource, the content of each resource is converted into a word frequency vector to be expressed according to the word set and the vocabulary of each resource.
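The four preprocessing steps can be sketched with the Python standard library. The sketch assumes a segmentation function is supplied by the caller; here plain whitespace splitting stands in for a Chinese word segmentation program, which the standard library does not provide. The function names and the tiny stop word list are illustrative only.

```python
import unicodedata
from collections import Counter


def strip_punctuation(text):
    """Replace every punctuation character (Unicode category P*) with a space."""
    return "".join(" " if unicodedata.category(ch).startswith("P") else ch
                   for ch in text)


def tokenize(text, stop_words, segment=str.split):
    """Punctuation removal, word segmentation, then stop-word removal."""
    words = segment(strip_punctuation(text))
    return [w for w in words if w not in stop_words]


def build_vocabulary(token_lists):
    """Give each distinct word a unique number, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for word in tokens:
            vocab.setdefault(word, len(vocab))
    return vocab


def term_frequency_vector(tokens, vocab):
    """Word frequency vector of one resource over the shared vocabulary."""
    vector = [0] * len(vocab)
    for word, count in Counter(tokens).items():
        vector[vocab[word]] = count
    return vector


docs = ["topic model, topic distribution!", "the task model of the user."]
stop_words = {"the", "of"}
token_lists = [tokenize(d, stop_words) for d in docs]
vocab = build_vocabulary(token_lists)
vectors = [term_frequency_vector(t, vocab) for t in token_lists]
print(vocab)    # {'topic': 0, 'model': 1, 'distribution': 2, 'task': 3, 'user': 4}
print(vectors)  # [[2, 1, 1, 0, 0], [0, 1, 0, 1, 1]]
```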
The working method of the data mining module comprises the following steps:
After the vectorized data are obtained, the distribution probabilities are obtained by learning a latent Dirichlet allocation model. In the task model, the task distribution probability of a given time slice and the resource distribution probability of a given task are obtained, along with the task distribution probability given the occurrence of a certain resource. In the topic model, the topic distribution probability of a given resource and the word distribution probability of a given topic are obtained.
The working method of the recommendation module comprises the following steps:
The basis for recommending related resources for the current resource is the degree of association between resources: candidate resources are ranked by their degree of association with the current resource. The recommendation module obtains a task model and a topic model from the data mining module, and the two models measure the degree of association between resources from two different aspects, operation and content. The recommendation module weights the two degrees of association to obtain a total degree of association, and then ranks and recommends resources according to it.
In the task model, time slices, tasks, and resources correspond respectively to articles, topics, and words in the latent Dirichlet allocation model, so measuring the degree of association between resources amounts to measuring the degree of association between words. The degree of association between words can be measured by their similarity across the individual topics. Given two words $w_1$ and $w_2$, their degree of association can be measured by the conditional topic distributions $P(Z \mid w_1)$ and $P(Z \mid w_2)$.
In the topic model, measuring the degree of association between resources amounts to measuring the degree of association between articles. In the latent Dirichlet allocation model, an article is in effect reduced from a very high-dimensional word vector to a lower-dimensional topic vector. The similarity between two articles is therefore calculated from their topic probability distributions.
The standard way to measure the difference between two distributions is to calculate their KL distance (see Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, editors, Latent Semantic Analysis: A Road to Meaning). The Kullback-Leibler distance, also called the KL divergence (Kullback-Leibler divergence), measures the difference between two probability distributions over the same event space. Assuming that the two probability distributions are p and q, their KL distance is calculated by the following equation:
$$D(p, q) = \sum_{i=1}^{T} p_i \log_2 \frac{p_i}{q_i}$$
where $p_i$ and $q_i$ denote the i-th components of the probability distribution vectors p and q, respectively, and T denotes the dimension of p and q.
As the equation shows, D(p, q) = 0 when p and q are equal in every dimension, that is, when the two distributions are identical. The KL distance is asymmetric, that is, in general D(p, q) ≠ D(q, p). The invention therefore uses a symmetric distance based on the KL distance, calculated as follows:
$$KL(p, q) = \frac{1}{2}\left[ D(p, q) + D(q, p) \right]$$
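The two distances can be written directly from their definitions. This sketch assumes both distributions are given as equal-length lists and that q has no zero entry wherever p is positive, which holds for smoothed LDA estimates; the function names are chosen for the example.

```python
import math


def kl_distance(p, q):
    """D(p, q) = sum_i p_i * log2(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """KL(p, q) = 1/2 * [D(p, q) + D(q, p)]."""
    return 0.5 * (kl_distance(p, q) + kl_distance(q, p))


p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_distance(p, q), kl_distance(q, p))  # the two directions differ
print(symmetric_kl(p, q) == symmetric_kl(q, p))
```

The example makes the asymmetry of D visible, while the symmetrized value does not depend on the order of its arguments.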
For any two resources, the task model and the topic model each yield a KL distance measuring the difference between their probability distributions. Let the distance obtained from the task model be $KL_1$ and the distance obtained from the topic model be $KL_2$; the total distance L is then calculated by the following equation:
$$L = \alpha \cdot KL_1 + \beta \cdot KL_2$$
where α and β are preset parameters that may be set from empirical values or according to user preferences.
The smaller the total distance L, the more similar a resource is to the current resource, hence the higher its degree of association and the more strongly it should be recommended to the user.
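A short sketch of the weighting and ranking step follows, assuming the total distance is the weighted sum of the two KL distances as in the embodiment, and that the distances of each candidate to the current resource have already been computed. The default weights of 0.5, the function names, and the sample values are illustrative assumptions.

```python
def total_distance(kl_task, kl_topic, alpha=0.5, beta=0.5):
    """Weighted total distance of one candidate from the current resource."""
    return alpha * kl_task + beta * kl_topic


def recommend(distances, top_n=3, alpha=0.5, beta=0.5):
    """distances maps a resource name to (KL1, KL2) measured against the current resource.

    Returns the top_n resources ordered from smallest to largest total distance.
    """
    scored = sorted((total_distance(k1, k2, alpha, beta), name)
                    for name, (k1, k2) in distances.items())
    return [name for _, name in scored[:top_n]]


candidates = {
    "notes.txt": (0.10, 0.40),
    "slides.ppt": (0.30, 0.05),
    "photo.jpg": (0.90, 0.80),
}
print(recommend(candidates, top_n=2))                      # ['slides.ppt', 'notes.txt']
print(recommend(candidates, top_n=1, alpha=1.0, beta=0.0)) # ['notes.txt']
```

The second call shows how the weights shift the ranking: with only the task model counted, the resource closest in usage pattern comes first.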
The invention is further described below by way of example with reference to the accompanying drawings.
The process of the present invention is carried out by the embodiment system shown in fig. 1.
As shown in FIG. 1, the system of the present embodiment mainly includes: the information acquisition module, which monitors the user's operation events, acquires the resource content involved in those events, and sends both to the information management module; the information management module, which receives the operation events and the resource content and responds to data query requests; the data preprocessing module, which converts the operation events and the resource content into a specific format and passes them to the data mining module; the data mining module, which learns the operation data and the resource content through a preset algorithm and generates a task model and a topic model respectively; and the recommendation module, which provides a list of the resources most relevant to the current resource according to the specified recommendation strategy.
The internal flow of each module is described below.
The information acquisition module (as shown in FIG. 2) works in the background and is responsible for collecting, in real time, the user's operation events on the computer (opening a resource, closing a resource, and switching from one resource to another) and for acquiring the resource content involved in those events. For an operation event on a document, the attributes collected are the time, the event type, the title of the resource, and the path of the resource; for an operation event on a web page, they are the time, the event type, the web page title, and the web page URL. The collected operation events and resource contents are sent to the information management module in real time.
The information management module (as shown in fig. 3) is responsible for receiving the operation events and resource contents from the information acquisition module in real time. Firstly, converting an operation event into operation data, recording the operation data and resource content into a database, wherein the data in the database can be used by other applications; then, responding to the data query request from the data preprocessing module, and returning the operation data and the resource content of the time period specified in the request to the data preprocessing module.
The data preprocessing module is responsible for two kinds of preprocessing. On the one hand, it preprocesses the operation data obtained by query: using the working method for preprocessing operation events described above, the historical data of the user's operation events is segmented into time slices, and the operation events in each time slice are converted into word frequencies and finally represented as word frequency vectors. FIG. 4 illustrates one example of time slicing of operation events.
On the other hand, it preprocesses the resource content: using the working method for preprocessing resource content described above (shown in FIG. 5), the content of each resource is converted into a word frequency vector through four steps: removing punctuation marks, Chinese word segmentation, removing stop words, and counting the vocabulary table to obtain the word frequency vector.
And the obtained word frequency vector of the operation data and the word frequency vector of the resource content are used by the data mining module.
The data mining module learns from the vectorized data using a latent Dirichlet allocation (LDA) model to obtain the distribution probabilities that the recommendation module uses to calculate the degree of association between two resources.
In the task model, a task to be mined corresponds to a topic in the LDA model, the vector of each time slice corresponds to an article in the LDA model, and each resource corresponds to a word in the LDA model (as shown in FIG. 6). The LDA model is learned by estimating its parameters with Gibbs sampling, which yields the task distribution probability of a given time slice and the resource distribution probability of a given task, as well as the task distribution probability given the occurrence of a certain resource.
In the topic model, the vector of each resource's content corresponds to an article in the LDA model, and each word obtained by segmenting the resource content corresponds to a word in the LDA model. As with the task model, the LDA model is learned by estimating its parameters with Gibbs sampling, which yields the topic distribution probability of a given resource and the word distribution probability of a given topic.
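The text names Gibbs sampling without giving the procedure. The block below is a minimal sketch of the widely used collapsed Gibbs sampler for LDA, written with the standard library only; it is not taken from the patent. The hyperparameters `alpha` and `beta` (Dirichlet priors, unrelated to the weights α and β of the total distance), the iteration count, and all function names are assumptions. The same routine serves both models: for the task model, each "document" is a time slice and each "word" a resource number.

```python
import random


def counts_to_tokens(vector):
    """Expand a word frequency vector into a list of word ids, one per occurrence."""
    return [w for w, n in enumerate(vector) for _ in range(n)]


def lda_gibbs(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids.
    Returns (theta, phi): theta[d][k] estimates P(topic k | doc d),
    phi[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)                 # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                               # remove the token's current topic
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + vocab_size * beta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                               # record the resampled topic
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    return theta, phi


vectors = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
docs = [counts_to_tokens(v) for v in vectors]
theta, phi = lda_gibbs(docs, vocab_size=4, num_topics=2, iters=50)
print(theta)
print(phi)
```

Each row of `theta` and `phi` is a probability distribution, so the rows can be passed directly to a KL distance calculation.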
The recommendation module (as shown in FIG. 7) is responsible for refreshing the recommendation interface in real time and for recommending related resources for the current resource according to the degree of association between resources. The task model and the topic model measure the degree of association between resources from two different angles, operation and content, so the recommendation strategy weights the degrees of association obtained from the two models into a total degree of association, and then ranks and recommends resources according to it.
In the task model, resources correspond to words in the LDA model, and the degree of association between resources is the degree of association between words in the LDA model. The degree of association between two words can be measured by the similarity of the conditional topic distributions given each word. Given two words $w_1$ and $w_2$ whose conditional topic distributions are $P(Z \mid w_1)$ and $P(Z \mid w_2)$ respectively, the KL distance between $P(Z \mid w_1)$ and $P(Z \mid w_2)$ can be calculated to measure the difference between them, and thus the degree of association between $w_1$ and $w_2$, that is, between their corresponding resources.
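The text does not say how the conditional topic distribution of a word is obtained from the learned model. One common route, shown here as an assumption rather than as the patent's method, is Bayes' rule: P(z | w) is proportional to P(w | z) P(z), where P(w | z) is the learned resource distribution of each task and P(z) is an overall task probability that must be supplied (for example, the share of tokens assigned to each task). The function name and the numbers are illustrative.

```python
def conditional_topic_distribution(phi, topic_prior, w):
    """P(Z | w) by Bayes' rule.

    phi[k][w] is P(w | z = k); topic_prior[k] is P(z = k), an assumed input.
    """
    joint = [phi[k][w] * topic_prior[k] for k in range(len(phi))]
    total = sum(joint)
    return [x / total for x in joint]


# Two tasks over three resources; equal prior weight on both tasks.
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
prior = [0.5, 0.5]
p_z_w0 = conditional_topic_distribution(phi, prior, 0)
p_z_w2 = conditional_topic_distribution(phi, prior, 2)
print(p_z_w0)  # close to [0.875, 0.125]
print(p_z_w2)  # close to [1/7, 6/7]
```

Resource 0 is used almost only in the first task and resource 2 almost only in the second, so their conditional task distributions are far apart and the two resources would receive a large KL distance.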
In the topic model, resources correspond to articles in the LDA model, so measuring the degree of association between resources amounts to calculating the similarity between two articles. Let the articles be d1 and d2, with topic probability distributions P(Z | d1) and P(Z | d2) obtained from the topic model. Likewise, their KL distance can be calculated to measure the difference between them, and thus the degree of association between the resources corresponding to articles d1 and d2.
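The comparison of two topic distributions can be sketched as follows. The exact formula used by the method is the one given in the summary of the invention; this example assumes the standard Kullback-Leibler divergence, D(P || Q) = Σ p_i log(p_i / q_i), and adds a symmetrized variant as an optional alternative. The function names and the `eps` floor are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Standard KL divergence D(p || q) between two discrete distributions.

    eps floors entries of q so that a zero in q does not cause division by
    zero (an assumption of this sketch, not part of the described method).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of the two directions, since KL itself is not symmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Toy topic distributions (hypothetical), e.g. P(Z | d1), P(Z | d2), P(Z | d3).
p_d1 = [0.7, 0.2, 0.1]
p_d2 = [0.6, 0.3, 0.1]
p_d3 = [0.1, 0.1, 0.8]
near = kl_divergence(p_d1, p_d2)  # similar distributions, small distance
far = kl_divergence(p_d1, p_d3)   # dissimilar distributions, larger distance
```

The same function applies unchanged to the task model, with P(Z | w1) and P(Z | w2) as inputs.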
The KL distance is calculated as described in the summary of the invention. When the user is using a resource, the KL distance between the probability distribution of the current resource and that of each other resource in the user's operation history is calculated in both the task model and the topic model according to the above method. Let the distance obtained in the task model be KL1 and the distance obtained in the topic model be KL2; the total distance over the two models is then calculated by weighting, L = α*KL1 + β*KL2, where α and β are preset parameters.
The smaller the total distance L, the higher the degree of association between a resource and the current resource. Finally, the interface of the recommendation system is refreshed in real time, and the resource recommendation list is displayed in the interface in descending order of association degree (i.e., in ascending order of total distance L). Fig. 8 shows an example of the recommendation system interface, in which the user can double-click a resource to open it directly.
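The weighting and ranking steps can be sketched as below. This is an illustration only: the resource names, the toy distributions, the default weights α = β = 0.5, the `top_n` cutoff, and the choice to compute the KL distance from the current resource to each candidate are all assumptions made for the example.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Standard KL divergence D(p || q); eps guards against zeros in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def rank_resources(current, task_dist, topic_dist, alpha=0.5, beta=0.5, top_n=5):
    """Rank other resources by total distance L = alpha*KL1 + beta*KL2.

    task_dist[r]:  P(Z | r) from the task model (resource as word).
    topic_dist[r]: P(Z | r) from the topic model (resource as article).
    Returns (resource, L) pairs, smallest L (highest association) first.
    """
    scored = []
    for r in task_dist:
        if r == current or r not in topic_dist:
            continue
        kl1 = kl_divergence(task_dist[current], task_dist[r])
        kl2 = kl_divergence(topic_dist[current], topic_dist[r])
        scored.append((r, alpha * kl1 + beta * kl2))
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]

# Toy distributions (hypothetical) for three resources.
task_dist = {"report.doc": [0.9, 0.1], "notes.doc": [0.8, 0.2], "news.html": [0.1, 0.9]}
topic_dist = {"report.doc": [0.7, 0.3], "notes.doc": [0.6, 0.4], "news.html": [0.2, 0.8]}
ranking = rank_resources("report.doc", task_dist, topic_dist)
```

Setting `alpha` or `beta` to zero reduces the ranking to a single model, which makes it easy to see how much each model contributes to the final list.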

Claims (10)

1. An information association method based on user operation records and resource content, comprising the following steps:
1) monitoring a plurality of operation events of a user on a computer, acquiring resource content and operation records, and storing them in a local or remote database;
2) converting the operation records into vectors of a specific format, and establishing a task model based on the operation records:
2-1) performing time slice sequence segmentation and vector conversion on the operation records;
2-2) establishing the task model according to a latent Dirichlet allocation (LDA) model, with the operation events as data and the time slice as the unit;
3) establishing a topic model based on the resource content:
3-1) converting the content of each resource into a word frequency vector representation according to the word set and vocabulary extracted from the resource content;
3-2) representing the word frequency vectors with a latent Dirichlet allocation (LDA) model to establish the topic model;
4) calculating the degree of association between the current resource and other resources in the topic model and in the task model respectively, completing the information association, and returning the resources with the highest degree of association to the user.
2. The information association method based on user operation records and resource content according to claim 1, wherein the operation events include: a resource opening event, a resource closing event, and an event of switching from one resource to another; and the resource content comprises documents and web pages.
3. The information association method based on user operation records and resource content according to claim 1, wherein the attributes to be collected for an operation event related to a document comprise the time, the event type, the title of the resource, and the path of the resource, and the attributes to be collected for an operation event related to a web page comprise the time, the event type, the web page title, and the web page URL.
4. The information association method based on user operation records and resource content according to claim 1, wherein the time slice sequence segmentation method is:
i) counting all resources in the operation records, assigning a number to each resource, and forming the resources into a vocabulary;
ii) defining a sampling vector Aj = {a1, a2, …, an, …, aN} to represent the sampling states of all resources at the j-th sampling, wherein each a takes the value 0 or 1, n is the resource number corresponding to an operation event, N is the total number of resources, and j denotes the j-th sampling;
iii) sampling the time slices according to the period c to obtain the segmented time slice sequence
Figure FDA00002149514900011
wherein [symbol missing] is the total number of vectors, i is the number of sampling times, t is the length of the time slice, and c is the sampling period.
5. The information association method based on user operation records and resource content according to claim 1, wherein extracting the resource content comprises: removing punctuation marks, performing Chinese word segmentation, removing stop words, and counting the vocabulary to obtain word frequency vectors, whereby the content of each resource is converted into a word frequency vector.
6. The information association method based on user operation records and resource content according to claim 1, wherein in the task model, the task distribution probability of a given time slice, the resource distribution probability of a given task, and the task distribution probability given the occurrence of a certain resource are obtained.
7. The information association method based on user operation records and resource content according to claim 1, wherein in the topic model, the topic distribution probability of a given resource and the word distribution probability of a given topic are obtained.
8. The information association method based on user operation records and resource content according to claim 1, wherein the degree of association is calculated by: calculating, according to the Kullback-Leibler (KL) distance, the similarity between the probability distributions of the current resource and of other resources in the topic model and in the task model, and weighting the results to obtain the total distance.
9. The information association method based on user operation records and resource content according to claim 6, wherein parameter estimation is performed by Gibbs sampling in the learning of the task model and the topic model.
10. The information association method based on user operation records and resource content according to claim 1, wherein the user's computer runs a Windows or Android system.
CN201210345320.1A 2012-09-17 2012-09-17 Information association method based on user operation records and resource content Active CN102915335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210345320.1A CN102915335B (en) 2012-09-17 2012-09-17 Information association method based on user operation records and resource content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210345320.1A CN102915335B (en) 2012-09-17 2012-09-17 Information association method based on user operation records and resource content

Publications (2)

Publication Number Publication Date
CN102915335A (en) 2013-02-06
CN102915335B (en) 2016-04-27

Family

ID=47613702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210345320.1A Active CN102915335B (en) 2012-09-17 2012-09-17 Information association method based on user operation records and resource content

Country Status (1)

Country Link
CN (1) CN102915335B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150374A (en) * 2013-03-11 2013-06-12 中国科学院信息工程研究所 Method and system for identifying abnormal microblog users
CN103412880A (en) * 2013-07-17 2013-11-27 百度在线网络技术(北京)有限公司 Method and device for determining implicit associated information between multimedia resources
CN103631970A (en) * 2013-12-20 2014-03-12 百度在线网络技术(北京)有限公司 Method and device for mining associated relationship between attributes and entities
CN104572707A (en) * 2013-10-18 2015-04-29 北京卓易讯畅科技有限公司 Preferable object information providing method and device
CN105376649A (en) * 2015-11-24 2016-03-02 江苏有线技术研究院有限公司 Set top box blind operation method and system for realizing accurate combination recommendation
CN105825415A (en) * 2016-03-15 2016-08-03 广东省科技基础条件平台中心 S&T (Science and Technology) resource supply and demand matching method
CN106469206A (en) * 2016-08-31 2017-03-01 广州酷狗计算机科技有限公司 The method and apparatus of pushed information
CN106570157A (en) * 2016-11-03 2017-04-19 北京金山安全软件有限公司 Picture pushing method and device and electronic equipment
CN106663103A (en) * 2014-06-18 2017-05-10 微软技术许可有限责任公司 Scalable eventual consistency system using logical document journaling
CN106777304A (en) * 2016-12-30 2017-05-31 中国民航信息网络股份有限公司 The method for pushing and device of theme
CN107391546A (en) * 2017-06-01 2017-11-24 浙江唯见科技有限公司 The fully associative method and system of VR resources
CN103793465B (en) * 2013-12-20 2018-06-22 武汉理工大学 Mass users behavior real-time analysis method and system based on cloud computing
CN108563648A (en) * 2017-11-29 2018-09-21 腾讯科技(上海)有限公司 data display method and device, storage medium and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968798A (en) * 2010-09-10 2011-02-09 中国科学技术大学 Community recommendation method based on on-line soft constraint LDA algorithm
CN102004774A (en) * 2010-11-16 2011-04-06 清华大学 Personalized user tag modeling and recommendation method based on unified probability model
CN102495872A (en) * 2011-11-30 2012-06-13 中国科学技术大学 Method and device for conducting personalized news recommendation to mobile device users

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
崔凯 等: "《一种基于LDA的在线主题演化挖掘模型》", 《计算机科学》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150374A (en) * 2013-03-11 2013-06-12 中国科学院信息工程研究所 Method and system for identifying abnormal microblog users
CN103412880B (en) * 2013-07-17 2017-02-22 百度在线网络技术(北京)有限公司 Method and device for determining implicit associated information between multimedia resources
CN103412880A (en) * 2013-07-17 2013-11-27 百度在线网络技术(北京)有限公司 Method and device for determining implicit associated information between multimedia resources
CN104572707A (en) * 2013-10-18 2015-04-29 北京卓易讯畅科技有限公司 Preferable object information providing method and device
CN103631970A (en) * 2013-12-20 2014-03-12 百度在线网络技术(北京)有限公司 Method and device for mining associated relationship between attributes and entities
CN103631970B (en) * 2013-12-20 2017-08-18 百度在线网络技术(北京)有限公司 The method and apparatus for excavating attribute and entity associated relation
CN103793465B (en) * 2013-12-20 2018-06-22 武汉理工大学 Mass users behavior real-time analysis method and system based on cloud computing
CN106663103A (en) * 2014-06-18 2017-05-10 微软技术许可有限责任公司 Scalable eventual consistency system using logical document journaling
CN106663103B (en) * 2014-06-18 2020-08-18 微软技术许可有限责任公司 Scalable eventual consistency system using logical document logs
CN105376649A (en) * 2015-11-24 2016-03-02 江苏有线技术研究院有限公司 Set top box blind operation method and system for realizing accurate combination recommendation
CN105376649B (en) * 2015-11-24 2018-09-14 江苏有线技术研究院有限公司 Realize the blind operating method of the set-top box of accurate combined recommendation and system
CN105825415A (en) * 2016-03-15 2016-08-03 广东省科技基础条件平台中心 S&T (Science and Technology) resource supply and demand matching method
CN106469206A (en) * 2016-08-31 2017-03-01 广州酷狗计算机科技有限公司 The method and apparatus of pushed information
CN106570157B (en) * 2016-11-03 2020-04-17 北京金山安全软件有限公司 Picture pushing method and device and electronic equipment
CN106570157A (en) * 2016-11-03 2017-04-19 北京金山安全软件有限公司 Picture pushing method and device and electronic equipment
CN106777304A (en) * 2016-12-30 2017-05-31 中国民航信息网络股份有限公司 The method for pushing and device of theme
CN106777304B (en) * 2016-12-30 2020-03-20 中国民航信息网络股份有限公司 Theme pushing method and device
CN107391546B (en) * 2017-06-01 2020-07-07 浙江唯见科技有限公司 Method and system for full association of VR resources
CN107391546A (en) * 2017-06-01 2017-11-24 浙江唯见科技有限公司 The fully associative method and system of VR resources
CN108563648A (en) * 2017-11-29 2018-09-21 腾讯科技(上海)有限公司 data display method and device, storage medium and electronic device
CN108563648B (en) * 2017-11-29 2021-06-25 腾讯科技(上海)有限公司 Data display method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN102915335B (en) 2016-04-27

Similar Documents

Publication Publication Date Title
CN102915335B (en) Information association method based on user operation records and resource content
AU2022201654A1 (en) System and engine for seeded clustering of news events
CN103176985B (en) The most efficient a kind of internet information crawling method
Sukanya et al. Techniques on text mining
KR101681109B1 (en) An automatic method for classifying documents by using presentative words and similarity
WO2006116516A2 (en) Temporal search results
CN108563783B (en) Financial analysis management system and method based on big data
MXPA05009467A (en) Systems, methods, and interfaces for providing personalized search and information access.
EP1897002A2 (en) Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest
CN104298776A (en) LDA model-based search engine result optimization system
CN106776672A (en) Technology development grain figure determines method
CN101231640A (en) Method and system for automatically computing subject evolution trend in the internet
CA2956627A1 (en) System and engine for seeded clustering of news events
Kanhabua et al. Learning to detect event-related queries for web search
WO2015143263A1 (en) Publication scope visualization and analysis
KR20140081721A (en) System and method for deducting imporant keyword using textmining, and a medium having computer readable program for executing the method
Ceolin et al. Capturing the ineffable: Collecting, analysing, and automating web document quality assessments
CN103425748B (en) A kind of document resources advise the method for digging and device of word
KR101753768B1 (en) A knowledge management system of searching documents on categories by using weights
CN113449077A (en) News popularity calculation method, equipment and storage medium
JP5217518B2 (en) Relationship information acquisition system, relationship information acquisition method, and relationship information acquisition program
Guan et al. Research and design of internet public opinion analysis system
Imambi et al. A novel feature selection method for classification of medical documents from pubmed
Campbell et al. An approach for the capture of context-dependent document relationships extracted from Bayesian analysis of users' interactions with information
Berendt et al. STORIES in time: a graph-based interface for news tracking and discovery

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant