CN103577579A - Resource recommendation method and system based on potential demands of users - Google Patents

Resource recommendation method and system based on potential demands of users

Info

Publication number
CN103577579A
Authority
CN
China
Prior art keywords
user
theme
resource
descriptor
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310549102.4A
Other languages
Chinese (zh)
Other versions
CN103577579B (en)
Inventor
王庆红
李鹏
周育忠
陶秀洁
龚婷
陈传夫
王平
王晓光
冉从敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
CSG Electric Power Research Institute
Research Institute of Southern Power Grid Co Ltd
Original Assignee
Wuhan University WHU
Research Institute of Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU, Research Institute of Southern Power Grid Co Ltd
Priority to CN201310549102.4A
Publication of CN103577579A
Application granted
Publication of CN103577579B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of data retrieval, and discloses a resource recommendation method and system based on potential demands of users. The method includes the steps that resources are clustered and subjects of the resources are extracted by the utilization of a text clustering and subject mining algorithm; based on a clustering result, a subject term of each subject is calculated, and then a subject term list in the corresponding field is acquired; the resources are automatically indexed by the utilization of the subject term lists, and subject terms contained in each independent resource are calculated; by the combination of records of the users for operating the independent resources and the attributes of the users, the attention of the users for certain subjects is calculated; a user demand model is built, and similarity of the subjects between the users is calculated; the authority of designated information to the subjects is calculated by the utilization of the relations among data in the independent resources; the resources are screened according to the user demand model, and the resources with the high matching degree are recommended to the users. According to the method and system, by the utilization of the close correlation of the potential information demands of the users and professional fields of the users, the information resources matched with the demands of the users can be more accurately recommended to the users.

Description

Resource recommendation method and system based on users' potential demands
Technical field
The present invention relates to the field of data retrieval, and in particular to a resource recommendation method and system based on users' potential demands.
Background
With the development of Web 2.0 technology, people create vast amounts of information every day through the Internet, a fast and convenient information carrier, and the Internet has become one of the main platforms through which people obtain information in daily life. However, the rapid growth in the quantity of information has brought a flood of information: countless repetitive data are presented to users through the Internet, so that finding information of interest online becomes difficult and time-consuming. This phenomenon is called "information overload".
Internet search engines emerged in response to this challenge. People can usually use a search website to find the data they want, but a general-purpose search engine only matches relevant information against the keywords entered by the user and returns it. Different users who enter the same search conditions receive identical results, with no differentiation according to individual interests. The information screening capability that a search engine can provide is therefore limited, and it cannot fundamentally solve the problem of information overload.
Another distinguishing feature of search engines is that they work in an information "pull" mode: the user "pulls" information from the Internet according to his or her own information needs. If the user cannot describe those needs accurately and enters inappropriate search terms, a large amount of information that does not match the needs is pulled in. Because of these problems, existing search engines have difficulty fully reflecting the differing needs of different users, and search efficiency, accuracy and user satisfaction fall short of the ideal.
Summary of the invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to provide accurate information tailored to the differences between users.
To solve the above technical problem, in one aspect the present invention provides a resource recommendation method based on users' potential demands, the method comprising the steps of:
S1: cluster the resources and extract their topics using a text clustering and topic mining algorithm;
S2: based on the clustering result, calculate the subject terms under each topic and obtain the thesaurus of the corresponding field;
S3: automatically index the resources using the thesaurus, calculating the subject terms contained in each individual resource;
S4: combining the users' operation records on individual resources with user attributes, calculate each user's attention to a given topic, build user demand models and calculate the topic similarity between users; and calculate the authority of specified information on a topic using the relations among the data in the individual resources;
S5: screen the resources according to the user demand model and recommend the resources with a higher matching degree to the user.
Preferably, in step S1, an improved hierarchical topic extraction model, hLDA, is used to perform the clustering and topic extraction.
Preferably, in step S4, the topic similarity between users u and v is calculated as follows:
Build the demand models $M_u$ and $M_v$ of users u and v respectively, and denote the topic sets of $M_u$ and $M_v$ by $T_u$ and $T_v$;
From the topics contained in $M_u$ and $M_v$, build the topic set $\{T_1, T_2, \ldots, T_n\}$, where n is the sum of the numbers of topics contained in $M_u$ and $M_v$;
For each topic $T_i$ ($i = 1, 2, \ldots, n$), calculate the attention of users u and v to $T_i$, namely $S(u, T_i)$ and $S(v, T_i)$;
On the topic space $\{T_1, T_2, \ldots, T_n\}$, build the topic attention vectors of users u and v: $U = \{S(u, T_1), S(u, T_2), \ldots, S(u, T_n)\}$ and $V = \{S(v, T_1), S(v, T_2), \ldots, S(v, T_n)\}$; the cosine of the angle between the vectors U and V is taken as the topic similarity between u and v.
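The similarity calculation above can be illustrated with a minimal Python sketch. The function name `topic_similarity` and the dictionary representation of the attention values are assumptions made for illustration only; the sketch also assumes that a topic a user does not follow contributes an attention of 0, which the text does not state explicitly.

```python
import math

def topic_similarity(attention_u, attention_v):
    """Cosine of the angle between two users' topic attention vectors.

    attention_u, attention_v: dict mapping a topic name to S(user, topic).
    Assumption: a topic missing from a user's dict has attention 0.
    """
    # Shared topic space {T_1, ..., T_n} built from both demand models.
    topics = sorted(set(attention_u) | set(attention_v))
    U = [attention_u.get(t, 0.0) for t in topics]
    V = [attention_v.get(t, 0.0) for t in topics]

    dot = sum(a * b for a, b in zip(U, V))
    norm_u = math.sqrt(sum(a * a for a in U))
    norm_v = math.sqrt(sum(b * b for b in V))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no attention recorded for one user: treat as dissimilar
    return dot / (norm_u * norm_v)
```

Two users whose attention is spread over the same topics in the same proportions obtain a similarity of 1, while users with no topic in common obtain 0.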
Preferably, in step S5, screening the resources according to the user demand model $M_u$ comprises the steps of:
For each topic contained in $M_u$, put the standard subject terms under that topic and their corresponding auxiliary words into a vocabulary Dic; after all topics have been processed, Dic contains all the standard subject terms and auxiliary words in $M_u$;
For each topic T contained in $M_u$, obtain all documents containing that topic and put them into a set Docs; after all topics have been processed, Docs is the set of documents containing at least one topic in $M_u$;
For each document in Docs, count $TF_{Dic}$, the total number of occurrences in that document of the words in Dic; after all documents in Docs have been counted, sort the documents by $TF_{Dic}$ and recommend the top-ranked documents to the user.
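The three screening steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `recommend`, its parameters, and the dictionary-based inputs (topic to terms, topic to document identifiers, document identifier to token list) are hypothetical, and ties are broken by document identifier, a choice the text does not specify.

```python
def recommend(demand_topics, topic_terms, topic_docs, doc_tokens, top_k=3):
    """Rank the documents matching a user's demand model by TF_Dic.

    demand_topics: topics contained in the demand model M_u
    topic_terms:   topic -> standard subject terms and auxiliary words
    topic_docs:    topic -> identifiers of documents containing the topic
    doc_tokens:    document identifier -> list of words in the document
    (all four structures are hypothetical inputs for this sketch)
    """
    # Step 1: vocabulary Dic collects the terms of every topic in M_u.
    dic = set()
    for topic in demand_topics:
        dic.update(topic_terms.get(topic, ()))

    # Step 2: Docs collects documents containing at least one topic in M_u.
    docs = set()
    for topic in demand_topics:
        docs.update(topic_docs.get(topic, ()))

    # Step 3: TF_Dic is the total number of occurrences of Dic words.
    tf_dic = {d: sum(1 for word in doc_tokens[d] if word in dic) for d in docs}

    # Highest TF_Dic first; ties broken by identifier (an assumption).
    ranked = sorted(docs, key=lambda d: (-tf_dic[d], d))
    return ranked[:top_k]
```

A document that uses vocabulary words heavily but is not indexed under any topic of $M_u$ is never considered, because the count is taken only over Docs.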
Preferably, for a user u, the user demand model $M_u$ is expressed as $M_u = (A_u, T_u)$, where $A_u$ is the attribute set of user u, $A_u = \{a_1, a_2, \ldots, a_n\}$, each attribute $a_i$ being an attribute associated with demand, and $T_u$ is the set of topics $T_i$ ($i = 1, 2, \ldots, n$) that user u follows.
Preferably, in step S2, the subject terms are calculated using mutual information:
After the mutual information between each candidate subject term and the corresponding topic has been calculated, the candidates are sorted in descending order of mutual information, and the several candidates with the largest mutual information are taken as the subject terms of that topic.
Preferably, in step S2, after the subject terms have been calculated:
The calculated subject terms are additionally reviewed manually, and the subject terms that pass review enter the standard thesaurus;
Meanwhile, the hierarchical relations between subject terms are used to establish the broader-term and narrower-term relations between subject terms in the standard thesaurus;
And HowNet is used as a synonym dictionary to calculate the synonyms of each subject term in the standard thesaurus.
Preferably, in step S5, a similarity-based recommendation is additionally made to the target user, according to the topic similarity between users, using the demand model of the user with the highest similarity; and/or
An authority-based recommendation is made to the user according to the authority of the specified information on the topic.
In another aspect, the present invention also provides a resource recommendation system based on users' potential demands, the system comprising:
A preprocessing module, configured to cluster the resources and extract their topics using a text clustering and topic mining algorithm;
A thesaurus module, configured to calculate, based on the clustering result, the subject terms under each topic and obtain the thesaurus of the corresponding field;
An indexing module, configured to automatically index the resources using the thesaurus, calculating the subject terms contained in each individual resource;
A computing module, configured to calculate each user's attention to a given topic by combining the users' operation records on individual resources with user attributes; to build user demand models and calculate the topic similarity between users; and to calculate the authority of specified information on a topic using the relations among the data in the individual resources;
A recommendation module, configured to screen the resources according to the user demand model and recommend the resources with a higher matching degree to the user.
The present invention provides a resource recommendation method and system based on users' potential demands. It exploits the close relation between a user's potential information needs and the user's own professional field: by mining the user's potential information needs on the basis of professional field, it can more accurately recommend information resources that match the user's needs.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the resource recommendation method based on users' potential demands in one embodiment of the present invention;
Fig. 2 is a schematic diagram of the document clustering and topic extraction model in a preferred embodiment of the present invention;
Fig. 3 is a schematic flowchart of the subject term calculation process in a preferred embodiment of the present invention;
Fig. 4 is a schematic flowchart of the automatic indexing process in a preferred embodiment of the present invention;
Fig. 5 is a schematic flowchart of the author and research institution authority calculation process in a preferred embodiment of the present invention;
Fig. 6 is a framework diagram of the topic authority model for authors and research institutions in a typical application scenario of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are preferred modes of carrying out the invention; the description is intended to illustrate the general principles of the invention and not to limit its scope. The scope of protection of the invention is defined by the claims, and all other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments herein without creative effort fall within that scope.
The information screening capability of existing search engines is limited and rather passive. Personalized recommendation technology emerged to address this problem: the information resource server analyzes users' needs and actively pushes information that may interest them. The main characteristic of a recommender system is that it is an active push process, which overcomes precisely the defect of the information pull mode of traditional search engines: users often do not know how to express their information needs accurately, or are not yet aware of those needs, and so cannot obtain valuable information with a search engine.
The core of personalized recommendation technology is how to analyze and mine users' potential information needs, for example by using users' operation logs (such as browsing records for books, songs, films and other resources) to analyze their personal preferences, geographical location and so on, and on that basis recommending other related information resources to the target user. Among current personalized recommendation techniques, collaborative filtering is the most studied and most widely used; it derives the content recommended to the target user from an analysis of other users' operation logs, and its degree of personalization is high. Current recommendation methods based on the collaborative filtering idea fall broadly into two classes: recommendation algorithms based on user similarity, and recommendation algorithms based on content filtering. Similarity-based algorithms describe the association between users and resources by building a user-resource matrix, calculate the similarity between users on that basis, and then recommend to the target user the information of users similar to the target user. Content-filtering algorithms analyze the content of the information resources a user has browsed to obtain a feature model of the user, then use that model to score resources; high-scoring resources are recommended to the target user. By incorporating machine learning, data mining and similar methods, these approaches alleviate to some extent the sparsity problem of the user-item rating matrix.
Most of the research above concerns general information resources on the Internet, such as web pages, songs and videos. The categories of resource that interest users are broad, so collaborative filtering and content filtering have difficulty accurately identifying the field to which a user's resource needs belong. Structured or semi-structured resources with relatively complete metadata (typically professional digital information resources, such as those of universities, research institutes and large enterprises) differ clearly from public Internet resources in the following respects: the information resources have a very clear classification by professional field; they have relatively complete metadata, such as author, keywords and classification number; users of a digital library or scientific and technical information platform generally need to authenticate, and their identity information is comparatively definite, including, besides the user name, information such as institutional affiliation; and users acquire resources with a stronger sense of purpose, the information resources obtained being closely related to the research fields that interest them. Traditional personalized recommendation methods based on public Internet resources therefore cannot meet the requirement, in professional information resource systems, for more accurate personalized information recommendation based on professional-field interest.
In an embodiment of the present invention, mining users' potential information needs on the basis of professional field makes it possible to recommend more accurately the information resources that match users' needs. Referring to Fig. 1, in one embodiment of the present invention the resource recommendation method based on users' potential demands comprises the steps of:
S1: cluster the resources and extract their topics using a text clustering and topic mining algorithm;
S2: based on the clustering result, calculate the subject terms under each topic and obtain the thesaurus of the corresponding field;
S3: automatically index the resources using the thesaurus, calculating the subject terms contained in each individual resource;
S4: combining the users' operation records on individual resources with user attributes, calculate each user's attention to a given topic; build user demand models and calculate the topic similarity between users; and calculate the authority of specified information on a topic using the relations among the data in the individual resources;
S5: screen the resources according to the user demand model and recommend the resources with a higher matching degree to the user.
The various preferred modes of the above embodiment are explained further below. In a preferred embodiment of the present invention, to further highlight the technical principles and practical effect of the invention, the scope of resources is limited to scientific and technical information. Persons skilled in the relevant art should understand, however, that scientific and technical information is one specific category of data resources as a whole; the technical solution of the invention can obviously also be applied directly to other structured or semi-structured resources with relatively complete metadata (structured network files marked up in a general format, such as XML or HTML; resources further described by explicit fields, such as patent documents; or other resources that have undergone rough classification). The preferred embodiment should therefore not be regarded as limiting the invention.
To obtain the resources that interest them, users of a scientific and technical information system generally screen by three routes: search terms, document authors, and the research institutions to which authors belong. Every scientific document has specific topics, and each topic is described by a group of subject terms; every scientific document is associated with its authors, each author has specific research fields, and an author's research fields can be described by the topics of the documents the author has published; each author is associated with the research institution to which he or she belongs, such as a university or research institute; scientific documents may be related by topic; and authors, and likewise institutions, may be linked by cooperative relations such as joint publications and shared science and technology projects. Potential and rather complex associations therefore exist among users' research topics, scientific documents, authors and research institutions. The present invention makes full use of these associations as the basis for mining users' potential demands and making recommendations to users. Moreover, the invention recommends not only scientific documents but also, using the associations among research topics, documents, authors and research institutions, the authoritative authors and authoritative research institutions in the user's fields of interest.
In a preferred embodiment of the present invention, in order to build the thesaurus effectively, the topics contained in the information resources must first be extracted. The most widely used topic extraction method is the LDA topic model, a commonly used three-layer Bayesian probabilistic generative topic model that makes explicit the relations among words, documents and latent semantic topics. Its number of parameters does not grow linearly as the document set grows, and it generalizes well; it is a very popular model in fields such as machine learning and information retrieval. However, before the LDA model can mine topics from a large-scale document collection, the number of topics K must be specified manually in advance, and in general it is impossible to determine in advance how many topics a given large-scale document collection contains. In addition, the traditional LDA model cannot automatically cluster documents during topic extraction, so the extracted topics have no semantic hierarchy.
Therefore, step S1 of the present invention preferably uses an improved hierarchical topic extraction model, hLDA, to extract the topics of the resources; the extracted topics have semantic hierarchical relations, and the documents can be clustered automatically at the same time. More importantly, the improved hLDA model of the invention makes full use of the citation relations in scientific literature: documents linked by a citing or cited relation are more likely to belong to the same topic and are more likely to be clustered together. The improved hLDA model of the invention is shown in Fig. 2:
In the improved hLDA model, the following symbols denote the parameters of the document clustering and topic extraction model:
Node T represents the set of paths of the L-level tree;
γ is the prior hyperparameter of the probability distribution over tree paths;
nCRP (the nested Chinese restaurant process) is a stochastic process that assigns probability distributions over trees of unbounded breadth and unbounded depth;
$C_1, C_2, C_3, \ldots, C_L$ represent the nodes in the tree;
α is the proportion among latent topics, that is, the hyperparameter of the prior distribution of the document collection's latent topics over the topic levels of the tree in which they lie;
θ is the distribution of a document over topics; θ follows the Dirichlet distribution Dir(θ | α) and represents the weight of each latent topic of the target document m in the topic hierarchy in which it lies;
Z represents the topics contained in a document;
W represents the words in a document;
β represents the term distribution of each node topic in the tree;
η is the hyperparameter of the prior distribution describing the topic term distribution;
The parameter λ determines the proportion in which topics come from the cited document m′ or from document m itself; the distribution of λ depends on the prior ψ;
The random variable s represents the citation relation between documents m and m′: s = 0 means that document m does not cite document m′, so the topics of document m are determined entirely by the document's own topic prior α and its own topic distribution θ; if s = 1, the topics of document m are determined jointly by m and m′.
Based on the improved hLDA model above, the process of generating the words of a document is as follows:
For each topic k ∈ T in the tree, generate the topic's term distribution β ~ Dirichlet(η);
For each document m, generate a path in the tree according to $C_m$ ~ nCRP(γ);
For the L-dimensional topic distribution of the document: if s = 0, generate $\theta_m$ ~ Dirichlet(α); if s = 1, first generate λ ~ Dirichlet(ψ), where λ determines the proportion in which topics come from the cited document m′ or from document m itself, and then generate $\theta_m$ ~ λ Dirichlet(α) + (1 − λ) Dirichlet(α′);
For the n-th word in the document, select the topic assigned to the word, $Z_{m,n} \mid \theta_m$ ~ Mult($\theta_m$), and then select the word $W_{m,n} \mid \{Z_{m,n}, C_m, \beta\}$ ~ Mult($\beta_{C_m[Z_{m,n}]}$).
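The generative steps above can be made concrete with a small Python sketch that uses only the standard library. It is an illustration under several assumptions, not the patented implementation: all function and parameter names are hypothetical, tree nodes are represented as tuples of child indices, symmetric scalar priors are used, λ is taken as the first component of a two-component Dirichlet(ψ) draw, and the mixture for s = 1 is read as a convex combination of one draw from Dirichlet(α) and one from Dirichlet(α′).

```python
import random

def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_path(node_counts, depth, gamma, rng):
    """Draw a root-to-leaf path C_m from a nested CRP with parameter gamma.

    node_counts maps a node (tuple of child indices) to the number of
    documents whose path passes through it; it is updated in place.
    """
    node = ()
    path = [node]
    node_counts[node] = node_counts.get(node, 0) + 1
    for _ in range(depth - 1):
        children = sorted(c for c in node_counts
                          if len(c) == len(node) + 1 and c[:len(node)] == node)
        # Existing child: weight = its document count; new child: weight gamma.
        weights = [node_counts[c] for c in children] + [gamma]
        pick = rng.choices(range(len(weights)), weights=weights)[0]
        node = children[pick] if pick < len(children) else node + (len(children),)
        node_counts[node] = node_counts.get(node, 0) + 1
        path.append(node)
    return path

def generate_document(vocab, path, betas, alpha, eta, n_words, rng,
                      cites=False, alpha_cited=None, psi=(1.0, 1.0)):
    """Generate the words of one document along its tree path."""
    depth = len(path)
    for node in path:  # beta ~ Dirichlet(eta), drawn once per tree node
        if node not in betas:
            betas[node] = dirichlet([eta] * len(vocab), rng)
    theta = dirichlet([alpha] * depth, rng)       # s = 0 case
    if cites:                                     # s = 1 case
        lam = dirichlet(psi, rng)[0]
        theta_cited = dirichlet([alpha_cited] * depth, rng)
        theta = [lam * a + (1 - lam) * b for a, b in zip(theta, theta_cited)]
    words = []
    for _ in range(n_words):
        level = rng.choices(range(depth), weights=theta)[0]    # Z_{m,n}
        words.append(rng.choices(vocab, weights=betas[path[level]])[0])
    return words
```

Sharing `node_counts` and `betas` across calls lets later documents reuse popular paths and their term distributions, which is what produces clustering in the tree.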
In clustering and topic extraction, Gibbs sampling is used to estimate the model parameters. Gibbs sampling only needs to estimate the variables $Z_{m,i}$ (the topic assigned to the i-th word $W_i$ in document m) and $C_{m,l}$ (the level-l topic of document m on its path in the topic hierarchy tree). The whole Gibbs sampling process is divided into the following two steps:
First, estimate the variable $Z_{m,i}$; its conditional posterior distribution is:
$$P(Z_{m,i} = j \mid Z_{m,-i}, W, Z_{m',-i}, W) \propto \lambda \, \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_m)} + \alpha}{n_{-i,\cdot}^{(d_m)} + T\alpha} + (1 - \lambda) \, \frac{n_{-i,j}^{(m',w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_{m'})} + \alpha}{n_{-i,\cdot}^{(d_{m'})} + T\alpha}$$
where $Z_{m,-i}$ denotes the topic assignments of all other words $W_k$ (k ≠ i) in document m; $Z_{m',-i}$ denotes the topic assignments of all other words $W_k$ (k ≠ i) in document m′; $n_{-i,j}^{(w_i)}$ is the number of times the word $W_i$ is assigned to topic j in document m; $n_{-i,j}^{(d_m)}$ is the number of words in document m assigned to topic j; $n_{-i,j}^{(\cdot)}$ is the total number of words assigned to topic j; $n_{-i,\cdot}^{(d_m)}$ is the total number of words in document m; $n_{-i,j}^{(m',w_i)}$ is the number of times the word $W_i$ is assigned to topic j in document m′; $n_{-i,j}^{(d_{m'})}$ is the number of words in document m′ assigned to topic j; $n_{-i,\cdot}^{(d_{m'})}$ is the total number of words in document m′; the parameter λ determines the proportion in which topics come from the cited document m′ or from document m itself; and α and β are the priors of the document-topic distribution and the topic-term distribution respectively.
Second, estimate the variable $C_{m,l}$; its conditional posterior distribution is:
$$p(C_m \mid W, C_{-m}, Z) \propto p(W_m \mid C, W_{-m}, Z) \, p(C_m \mid C_{-m})$$
where $W_{-m}$ and $C_{-m}$ denote, respectively, the words of all documents other than document m and their paths in the topic hierarchy tree. By Bayes' rule, $p(W_m \mid C, W_{-m}, Z)$ is the likelihood of document m and $p(C_m \mid C_{-m})$ is the prior probability of $C_m$ in the topic hierarchy tree. $p(W_m \mid C, W_{-m}, Z)$ is calculated as:
$$p(W_m \mid C, W_{-m}, Z) = \prod_{l=1}^{L} \left( \frac{\Gamma\left(n_{C_{m,l},-m}^{(\cdot)} + W\eta\right)}{\prod_{w} \Gamma\left(n_{C_{m,l},-m}^{(w)} + \eta\right)} \cdot \frac{\prod_{w} \Gamma\left(n_{C_{m,l},-m}^{(w)} + n_{C_{m,l},m}^{(w)} + \eta\right)}{\Gamma\left(n_{C_{m,l},-m}^{(\cdot)} + n_{C_{m,l},m}^{(\cdot)} + W\eta\right)} \right)$$
where $n_{C_{m,l},m}^{(w)}$ is the number of times the word w in document m is assigned to topic $C_{m,l}$; $n_{C_{m,l},m}^{(\cdot)}$ is the number of times any word in document m is assigned to topic $C_{m,l}$; $n_{C_{m,l},-m}^{(w)}$ is the number of times the word w in all documents other than document m is assigned to topic $C_{m,l}$; $n_{C_{m,l},-m}^{(\cdot)}$ is the number of times any word in all documents other than document m is assigned to topic $C_{m,l}$; W is the total number of words in the dictionary; and Γ(·) is the standard gamma function.
Through topic model mining, the latent topics hidden in the scientific literature are discovered automatically, and the documents in the collection are clustered according to the discovered topics. The next important step is to find the group of subject terms that represents each topic. The subject terms of each topic can be calculated as follows: each topic is represented by the set of documents belonging to it, and the document set of each topic has been produced by the topic extraction and document clustering algorithm described in step S1; on this basis, for a topic and the set of documents belonging to it, find a group of words that can represent the document set. This group is obtained by calculating the mutual information between each word and the documents belonging to the topic and selecting the top M words that best represent the topic.
In step S2 of the present invention, subject terms are calculated using mutual information. Mutual information is a useful measure of information that expresses the correlation between two sets of events. The mutual information of two random variables X and Y is defined as:
$$I(X, Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
where p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions. The flow of subject term calculation using mutual information is shown in Fig. 3; the specific steps are:
S21, when filtering topic words, define random variables U and C. U takes the value $e_t = 1$ when a document contains descriptor t, and $e_t = 0$ when it does not. C takes the value $e_c = 1$ when a document belongs to theme c, and $e_c = 0$ when it does not;
S22, for a word t in a theme c, the mutual information of word t and theme c is:
$$I(U, C) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} p(U = e_t, C = e_c) \log_2 \frac{p(U = e_t, C = e_c)}{p(U = e_t)\, p(C = e_c)};$$
With maximum likelihood estimation, the above formula becomes:
$$I(U, C) = \frac{N_{11}}{N} \log_2 \frac{N N_{11}}{N_{1.} N_{.1}} + \frac{N_{01}}{N} \log_2 \frac{N N_{01}}{N_{0.} N_{.1}} + \frac{N_{10}}{N} \log_2 \frac{N N_{10}}{N_{1.} N_{.0}} + \frac{N_{00}}{N} \log_2 \frac{N N_{00}}{N_{0.} N_{.0}};$$
where $N_{10}$ is the number of documents that contain descriptor t but are not in theme c, $N_{11}$ is the number of documents that contain descriptor t and are in theme c, $N_{01}$ is the number of documents that do not contain descriptor t but are in theme c, $N_{00}$ is the number of documents that neither contain descriptor t nor are in theme c, $N_{1.}$ is the total number of documents that contain descriptor t, $N_{.1}$ is the total number of documents in theme c, $N_{0.}$ is the total number of documents that do not contain descriptor t, $N_{.0}$ is the total number of documents not in theme c, and N is the total number of all documents.
S23, suppose the candidate descriptor set obtained by topic mining from all documents under a theme c is $W = \{w_1, w_2, ..., w_n\}$. After the mutual information between each candidate word and theme c has been computed, sort the candidates by mutual information in descending order. Finally, take the several candidate words with the largest mutual information as the descriptors of this theme.
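Steps S21 to S23 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the input layout (a mapping from candidate word to its four document counts) and the convention that a term with a zero joint count contributes 0 are assumptions made for the example.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Maximum likelihood estimate of I(U, C) in bits from four document counts.

    n11: documents that contain t and are in c
    n10: documents that contain t but are not in c
    n01: documents that do not contain t but are in c
    n00: documents that neither contain t nor are in c
    """
    n = n11 + n10 + n01 + n00
    n1_dot = n11 + n10   # N_1. : documents containing t
    n0_dot = n01 + n00   # N_0. : documents not containing t
    n_dot1 = n11 + n01   # N_.1 : documents in c
    n_dot0 = n10 + n00   # N_.0 : documents not in c

    def term(n_joint, n_row, n_col):
        # Assumed convention: a zero joint count contributes nothing (0 * log 0 = 0).
        if n_joint == 0:
            return 0.0
        return (n_joint / n) * math.log2(n * n_joint / (n_row * n_col))

    return (term(n11, n1_dot, n_dot1) + term(n01, n0_dot, n_dot1)
            + term(n10, n1_dot, n_dot0) + term(n00, n0_dot, n_dot0))

def top_descriptors(counts, k):
    """Rank candidate words by mutual information (descending) and keep the top k."""
    scores = {word: mutual_information(*c) for word, c in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A word that occurs independently of the theme scores 0, while a word that occurs in exactly the documents of the theme scores highest, which is why the ranking in S23 favours theme-specific words.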
In addition, after the descriptors have been computed, they may be reviewed manually, and the descriptors that pass the review are entered into the standard thesaurus. At the same time, the hierarchical relations between descriptors are used to establish the broader and narrower term relations between descriptors in the standard thesaurus, and HowNet is used as a synonym dictionary to compute the synonyms of each descriptor in the standard thesaurus.
Automatic indexing with standard descriptors is basic work in automatic text processing: once the thesaurus has been built, the standard descriptors contained in each scientific and technical document are computed automatically. This can effectively improve the precision of user retrieval, text classification, and the matching of scientific literature to user demands.
Further, the following notation is used to describe the document automatic indexing algorithm. The standard subject heading list NW is the set of standard descriptors, $NW = \{nw_1, nw_2, ..., nw_N\}$, where N is the number of standard descriptors in the standard thesaurus. Each standard descriptor $nw_i \in NW$ (i = 1, 2, ..., N) has a synonym set, where K is the number of synonyms of $nw_i$. The set of the synonyms of all standard descriptors in the standard thesaurus NW is denoted SW. The set of scientific and technical documents is denoted D, $D = \{d^{(1)}, d^{(2)}, ..., d^{(M)}\}$, where M is the number of documents. The set of standard descriptors in document $d^{(i)}$ is denoted $NW_{d^{(i)}}$, and the set of synonyms of standard descriptors in document $d^{(i)}$ is denoted $SW_{d^{(i)}}$. On this basis, the document automatic indexing flow adopted in step S3 of the present invention is shown in Figure 4; the specific steps are:
S31, add NW and SW to the dictionary of the word segmenter and segment the documents; for each $d^{(i)} \in D$ (i = 1, 2, ..., M), $d^{(i)}$ is represented as a set of terms $W_{d^{(i)}}$, i.e. $W_{d^{(i)}} = \{w_1, w_2, ..., w_n\}$;
S32, for each $w \in W_{d^{(i)}}$, if $w \in NW$, add w to $NW_{d^{(i)}}$, i.e. $NW_{d^{(i)}} = NW_{d^{(i)}} \cup \{w\}$;
S33, for each $w \in W_{d^{(i)}}$, if $w \in SW$, add w to $SW_{d^{(i)}}$, i.e. $SW_{d^{(i)}} = SW_{d^{(i)}} \cup \{w\}$;
S34, for each $w \in SW_{d^{(i)}}$, find the corresponding standard descriptor nw through the synonym relations of the standard thesaurus, and add nw to $NW_{d^{(i)}}$, i.e. $NW_{d^{(i)}} = NW_{d^{(i)}} \cup \{nw\}$;
S35, at this point the set of all standard descriptors contained in document $d^{(i)}$ has been computed, denoted $NW_{d^{(i)}} = \{nw_1, nw_2, ..., nw_L\}$, where L is the number of standard descriptors contained in document $d^{(i)}$;
S36, for each $w \in NW_{d^{(i)}}$, compute the weight of the standard descriptor w with respect to the original text:
$$S(w) = (w_t \cdot f_t + w_a \cdot f_a + w_{fls} \cdot f_{fls} + w_c \cdot f_c) \cdot w_{len} \cdot \log w_{tf/idf};$$
where $w_t$, $w_a$, $w_{fls}$, $w_c$ are the weights assigned to occurrences of the standard descriptor w in the title, in the abstract, in the first and last sentences of body paragraphs, and in the other parts of body paragraphs, respectively; $f_t$, $f_a$, $f_{fls}$, $f_c$ are the numbers of occurrences of w in those same four positions; $w_{len}$ is the length of the standard descriptor w; and $w_{tf/idf}$ is the tf/idf value of the standard descriptor w.
The 5 words with the largest weights are taken as the automatic indexing words of the scientific and technical document.
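The weighting in S36 and the final top-5 selection can be sketched as below. The numeric position weights, the natural logarithm and the candidate record layout are assumptions for illustration only; the source gives no weight values and does not state the base of the logarithm.

```python
import math

# Hypothetical position weights (w_t, w_a, w_fls, w_c); the source gives no values.
POSITION_WEIGHTS = {"title": 4.0, "abstract": 3.0, "first_last": 2.0, "body": 1.0}

def descriptor_weight(freqs, length, tfidf, weights=POSITION_WEIGHTS):
    """S(w) = (sum of position weight * frequency) * w_len * log(w_tf/idf).

    freqs maps a position name to the number of occurrences of the descriptor
    there; tfidf must be positive. The natural logarithm is an assumption.
    """
    positional = sum(weights[pos] * freqs.get(pos, 0) for pos in weights)
    return positional * length * math.log(tfidf)

def index_terms(candidates, k=5):
    """Return the k descriptors with the largest weight S(w)."""
    ranked = sorted(
        candidates,
        key=lambda c: descriptor_weight(c["freqs"], c["length"], c["tfidf"]),
        reverse=True,
    )
    return [c["word"] for c in ranked[:k]]
```

Note that with this formula a descriptor whose tf/idf value is below 1 receives a negative weight, so in practice the tf/idf values would need to be scaled accordingly.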
A user's theme attention describes how interested the user is in a given theme. In the present invention, the measurement of a user's theme attention considers the following factors: the number of documents belonging to the theme that the user has browsed; the number of documents belonging to the theme that the user has downloaded; and the authority of the authors or research institutions of the documents belonging to the theme that the user has browsed or downloaded. These factors rest on the following reasonable assumption: the more documents belonging to the theme the user has browsed or downloaded, and the higher the authority of the authors or research institutions of those documents, the higher the user's attention to the theme.
The present invention uses the following notation to describe the computation of a user's theme attention: the attention of user u to theme T is denoted S(u, T); the set of documents browsed by user u is denoted $D_{bu}$; the set of documents downloaded by user u is denoted $D_{du}$; the set of authors of the documents browsed or downloaded by the user is $A_u$; the set of institutions to which those authors belong is $O_u$; the authority of author a on theme T is denoted C(a, T); the authority of institution o on theme T is denoted C(o, T). The steps for computing S(u, T) are as follows:
For each document in the document sets $D_{bu}$ and $D_{du}$, extract the themes of the document;
Count the documents in $D_{bu}$ that contain theme T, denoted $N_{bu}^{T}$;
Count the documents in $D_{du}$ that contain theme T, denoted $N_{du}^{T}$;
Count the documents in all scientific and technical literature that contain theme T, denoted $N_T$;
Compute S(u, T) with the following formula:
$$S(u, T) = \left(0.5\, N_{bu}^{T} + N_{du}^{T}\right) \log \frac{N}{N_T} + \sum_{a \in A_u} C(a, T) + \sum_{o \in O_u} C(o, T);$$
where N is the total number of documents and $\log \frac{N}{N_T}$ is the inverse document frequency of the theme. If theme T occurs in more documents, the theme is more common, and the weights of $N_{bu}^{T}$ and $N_{du}^{T}$ are reduced accordingly.
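The attention formula can be written as a short Python function. This is a sketch under stated assumptions: the function and parameter names are illustrative, the natural logarithm is assumed, and the authority values C(a, T) and C(o, T) are taken as already computed inputs.

```python
import math

def theme_attention(n_browsed, n_downloaded, n_total, n_theme,
                    author_authorities, institution_authorities):
    """S(u, T) = (0.5 * N_bu^T + N_du^T) * log(N / N_T) + sum C(a, T) + sum C(o, T).

    n_browsed / n_downloaded: documents on theme T that user u browsed / downloaded
    n_total: total number of documents N; n_theme: documents containing theme T (N_T)
    author_authorities / institution_authorities: C(a, T) for a in A_u, C(o, T) for o in O_u
    """
    idf = math.log(n_total / n_theme)  # inverse document frequency of the theme
    behaviour = (0.5 * n_browsed + n_downloaded) * idf
    return behaviour + sum(author_authorities) + sum(institution_authorities)
```

A download counts twice as much as a browse, and a theme that appears in every document contributes nothing through the behaviour term because its inverse document frequency is 0.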
Besides the user operation log, another important factor that the recommendation algorithm needs to consider is the similarity between users. When the content to recommend to the target user cannot be determined from the operation log, recommendations can be made to the target user based on the content recommended to similar users. The similarity between users can be computed from different user characteristics. Because the recommendation algorithm proposed by the present invention is based on user demands, and the user demand description model is built on themes, the present invention computes the similarity between users based on themes.
Preferably, the theme similarity between users u and v is computed as follows:
Build the demand models $M_u$ and $M_v$ of users u and v; denote the theme sets of $M_u$ and $M_v$ as $T_u$ and $T_v$ respectively;
From the themes contained in $M_u$ and $M_v$, build the theme set $\{T_1, T_2, ..., T_n\}$, where n is the total number of themes contained in $M_u$ and $M_v$;
For each theme $T_i$ in this set, compute the attention of users u and v to $T_i$, namely $S(u, T_i)$ and $S(v, T_i)$;
On the theme space $\{T_1, T_2, ..., T_n\}$, build the theme attention vectors U and V of users u and v: $U = \{S(u, T_1), S(u, T_2), ..., S(u, T_n)\}$ and $V = \{S(v, T_1), S(v, T_2), ..., S(v, T_n)\}$; the cosine of the angle between vectors U and V is taken as the theme similarity between u and v.
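The cosine computation over the shared theme space can be sketched as follows. The dictionary input format and the treatment of a theme missing from one user's dictionary as attention 0 are assumptions of this example; in the source, both attention values are computed explicitly for every theme.

```python
import math

def theme_similarity(attention_u, attention_v):
    """Cosine of the angle between two users' theme attention vectors.

    attention_u / attention_v map a theme identifier to S(u, T) / S(v, T).
    Assumption: a theme absent from one dictionary is treated as attention 0.
    """
    themes = sorted(set(attention_u) | set(attention_v))  # shared theme space
    u = [attention_u.get(t, 0.0) for t in themes]
    v = [attention_v.get(t, 0.0) for t in themes]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Because the cosine ignores vector length, two users with the same relative interest across themes are rated as highly similar even if one of them is far more active than the other.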
For user u, the demand description model is denoted $M_u$ and is represented by a two-tuple (A, T), i.e. $M_u = (A_u, T_u)$. Here $A_u$ is the attribute set of user u, $A_u = \{a_1, a_2, ..., a_n\}$, where each attribute $a_i$ is an attribute associated with the demand, such as specialty, affiliated institution, department, or job position. $T_u$ is the set of themes that user u pays attention to and is expressed as a set of $T_i$, where $T_i$ (i = 1, 2, ..., n) is a theme that user u pays attention to. A theme $T_i$ is itself a set of elements; the i-th element of $T_i$ is the set $\{NW_i, SNW_i: \{S_1, S_2, ..., S_n\}\}$, where $NW_i$ is a standard descriptor describing theme $T_i$ and $SNW_i$ is the auxiliary word set of the standard descriptor $NW_i$, used to supplement the description of theme $T_i$. The auxiliary word set $SNW_i$ consists of two parts: the synonyms of the standard descriptor $NW_i$ in the thesaurus, and the keywords of the articles the user has browsed.
For user u, the set of themes that user u pays attention to in the demand description model $M_u$ is obtained as follows:
From the user operation log records, find the set D of documents the user has browsed or downloaded;
Extract the document themes to obtain the theme set $T_D$ of the document set D, which serves as the set $T_u$ of themes that user u pays attention to;
For each theme $T_i$ in the theme set $T_D$, its content is built as follows: find the subset of documents in D that belong to theme $T_i$; for each document d in this subset, compute the standard descriptors it contains and add them to the standard descriptor set of theme $T_i$; for each standard descriptor $NW_i$ in that set, add its synonyms in the thesaurus to the auxiliary word set $SNW_i$ of $NW_i$; in addition, for each document d in the subset, if d contains the standard descriptor $NW_i$, also add the keywords of document d to the auxiliary word set $SNW_i$ of $NW_i$; finally, add the element $\{NW_i, SNW_i: \{S_1, S_2, ..., S_n\}\}$ to theme $T_i$.
Besides recommending information resources to users according to their demands, the recommendation system realized by the present invention also recommends authoritative authors and authoritative research institutions relevant to the user's demands. Because user demands are described with a theme-based model, the authority of authors and institutions on a given theme needs to be computed.
The theme authority of authors and institutions is computed from the cooperative relations between authors and between institutions. As shown in Figure 5, the steps for computing the theme authority of authors and institutions in the present invention are as follows:
(1) Use the theme extraction algorithm to compute the themes of all documents, and select the documents that contain the designated theme to form the document set of that theme. Use the co-publication relations of each document in this set to build the cooperation graph between authors and the cooperation graph between institutions, and merge the author relation graph and the institution relation graph into one heterogeneous network.
On this heterogeneous network, three random walk models are built, as shown in Figure 6: the author random walk model G(A), built from the co-authorship relations between authors; the institution random walk model G(O), built from the cooperation relations between institutions; and the author-institution random walk model G(AO), built from the affiliation relations between authors and institutions. The weight of each edge in the graph is obtained by weighting the following factors: the number of jointly published documents, and the number of citations of each jointly published document. An author popularity assessment model C(A) is also built. The C(A) model mainly uses two features of an author in the information system, namely the number of times documents containing the designated theme have been collected (bookmarked) and the number of times documents containing the designated theme have been cited, to measure the author's popularity on the particular theme within the system, as one factor in assessing the author's influence.
(2) For the homogeneous-node random walk models G(A) and G(O), the traditional PageRank algorithm is adopted: the relations between nodes of the same type and their closeness (edge weights) are used iteratively to compute the PageRank value of each node:
$$M_A = (1 - \alpha) A_A + \frac{\alpha}{n_A} I \cdot I^T, \quad PR_A^{n+1} = M_A \cdot PR_A^{n}, \quad M_O = (1 - \alpha) A_O + \frac{\alpha}{n_O} I \cdot I^T \quad \text{and} \quad PR_O^{n+1} = M_O \cdot PR_O^{n};$$
where $A_A$ and $A_O$ are the adjacency matrices of the network graphs $G_A$ and $G_O$, with the number of papers co-authored by the authors or institutions as the weights of the edges; $M_A$ and $M_O$ are the probability transition matrices by which $G_A$ and $G_O$ jump from the current state to the next state; I is the column vector whose components are all 1; $I^T$ is the transpose of I; $n_A$ and $n_O$ are the dimensions of the adjacency matrices; $PR^{n}$ is the PageRank distribution over the whole network after the n-th iteration (a further condition appears only as formula image BSA0000097293370000134 in the source).
(3) For the mixed random walk model G(AO) with heterogeneous nodes, the idea of HITS is adopted: author nodes are regarded as hubs and institution nodes as authorities, and a Co-PageRank algorithm is built to compute the metric values of authors and institutions:
$$PR_A^{n+1} = \lambda A_{AO} \cdot PR_O^{n} + (1 - \lambda) A_A \cdot PR_A^{n} \quad \text{and} \quad PR_O^{n+1} = \lambda A_{OA} \cdot PR_A^{n} + (1 - \lambda) A_O \cdot PR_O^{n};$$
where the parameter $\lambda$ determines the importance of the bipartite network $G_{AO}$ in the allocation of metric values, so the influence of the hybrid network $G_{AO}$ on the metric values can be controlled by adjusting $\lambda$; $A_{AO}$ is the probability transition matrix from institutions to authors, and correspondingly $A_{OA}$ is the probability transition matrix from authors to institutions; both probability transition matrices are built from the adjacency matrix of the affiliation relations between authors and institutions; the other symbols have the meanings described above.
(4) The main indicators of the author popularity assessment model C(A) are the collection rate and the citation count of the author's documents in the system. The measurement formula is: $C(A) = f_a + r_a$;
where $f_a$ is the collection rate of the documents containing the particular theme that the author has published, measured as the ratio of the number of times the author's documents containing the particular theme have been collected to the total number of times all documents on that theme have been collected; $r_a$ is the citation ratio of the author's documents, measured as the ratio of the number of times the author's documents containing the particular theme have been cited to the total number of times all documents on that theme have been collected.
(5) Integrate the modules to build an integrated evaluation model. The final authority values of institutions and authors on a particular theme T are: $PR_O = \lambda PR_O(G(O)) + (1 - \lambda) PR_O(G(AO))$ and $PR_A = \alpha PR_A(G(AO)) + \beta PR_A(G(A)) + \chi PR_A(C(A))$;
where $\alpha$, $\beta$, $\chi$, $\lambda$ are weight factors that control the importance of each module to the final authority. The influence of each module can be adjusted by tuning these parameters. The three parameters $\alpha$, $\beta$, $\chi$ satisfy $\alpha + \beta + \chi = 1$.
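Steps (2), (3) and (5) can be sketched in plain Python as below. Several points are assumptions of this sketch rather than statements of the source: the weighted adjacency matrices are column-normalized so that they act as probability transition matrices, a node with no edges is given a uniform column, the iteration starts from a uniform distribution and runs for a fixed number of rounds, and the default values of `alpha` and `lam` are arbitrary. As in the source, the symbol alpha is used both for the PageRank jump parameter and for one of the integration weights.

```python
def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def column_normalize(adj):
    """Scale each column to sum to 1 (assumption: empty columns become uniform)."""
    n_rows, n_cols = len(adj), len(adj[0])
    sums = [sum(adj[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[adj[i][j] / sums[j] if sums[j] else 1.0 / n_rows
             for j in range(n_cols)] for i in range(n_rows)]

def pagerank(adj, alpha=0.15, iterations=100):
    """Step (2): PR^{n+1} = M * PR^n with M = (1 - alpha) * A + (alpha / n) * I * I^T."""
    n = len(adj)
    a = column_normalize(adj)
    m = [[(1 - alpha) * a[i][j] + alpha / n for j in range(n)] for i in range(n)]
    pr = [1.0 / n] * n
    for _ in range(iterations):
        pr = mat_vec(m, pr)
    return pr

def co_pagerank(a_a, a_o, a_ao, a_oa, lam=0.5, iterations=100):
    """Step (3): coupled update of author and institution scores.

    a_ao has one row per author and one column per institution; a_oa is the reverse.
    """
    a_a, a_o = column_normalize(a_a), column_normalize(a_o)
    a_ao, a_oa = column_normalize(a_ao), column_normalize(a_oa)
    pr_a = [1.0 / len(a_a)] * len(a_a)
    pr_o = [1.0 / len(a_o)] * len(a_o)
    for _ in range(iterations):
        new_a = [lam * x + (1 - lam) * y
                 for x, y in zip(mat_vec(a_ao, pr_o), mat_vec(a_a, pr_a))]
        new_o = [lam * x + (1 - lam) * y
                 for x, y in zip(mat_vec(a_oa, pr_a), mat_vec(a_o, pr_o))]
        pr_a, pr_o = new_a, new_o
    return pr_a, pr_o

def author_authority(pr_ao, pr_a, popularity, alpha, beta, chi):
    """Step (5): PR_A = alpha * PR_A(G(AO)) + beta * PR_A(G(A)) + chi * PR_A(C(A))."""
    if abs(alpha + beta + chi - 1.0) > 1e-9:
        raise ValueError("alpha + beta + chi must equal 1")
    return [alpha * x + beta * y + chi * z
            for x, y, z in zip(pr_ao, pr_a, popularity)]
```

With column-stochastic matrices and starting vectors that sum to 1, both the PageRank and the Co-PageRank scores keep summing to 1 in every round, so the scores from the different models stay on a comparable scale before they are combined in step (5).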
Finally, in step S5, the information resources, authoritative authors and authoritative institutions that match the demand are selected according to the user demand model $M_u$ and recommended to the user:
For each theme contained in $M_u$, put the standard descriptors under the theme and the corresponding auxiliary words into the vocabulary Dic; after all themes have been processed, the vocabulary Dic contains all standard descriptors and auxiliary words in the model $M_u$;
For each theme T contained in $M_u$, obtain all documents that contain the theme and put these documents into the set Docs; after all themes have been processed, the set Docs is the set of all documents that contain at least one theme in $M_u$;
For each document in the set Docs, count the total number of occurrences in the document of the words in the vocabulary Dic, denoted $TF_{Dic}$; after all documents in the set Docs have been counted, sort the documents by $TF_{Dic}$ and recommend the several top-ranked documents to the user.
Preferably, the steps of selecting the matching authoritative authors and authoritative institutions according to the model $M_u$ are as follows:
For each theme contained in $M_u$, compute the user's attention to the theme; after all themes have been processed, sort the themes by attention and take the several top-ranked themes as the candidate theme set;
For each theme in the candidate theme set, compute the authority of all authors and research institutions under the theme; sort by authority and recommend the several top-ranked authoritative authors and research institutions to the user;
If the number of recommended resources, authors or institutions obtained in the above process is too small, further use step S4 to find the user with the highest similarity, and use the demand model of that user to make similar recommendations to the target user.
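The document ranking by $TF_{Dic}$ can be sketched as follows. The input format is an assumption: `docs` maps a document identifier to its list of segmented words and is taken to already contain only documents with at least one theme in $M_u$; the cut-off `k` stands in for the unspecified "several" top-ranked documents.

```python
def recommend_documents(docs, vocabulary, k=3):
    """Rank documents by TF_Dic, the total occurrences of vocabulary words.

    docs: mapping from document id to its list of segmented words (assumed format)
    vocabulary: the standard descriptors and auxiliary words collected in Dic
    """
    vocab = set(vocabulary)
    tf_dic = {doc_id: sum(1 for word in words if word in vocab)
              for doc_id, words in docs.items()}
    # Highest TF_Dic first; ties keep the input order because sorted() is stable.
    return sorted(tf_dic, key=tf_dic.get, reverse=True)[:k]
```

Since $TF_{Dic}$ is a raw count, long documents tend to rank higher than short ones; the source does not describe any length normalization, so none is applied here.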
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs each step of the methods in the above embodiments; the storage medium can be ROM/RAM, a magnetic disk, an optical disc, a memory card, and so on. Accordingly, corresponding to the above method, the present invention also discloses a resource recommendation system based on potential demands of users, comprising:
Preprocessing module, for clustering the resources and extracting themes using text clustering and topic mining algorithms;
Thesaurus module, for computing the descriptors under each theme based on the clustering result, to obtain the thesaurus of the corresponding field;
Index module, for automatically indexing the resources using the thesaurus and computing the descriptors contained in each individual resource;
Computing module, for computing a user's attention to a given theme by combining the user's operation records on individual resources with the user's attributes; building user demand models and computing the theme similarity between users; and using the relations between data in individual resources to compute the authority of designated information on a theme;
Recommending module, for screening resources according to the user demand model and recommending resources with a higher matching degree to the user.
The present invention provides a resource recommendation method and system based on potential demands of users. It exploits the close relation between a user's potential information demands and the user's own professional field: by mining the user's potential information demands based on the professional field, information resources that match the user's demands can be recommended to the user more accurately.
The above description illustrates and describes some preferred embodiments of the present invention. As stated above, however, it should be understood that the present invention is not limited to the forms disclosed herein. These embodiments should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be altered within the scope of the inventive concept described herein through the above teachings or the techniques or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the protection scope of the appended claims.

Claims (9)

1. A resource recommendation method based on potential demands of users, characterized in that the method comprises the steps of:
S1, clustering the resources and extracting themes using text clustering and topic mining algorithms;
S2, computing the descriptors under each theme based on the clustering result, to obtain the thesaurus of the corresponding field;
S3, automatically indexing the resources using the thesaurus, and computing the descriptors contained in each individual resource;
S4, computing a user's attention to a given theme by combining the user's operation records on individual resources with the user's attributes; building user demand models and computing the theme similarity between users; using the relations between data in individual resources to compute the authority of designated information on a theme;
S5, screening resources according to the user demand model, and recommending resources with a higher matching degree to the user.
2. The method according to claim 1, characterized in that, in step S1, an improved hierarchical theme extraction model hLDA is used to perform the clustering and theme extraction.
3. The method according to claim 1, characterized in that, in step S4, the theme similarity between users u and v is computed as follows:
Build the demand models $M_u$ and $M_v$ of users u and v; denote the theme sets of $M_u$ and $M_v$ as $T_u$ and $T_v$ respectively;
From the themes contained in $M_u$ and $M_v$, build the theme set $\{T_1, T_2, ..., T_n\}$, where n is the total number of themes contained in $M_u$ and $M_v$;
For each theme $T_i$ in this set, compute the attention of users u and v to $T_i$, namely $S(u, T_i)$ and $S(v, T_i)$;
On the theme space $\{T_1, T_2, ..., T_n\}$, build the theme attention vectors U and V of users u and v: $U = \{S(u, T_1), S(u, T_2), ..., S(u, T_n)\}$ and $V = \{S(v, T_1), S(v, T_2), ..., S(v, T_n)\}$; the cosine of the angle between vectors U and V is taken as the theme similarity between u and v.
4. The method according to claim 1, characterized in that, in step S5, screening resources according to the user demand model $M_u$ comprises the steps of:
For each theme contained in $M_u$, putting the standard descriptors under the theme and the corresponding auxiliary words into the vocabulary Dic; after all themes have been processed, the vocabulary Dic contains all standard descriptors and auxiliary words in the model $M_u$;
For each theme T contained in $M_u$, obtaining all documents that contain the theme and putting these documents into the set Docs; after all themes have been processed, the set Docs is the set of all documents that contain at least one theme in $M_u$;
For each document in the set Docs, counting the total number of occurrences in the document of the words in the vocabulary Dic, denoted $TF_{Dic}$; after all documents in the set Docs have been counted, sorting the documents by $TF_{Dic}$ and recommending the several top-ranked documents to the user.
5. The method according to claim 3 or 4, characterized in that, for user u, the user demand model $M_u$ is expressed as $M_u = (A_u, T_u)$, where $A_u$ is the attribute set of user u, $A_u = \{a_1, a_2, ..., a_n\}$, each attribute $a_i$ is an attribute associated with the demand, and $T_u$ is the set of themes that user u pays attention to, expressed as the set of themes $T_i$ that user u pays attention to, i = 1, 2, ..., n.
6. The method according to claim 1, characterized in that, in step S2, descriptors are computed using mutual information:
After the mutual information between each candidate word and the corresponding theme has been computed, the candidates are sorted by mutual information in descending order; finally, the several candidate words with the largest mutual information are taken as the descriptors of the theme.
7. The method according to claim 1 or 6, characterized in that, in step S2, after the descriptors have been computed:
The computed descriptors are also reviewed manually, and the descriptors that pass the review are entered into the standard thesaurus;
At the same time, the hierarchical relations between descriptors are used to establish the broader and narrower term relations between descriptors in the standard thesaurus;
And HowNet is used as a synonym dictionary to compute the synonyms of each descriptor in the standard thesaurus.
8. The method according to claim 4, characterized in that, in step S5, according to the theme similarity between users, the demand model of the user with the highest similarity is also used to make similar recommendations to the target user; and/or
Authoritative recommendations are made to the user according to the authority of the designated information on themes.
9. A resource recommendation system based on potential demands of users, characterized in that the system comprises:
Preprocessing module, for clustering the resources and extracting themes using text clustering and topic mining algorithms;
Thesaurus module, for computing the descriptors under each theme based on the clustering result, to obtain the thesaurus of the corresponding field;
Index module, for automatically indexing the resources using the thesaurus and computing the descriptors contained in each individual resource;
Computing module, for computing a user's attention to a given theme by combining the user's operation records on individual resources with the user's attributes; building user demand models and computing the theme similarity between users; and using the relations between data in individual resources to compute the authority of designated information on a theme;
Recommending module, for screening resources according to the user demand model and recommending resources with a higher matching degree to the user.
CN201310549102.4A 2013-11-08 2013-11-08 Resource recommendation method and system based on potential demands of users Active CN103577579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310549102.4A CN103577579B (en) 2013-11-08 2013-11-08 Resource recommendation method and system based on potential demands of users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310549102.4A CN103577579B (en) 2013-11-08 2013-11-08 Resource recommendation method and system based on potential demands of users

Publications (2)

Publication Number Publication Date
CN103577579A true CN103577579A (en) 2014-02-12
CN103577579B CN103577579B (en) 2015-01-21

Family

ID=50049355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310549102.4A Active CN103577579B (en) 2013-11-08 2013-11-08 Resource recommendation method and system based on potential demands of users

Country Status (1)

Country Link
CN (1) CN103577579B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022760A (en) * 2014-04-30 2015-11-04 深圳市腾讯计算机系统有限公司 News recommendation method and device
CN105023214A (en) * 2015-07-17 2015-11-04 蓝舰信息科技南京有限公司 Title knowledge point intelligent recommending method
CN105095202A (en) * 2014-04-17 2015-11-25 华为技术有限公司 Method and device for message recommendation
CN105095279A (en) * 2014-05-13 2015-11-25 深圳市腾讯计算机系统有限公司 File recommendation method and apparatus
CN105139211A (en) * 2014-12-19 2015-12-09 Tcl集团股份有限公司 Product brief introduction generating method and system
CN105138671A (en) * 2015-09-07 2015-12-09 百度在线网络技术(北京)有限公司 Human-computer interaction guiding method and device based on artificial intelligence
CN105740444A (en) * 2016-02-02 2016-07-06 桂林电子科技大学 User score-based project recommendation method
CN105912580A (en) * 2016-03-31 2016-08-31 比美特医护在线(北京)科技有限公司 Information acquisition method and device and information-pushing method and device
CN106201465A (en) * 2016-06-23 2016-12-07 扬州大学 Software project personalized recommendation method towards open source community
CN107180028A (en) * 2016-03-09 2017-09-19 广州网律互联网科技有限公司 A kind of recommended technology combined based on LDA with annealing algorithm
CN107577690A (en) * 2017-05-17 2018-01-12 中广核工程有限公司 The recommendation method and recommendation apparatus of magnanimity information data
CN108287904A (en) * 2018-05-09 2018-07-17 重庆邮电大学 A kind of document context perception recommendation method decomposed based on socialization convolution matrix
CN108875071A (en) * 2018-07-05 2018-11-23 中北大学 A kind of education resource recommended method based on multi-angle of view interest
CN109255073A (en) * 2018-08-28 2019-01-22 麒麟合盛网络技术股份有限公司 A kind of personalized recommendation method, device and electronic equipment
CN109492092A (en) * 2018-09-29 2019-03-19 北明智通(北京)科技有限公司 Document classification method and system based on LDA topic model
CN109636639A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Medication detection method, device, equipment and storage medium based on big data analysis
CN110263140A (en) * 2019-06-20 2019-09-20 北京百度网讯科技有限公司 A kind of method for digging of descriptor, device, electronic equipment and storage medium
CN110533253A (en) * 2019-09-04 2019-12-03 安徽大学 A kind of scientific research cooperative Relationship Prediction method based on Heterogeneous Information network
CN110674320A (en) * 2019-09-27 2020-01-10 百度在线网络技术(北京)有限公司 Retrieval method and device and electronic equipment
CN111310058A (en) * 2020-03-27 2020-06-19 北京百度网讯科技有限公司 Information theme recommendation method and device, terminal and storage medium
CN111813918A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Scientific and technological resource recommendation processing method and device
CN113032671A (en) * 2021-03-17 2021-06-25 北京百度网讯科技有限公司 Content processing method, content processing device, electronic equipment and storage medium
CN113034231A (en) * 2021-03-23 2021-06-25 深圳装速配科技有限公司 Multi-supply-chain commodity intelligent recommendation system and method based on SaaS cloud service
CN113191123A (en) * 2021-04-08 2021-07-30 中广核工程有限公司 Indexing method and device for engineering design archive information and computer equipment
CN113220917A (en) * 2020-02-06 2021-08-06 阿里巴巴集团控股有限公司 Background map recommendation method, device and storage medium
CN113434706A (en) * 2020-03-23 2021-09-24 北京国双科技有限公司 Academic collaboration relation analysis method and device
CN113590773A (en) * 2021-06-10 2021-11-02 中国铁道科学研究院集团有限公司科学技术信息研究所 Text theme indexing method, device and equipment and readable storage medium
CN113902526A (en) * 2021-10-19 2022-01-07 平安科技(深圳)有限公司 Artificial intelligence based product recommendation method and device, computer equipment and medium
CN115658851A (en) * 2022-12-27 2023-01-31 药融云数字科技(成都)有限公司 Medical literature retrieval method, system, storage medium and terminal based on theme
CN116662556A (en) * 2023-08-02 2023-08-29 天河超级计算淮海分中心 Text data processing method integrating user attributes
CN117436679A (en) * 2023-12-21 2024-01-23 四川物通科技有限公司 Meta-universe resource matching method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1383328A (en) * 2001-04-23 2002-12-04 日本电气株式会社 Method and system for recommending program
CN101751448A (en) * 2009-07-22 2010-06-23 中国科学院自动化研究所 Commendation method of personalized resource information based on scene information
CN102646122A (en) * 2012-02-21 2012-08-22 北京航空航天大学 Automatic building method of academic social network
CN102929928A (en) * 2012-09-21 2013-02-13 北京格致璞科技有限公司 Multidimensional-similarity-based personalized news recommendation method
CN103049575A (en) * 2013-01-05 2013-04-17 华中科技大学 Topic-adaptive academic conference searching system

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095202A (en) * 2014-04-17 2015-11-25 华为技术有限公司 Method and device for message recommendation
CN105095202B (en) * 2014-04-17 2018-10-30 华为技术有限公司 Message recommendation method and device
US10891553B2 (en) 2014-04-17 2021-01-12 Huawei Technologies Co., Ltd. Method and apparatus for recommending message
CN105022760A (en) * 2014-04-30 2015-11-04 深圳市腾讯计算机系统有限公司 News recommendation method and device
CN105022760B (en) * 2014-04-30 2019-06-25 深圳市腾讯计算机系统有限公司 News recommendation method and device
CN105095279A (en) * 2014-05-13 2015-11-25 深圳市腾讯计算机系统有限公司 File recommendation method and apparatus
CN105139211A (en) * 2014-12-19 2015-12-09 Tcl集团股份有限公司 Product brief introduction generating method and system
CN105139211B (en) * 2014-12-19 2021-06-22 Tcl科技集团股份有限公司 Product brief introduction generation method and system
CN105023214A (en) * 2015-07-17 2015-11-04 蓝舰信息科技南京有限公司 Title knowledge point intelligent recommending method
CN105023214B (en) * 2015-07-17 2019-03-26 蓝舰信息科技南京有限公司 Topic knowledge point intelligent recommendation method
CN105138671A (en) * 2015-09-07 2015-12-09 百度在线网络技术(北京)有限公司 Human-computer interaction guiding method and device based on artificial intelligence
CN105740444A (en) * 2016-02-02 2016-07-06 桂林电子科技大学 User score-based project recommendation method
CN107180028A (en) * 2016-03-09 2017-09-19 广州网律互联网科技有限公司 Recommendation technique based on combining LDA with an annealing algorithm
CN105912580A (en) * 2016-03-31 2016-08-31 比美特医护在线(北京)科技有限公司 Information acquisition method and device and information-pushing method and device
CN106201465B (en) * 2016-06-23 2020-08-21 扬州大学 Software project personalized recommendation method for open source community
CN106201465A (en) * 2016-06-23 2016-12-07 扬州大学 Software project personalized recommendation method towards open source community
CN107577690A (en) * 2017-05-17 2018-01-12 中广核工程有限公司 Recommendation method and recommendation apparatus for mass information data
CN107577690B (en) * 2017-05-17 2021-01-05 中广核工程有限公司 Recommendation method and recommendation device for mass information data
CN108287904A (en) * 2018-05-09 2018-07-17 重庆邮电大学 Document context-aware recommendation method based on social convolutional matrix factorization
CN108875071A (en) * 2018-07-05 2018-11-23 中北大学 Learning resource recommendation method based on multi-view interest
CN108875071B (en) * 2018-07-05 2021-03-19 中北大学 Learning resource recommendation method based on multi-view interest
CN109255073B (en) * 2018-08-28 2022-03-29 麒麟合盛网络技术股份有限公司 Personalized recommendation method and device and electronic equipment
CN109255073A (en) * 2018-08-28 2019-01-22 麒麟合盛网络技术股份有限公司 Personalized recommendation method, device and electronic equipment
CN109492092A (en) * 2018-09-29 2019-03-19 北明智通(北京)科技有限公司 Document classification method and system based on LDA topic model
CN109492092B (en) * 2018-09-29 2020-07-17 北京智通云联科技有限公司 Document classification method and system based on LDA topic model
CN109636639B (en) * 2018-12-13 2023-02-03 深圳平安医疗健康科技服务有限公司 Big data analysis-based medication detection method, device, equipment and storage medium
CN109636639A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Medication detection method, device, equipment and storage medium based on big data analysis
CN110263140A (en) * 2019-06-20 2019-09-20 北京百度网讯科技有限公司 Method and device for mining subject term, electronic equipment and storage medium
CN110263140B (en) * 2019-06-20 2021-06-25 北京百度网讯科技有限公司 Method and device for mining subject term, electronic equipment and storage medium
CN110533253A (en) * 2019-09-04 2019-12-03 安徽大学 Scientific research cooperation relationship prediction method based on heterogeneous information network
CN110674320A (en) * 2019-09-27 2020-01-10 百度在线网络技术(北京)有限公司 Retrieval method and device and electronic equipment
CN113220917A (en) * 2020-02-06 2021-08-06 阿里巴巴集团控股有限公司 Background map recommendation method, device and storage medium
CN113434706A (en) * 2020-03-23 2021-09-24 北京国双科技有限公司 Academic collaboration relation analysis method and device
CN111310058A (en) * 2020-03-27 2020-06-19 北京百度网讯科技有限公司 Information theme recommendation method and device, terminal and storage medium
CN111310058B (en) * 2020-03-27 2023-08-08 北京百度网讯科技有限公司 Information theme recommendation method, device, terminal and storage medium
CN111813918A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Scientific and technological resource recommendation processing method and device
CN113032671A (en) * 2021-03-17 2021-06-25 北京百度网讯科技有限公司 Content processing method, content processing device, electronic equipment and storage medium
CN113032671B (en) * 2021-03-17 2024-02-23 北京百度网讯科技有限公司 Content processing method, device, electronic equipment and storage medium
CN113034231A (en) * 2021-03-23 2021-06-25 深圳装速配科技有限公司 Multi-supply-chain commodity intelligent recommendation system and method based on SaaS cloud service
CN113034231B (en) * 2021-03-23 2024-04-05 王韶萍 Multi-supply chain commodity intelligent recommendation system and method based on SaaS cloud service
CN113191123A (en) * 2021-04-08 2021-07-30 中广核工程有限公司 Indexing method and device for engineering design archive information and computer equipment
CN113590773A (en) * 2021-06-10 2021-11-02 中国铁道科学研究院集团有限公司科学技术信息研究所 Text theme indexing method, device and equipment and readable storage medium
CN113902526A (en) * 2021-10-19 2022-01-07 平安科技(深圳)有限公司 Artificial intelligence based product recommendation method and device, computer equipment and medium
CN115658851A (en) * 2022-12-27 2023-01-31 药融云数字科技(成都)有限公司 Medical literature retrieval method, system, storage medium and terminal based on theme
CN115658851B (en) * 2022-12-27 2023-04-04 药融云数字科技(成都)有限公司 Medical literature retrieval method, system, storage medium and terminal based on theme
CN116662556A (en) * 2023-08-02 2023-08-29 天河超级计算淮海分中心 Text data processing method integrating user attributes
CN116662556B (en) * 2023-08-02 2023-10-20 天河超级计算淮海分中心 Text data processing method integrating user attributes
CN117436679A (en) * 2023-12-21 2024-01-23 四川物通科技有限公司 Meta-universe resource matching method and system
CN117436679B (en) * 2023-12-21 2024-03-26 四川物通科技有限公司 Meta-universe resource matching method and system

Also Published As

Publication number Publication date
CN103577579B (en) 2015-01-21

Similar Documents

Publication Publication Date Title
CN103577579B (en) Resource recommendation method and system based on potential demands of users
Gan et al. Research characteristics and status on social media in China: A bibliometric and co-word analysis
Elmeleegy et al. Mashup advisor: A recommendation tool for mashup development
CN110597981B (en) Network news summary system for automatically generating summary by adopting multiple strategies
Yan et al. Overlaying communities and topics: An analysis on publication networks
CN103425799A (en) Personalized research direction recommending system and method based on themes
CN103064945A (en) Situation searching method based on body
Clements et al. The influence of personalization on tag query length in social media search
CN102073641A (en) Method, device and program for processing consumer-generated media information
Shen et al. CKGG: A Chinese knowledge graph for high-school geography education and beyond
Schatten et al. An introduction to social semantic web mining & big data analytics for political attitudes and mentalities research
An et al. A heuristic approach on metadata recommendation for search engine optimization
Fei An LDA based model for semantic annotation of Web English educational resources
Haslhofer et al. Semantic tagging on historical maps
Park et al. Extracting search intentions from web search logs
Ashihara et al. Legal information as a complex network: Improving topic modeling through homophily
Lai et al. A prototype of the next-generation journal system for ITS: academic social networking and media based on web 3.0
Shaw et al. MetaBlog: a metadata driven semantics aware approach for blog tagging
ElGindy et al. Capturing place semantics on the geosocial web
Hu et al. A personalised search approach for web service recommendation
Ibrahim et al. A Scientometric Approach for Personalizing Research Paper Retrieval.
Vassilakis et al. Database knowledge enrichment utilizing trending topics from Twitter
Kedia et al. An intelligent algorithm for automatic candidate selection for web service composition
Qiao Online Education Resource Recommendation System of International Finance Course Based on Preference Data Collection
Pan et al. The knowledge map analysis of user profile research based on CiteSpace

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Wang Qinghong, Li Peng, Zhou Yuzhong, Tao Xiujie, Gong Ting, Chen Chuanfu, Wang Ping, Wang Xiaoguang, Ran Congjing

Inventor before: Wang Qinghong, Li Peng, Zhou Yuzhong, Tao Xiujie, Gong Ting, Chen Chuanfu, Wang Ping, Wang Xiaoguang, Ran Congjing

C14 Grant of patent or utility model
GR01 Patent grant