CN109508421A - A word-vector-based document recommendation method - Google Patents

A word-vector-based document recommendation method

Info

Publication number
CN109508421A
CN109508421A (application CN201811415870.XA)
Authority
CN
China
Prior art keywords
user
document
interest
vector
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811415870.XA
Other languages
Chinese (zh)
Other versions
CN109508421B (en)
Inventor
后弘毅
杨权
梁栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute
Priority to CN201811415870.XA
Publication of CN109508421A
Application granted
Publication of CN109508421B
Active legal status
Anticipated expiration legal status

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a word-vector-based document recommendation method. A neural network language model is used to extract document feature vectors from users' reading sequences. From each user's complete reading sequence and most recent subsequence, the method then derives the user's global interest and reading-context interest. When recommending, the user's global interest and current reading-context interest are considered jointly, so that the recommended documents better match the user's needs and preferences.

Description

A word-vector-based document recommendation method
Technical field
The present invention relates to a word-vector-based document recommendation method.
Background technique
Early document recommendation mainly used content-based algorithms: low-level document features were extracted from text labels, and documents with similar content were added to the recommendation list. For example, one early study proposed a personalized recommendation model based on MFCC and GMM that extracts text-label features from documents. However, label extraction is very time-consuming, and document collections now grow quickly, with tens of thousands of new documents released every day, so recommendation based purely on text-label features has gradually been abandoned.
Collaborative filtering was first used by the Tapestry system to address information overload, and was then rapidly applied to recommendation in other fields. A well-known foreign literature platform uses collaborative filtering: user behavior is recorded on the server, several "nearest neighbors" with similar interest preferences are found from those records, and documents that the nearest neighbors like but the target user has not browsed are recommended to the target user. In domestic research, Wang Jun et al. proposed the concept of a hierarchical document recommendation system: on one hand, collaborative filtering is performed using the similarity of document preferences between users; on the other hand, document-content similarity covers multiple dimensions such as topic, sentiment, writing style, and wording. Connecting the two aspects exploits the advantages of both and improves recommendation satisfaction.
Unlike recommendation in other fields, users may read documents out of personal interest or to support their work, which distinguishes document recommendation from mainstream e-commerce recommendation. In e-commerce and film recommendation, users pay a larger economic or time cost (for example, paying to buy an item, or spending two hours watching a film), so they are more willing to give the system explicit feedback, and collecting explicit ratings is relatively easy. Reading a document, by contrast, costs relatively little, and users do not deliberately rate documents; the system can only record users' contextual information (such as reading behavior and registration information).
Summary of the invention
Object of the invention: aiming at the shortcomings of traditional document recommendation systems, the present invention proposes a word-vector-based document recommendation method that incorporates the context of the user's reading list into the recommendation algorithm.
Technical solution: the invention discloses a word-vector-based document recommendation method comprising the following steps:
Step 1: based on a neural network language model, extract features from the documents the user has read and from their reading context;
Step 2: based on the user's document sequence features, compute the user's global interest vector and context interest vector;
Step 3: build mathematical models to compute user similarity and a document interest index, and produce document recommendations for the user.
Step 1 includes:
Step 1-1: obtain the user's complete reading sequence; each record in the sequence includes a document ID, a reading time, and a document source;
Step 1-2: group the user's complete reading sequence into subsequences according to reading time and document source. A reading-gap threshold is set (e.g., 8 hours); records whose gap does not exceed the threshold and whose document source is the same are assigned to the same subsequence, while records whose gap exceeds the threshold or whose document source differs are assigned to different subsequences;
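The grouping rule of step 1-2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function and field names are invented here, and the 8-hour threshold follows the example given in the text. Records are assumed to be sorted by reading time.

```python
from datetime import datetime, timedelta

def split_subsequences(records, gap=timedelta(hours=8)):
    """Split a user's full reading sequence into subsequences.

    records: list of (doc_id, read_time, source) tuples, assumed sorted by
    read_time. A new subsequence starts whenever the time gap to the
    previous record exceeds `gap` or the document source changes.
    """
    subsequences = []
    current = []
    for rec in records:
        _, read_time, source = rec
        if current:
            _, prev_time, prev_source = current[-1]
            if read_time - prev_time > gap or source != prev_source:
                subsequences.append(current)
                current = []
        current.append(rec)
    if current:
        subsequences.append(current)
    return subsequences

t0 = datetime(2018, 11, 1, 9, 0)
records = [
    ("d1", t0, "journalA"),
    ("d2", t0 + timedelta(hours=1), "journalA"),   # small gap, same source
    ("d3", t0 + timedelta(hours=12), "journalA"),  # gap > 8 h: new subsequence
    ("d4", t0 + timedelta(hours=13), "journalB"),  # source change: new subsequence
]
groups = split_subsequences(records)
print([[r[0] for r in g] for g in groups])  # [['d1', 'd2'], ['d3'], ['d4']]
```

Each resulting subsequence then plays the role of a "sentence" when the sequences are fed to the language model in step 1-3.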
Step 1-3: use the Word2vec neural network language model (reference: Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013, January 17). Efficient Estimation of Word Representations in Vector Space. arXiv.org.) to process the complete reading sequences of all users, obtaining a coarse-grained feature vector for every document; then process the subsequences of all users with the same Word2vec model, obtaining a fine-grained feature vector for every document, in which documents with similar reading contexts have similar feature vectors.
For the feature vectors obtained in step 1-3, the dimensionality is adjusted according to the trade-off between efficiency and accuracy: increase the dimensionality if more accurate recommendations are needed, and reduce it if higher computational efficiency is needed (set according to the actual situation).
Step 2 includes:
Step 2-1: average the coarse-grained feature vectors of all documents in the user's complete reading sequence to obtain the user's global interest vector;
Step 2-2: average the fine-grained feature vectors of all documents in the user's most recent reading subsequence to obtain the user's reading-context interest vector.
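Steps 2-1 and 2-2 are plain element-wise averaging of the document vectors; a minimal sketch (function name and toy vectors are illustrative, not from the patent):

```python
def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

# coarse-grained vectors of every document in the user's full reading sequence
coarse = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
global_interest = average_vectors(coarse)   # user's global interest vector

# fine-grained vectors of the documents in the most recent subsequence
fine = [[0.0, 1.0], [2.0, 3.0]]
context_interest = average_vectors(fine)    # user's reading-context interest vector

print(global_interest)   # [2.0, 1.0, 1.0]
print(context_interest)  # [1.0, 2.0]
```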
Step 3 includes:
Step 3-1: compute the similarity between users from their global interest vectors and complete reading sequences;
Step 3-2: compute the target user's interest index for each document;
Step 3-3: rank all documents using the results of steps 3-1 and 3-2, and recommend the top N results to the target user.
Step 3-1 includes: computing the similarity between users from their global interest vectors and complete reading sequences. sim(μ, ν) denotes the similarity between the target user μ and another user ν in the database, computed as a weighted combination of the overlap between the two users' reading sets and the cosine similarity of their global interest vectors, where Mμ denotes the set of documents read by user μ, Mν the set of documents read by user ν, m a document in the intersection of Mμ and Mν, and λ and θ are weight coefficients (both greater than 0).
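The sim(μ, ν) formula appears only as an image in the original patent, so the sketch below is a hypothetical instantiation assembled from the quantities the text defines: a λ-weighted reading-set overlap term plus a θ-weighted cosine term over the global interest vectors. The function names and the exact overlap normalization are assumptions, not the patent's formula.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def user_similarity(M_u, M_v, g_u, g_v, lam=1.0, theta=1.0):
    """Hypothetical sim(u, v): lam-weighted normalized overlap of the two
    reading sets plus theta-weighted cosine similarity of the two global
    interest vectors (the quantities defined in step 3-1)."""
    if not M_u or not M_v:
        overlap = 0.0
    else:
        overlap = len(M_u & M_v) / sqrt(len(M_u) * len(M_v))
    return lam * overlap + theta * cosine(g_u, g_v)

s = user_similarity({"d1", "d2"}, {"d2", "d3"}, [1.0, 0.0], [1.0, 0.0])
print(round(s, 3))  # 1.5: overlap term 0.5 plus cosine term 1.0, with lam = theta = 1
```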
Step 3-2 includes: computing pi(μ, m), the target user μ's interest in document m, as a weighted combination of collaborative and content evidence, where Uμ,k denotes the set of the k users most similar to μ, Um the set of users who have read document m, the content term is the cosine similarity between μ's reading-context interest vector and the fine-grained feature vector of document m, and ω and θ are weight coefficients (both greater than 0).
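Likewise, the pi(μ, m) formula is given only as an image, so the sketch below is a hypothetical combination of the quantities step 3-2 defines: an ω-weighted sum of the similarities of the nearest neighbours who have read m, plus a θ-weighted cosine term between the user's context interest vector and the document's fine-grained vector. The names and the exact aggregation are assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def interest_index(similar_users, readers, sim, ctx_vec, doc_vec,
                   omega=1.0, theta=1.0):
    """Hypothetical p_i(mu, m).

    similar_users: U_{mu,k}, the k users most similar to mu
    readers:       U_m, the users who have read document m
    sim:           dict mapping each similar user v to sim(mu, v)
    ctx_vec:       mu's reading-context interest vector
    doc_vec:       document m's fine-grained feature vector
    """
    collab = sum(sim[v] for v in similar_users & readers)
    return omega * collab + theta * cosine(ctx_vec, doc_vec)

p = interest_index({"v1", "v2"}, {"v2", "v3"}, {"v1": 0.9, "v2": 0.5},
                   [1.0, 0.0], [0.0, 1.0])
print(p)  # 0.5: only v2 contributes, and the orthogonal vectors give a zero cosine term
```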
The present invention is the first to apply the Word2vec word-vector idea in this setting: the Skip-gram neural network language model extracts document features at different granularities from users' complete reading sequences and subsequences, expressed as coarse-grained and fine-grained feature vectors, which offers a reliable solution to the difficult problem of document feature extraction. From the user's complete reading sequence and the document feature vectors of the most recent subsequence, the user's global interest and reading-context interest are obtained, providing a feasible approach to the difficult problem of extracting and modeling a user's reading context. A recommendation method is proposed that jointly considers the user's global interest and reading-context interest, so that the recommended documents better match the user's current preferences, reducing the user's search cost and improving user satisfaction.
Detailed description of the invention
The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments; the above and other advantages of the invention will become apparent.
Fig. 1 is a schematic diagram of the recommender-system architecture of the word-vector-based document recommendation method of the invention.
Fig. 2 is a schematic flowchart of user document-preference prediction in the word-vector-based document recommendation method of the invention.
Specific embodiment
The present invention will be further described below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1 and Fig. 2, the invention specifically includes the following steps:
Step 1: obtain the complete reading-sequence history of 1000 users, 25000 records in total; each record includes a document ID, a reading time, and a document source;
Step 2: with the reading-gap threshold set to 8 hours, group the users' complete reading sequences by reading time and document source, obtaining 3300 subsequences; records whose reading-time gap is shorter than 8 hours and whose document source is the same are assigned to the same subsequence, while records whose gap is longer than 8 hours or whose document source differs are assigned to different subsequences;
Step 3: use the Word2vec neural network language model to process the complete reading sequences of all users, obtaining a coarse-grained feature vector for every document; process the subsequences of all users with the same Word2vec model, obtaining a fine-grained feature vector for every document, in which documents with similar reading contexts have similar feature vectors;
Step 4: according to the trade-off between efficiency and accuracy, set the feature-vector dimensionality to 16.
Step 5: average the coarse-grained feature vectors of all documents in the user's complete reading sequence to obtain the user's global interest vector;
Step 6: average the fine-grained feature vectors of all documents in the user's most recent reading subsequence to obtain the user's reading-context interest vector.
Step 7: compute the similarity between users from their global interest vectors and complete reading sequences, as a weighted combination of reading-set overlap and the cosine similarity of the global interest vectors, where μ denotes the target user, ν another user in the database, Mμ the set of documents read by user μ, Mν the set of documents read by user ν, and λ and θ the weight coefficients, both set to 1 here;
Step 8: compute the target user's interest in each document as a weighted combination of collaborative and content evidence, where μ denotes the target user, Uμ,k the set of the k users most similar to μ, Um the set of users who have read document m, the content term is the cosine similarity between μ's reading-context interest vector and the fine-grained feature vector of m, and ω and θ the weight coefficients, both set to 1 here;
Step 9: for the target user, rank all documents by interest degree using the results of step 8, and recommend the 3-5 documents with the highest interest to the target user during reading, achieving extended reading and personalized recommendation within the user's reading experience.
The present invention provides a word-vector-based document recommendation method. There are many ways and means to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not defined in this embodiment can be implemented with the prior art.

Claims (7)

1. A word-vector-based document recommendation method, characterized by comprising the following steps:
Step 1: based on a neural network language model, extract features from the documents the user has read and from their reading context;
Step 2: based on the user's document sequence features, compute the user's global interest vector and context interest vector;
Step 3: build mathematical models to compute user similarity and a document interest index, and produce document recommendations for the user.
2. The method according to claim 1, characterized in that step 1 includes:
Step 1-1: obtain the user's complete reading sequence; each record in the sequence includes a document ID, a reading time, and a document source;
Step 1-2: group the user's complete reading sequence into subsequences according to reading time and document source; a reading-gap threshold is set, records whose gap does not exceed the threshold and whose document source is the same are assigned to the same subsequence, and records whose gap exceeds the threshold or whose document source differs are assigned to different subsequences;
Step 1-3: use the Word2vec neural network language model to process the complete reading sequences of all users, obtaining a coarse-grained feature vector for every document; process the subsequences of all users with the same Word2vec model, obtaining a fine-grained feature vector for every document, in which documents with similar reading contexts have similar feature vectors.
3. The method according to claim 2, characterized in that, for the feature vectors obtained in step 1-3, the dimensionality is adjusted according to the trade-off between efficiency and accuracy: increase the dimensionality if more accurate recommendations are needed, and reduce it if higher computational efficiency is needed.
4. The method according to claim 3, characterized in that step 2 includes:
Step 2-1: average the coarse-grained feature vectors of all documents in the user's complete reading sequence to obtain the user's global interest vector;
Step 2-2: average the fine-grained feature vectors of all documents in the user's reading subsequence to obtain the user's reading-context interest vector.
5. The method according to claim 4, characterized in that step 3 includes:
Step 3-1: compute the similarity between users from their global interest vectors and complete reading sequences;
Step 3-2: compute the target user's interest index for each document;
Step 3-3: rank all documents using the results of steps 3-1 and 3-2, and recommend the top N results to the target user.
6. The method according to claim 5, characterized in that step 3-1 includes: computing sim(μ, ν), the similarity between the target user μ and another user ν in the database, from their global interest vectors and complete reading sequences, as a weighted combination of the overlap between the users' reading sets and the cosine similarity of their global interest vectors, where Mμ denotes the set of documents read by user μ, Mν the set of documents read by user ν, m a document in the intersection of Mμ and Mν, and λ and θ are weight coefficients.
7. The method according to claim 6, characterized in that step 3-2 includes: computing pi(μ, m), the target user μ's interest in document m, as a weighted combination of collaborative and content evidence, where Uμ,k denotes the set of the k users most similar to μ, Um the set of users who have read document m, the content term is the cosine similarity between μ's reading-context interest vector and the fine-grained feature vector of document m, and ω and θ are weight coefficients.
CN201811415870.XA 2018-11-26 2018-11-26 Word vector-based document recommendation method Active CN109508421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811415870.XA CN109508421B (en) 2018-11-26 2018-11-26 Word vector-based document recommendation method


Publications (2)

Publication Number Publication Date
CN109508421A 2019-03-22
CN109508421B 2020-11-13

Family

ID=65750530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811415870.XA Active CN109508421B (en) 2018-11-26 2018-11-26 Word vector-based document recommendation method

Country Status (1)

Country Link
CN (1) CN109508421B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929209A (en) * 2019-12-06 2020-03-27 北京百度网讯科技有限公司 Method and device for sending information
CN114281961A (en) * 2021-11-15 2022-04-05 北京智谱华章科技有限公司 Scientific and technological literature interest assessment method and device based on biodynamics model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929928A (en) * 2012-09-21 2013-02-13 北京格致璞科技有限公司 Multidimensional-similarity-based personalized news recommendation method
US20130325769A1 (en) * 2008-12-12 2013-12-05 Atigeo Llc Providing recommendations using information determined for domains of interest
CN105279288A (en) * 2015-12-04 2016-01-27 深圳大学 Online content recommending method based on deep neural network
CN107357793A (en) * 2016-05-10 2017-11-17 腾讯科技(深圳)有限公司 Information recommendation method and device


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929209A (en) * 2019-12-06 2020-03-27 北京百度网讯科技有限公司 Method and device for sending information
CN110929209B (en) * 2019-12-06 2023-06-20 北京百度网讯科技有限公司 Method and device for transmitting information
CN114281961A (en) * 2021-11-15 2022-04-05 北京智谱华章科技有限公司 Scientific and technological literature interest assessment method and device based on biodynamics model

Also Published As

Publication number Publication date
CN109508421B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN107944913B (en) High-potential user purchase intention prediction method based on big data user behavior analysis
CN109064285B (en) Commodity recommendation sequence and commodity recommendation method
WO2021119119A1 (en) System and method for a personalized search and discovery engine
CN107180093B (en) Information searching method and device and timeliness query word identification method and device
CN109241203B (en) Clustering method for user preference and distance weighting by fusing time factors
EP3603092A1 (en) Using machine learning to recommend live-stream content
CN108805598B (en) Similarity information determination method, server and computer-readable storage medium
CN104935963A (en) Video recommendation method based on timing sequence data mining
KR101755409B1 (en) Contents recommendation system and contents recommendation method
JP2007122683A (en) Information processing device, information processing method and program
CN109472286A (en) Books in University Library recommended method based on interest-degree model Yu the type factor
CN111310038B (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
JP2008117222A (en) Information processor, information processing method, and program
TW201104466A (en) Digital data processing method for personalized information retrieval and computer readable storage medium and information retrieval system thereof
CN110377840A (en) A kind of music list recommended method and system based on user's shot and long term preference
CN106599047B (en) Information pushing method and device
JP6767342B2 (en) Search device, search method and search program
CN107885852A (en) A kind of APP based on APP usage records recommends method and system
KR101660463B1 (en) Contents recommendation system and contents recommendation method
JP5481295B2 (en) Object recommendation device, object recommendation method, object recommendation program, and object recommendation system
CN109508421A (en) A kind of literature recommendation method based on term vector
JP6928044B2 (en) Providing equipment, providing method and providing program
CN117056575B (en) Method for data acquisition based on intelligent book recommendation system
CN114581165A (en) Product recommendation method, device, computer storage medium and system
KR20110043369A (en) Association analysis method for music recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No.1 Lingshan South Road, Qixia District, Nanjing, Jiangsu Province, 210000

Applicant after: THE 28TH RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp.

Address before: 210007 No. 1 Muxuyuan East Street, Nanjing, Jiangsu

Applicant before: THE 28TH RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp.

GR01 Patent grant