CN109508421A - A document recommendation method based on word vectors - Google Patents
A document recommendation method based on word vectors
- Publication number
- CN109508421A (application CN201811415870.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- document
- interest
- vector
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a document recommendation method based on word vectors. A neural network language model is used to extract document feature vectors from users' reading sequences. From each user's complete reading sequence and most recent subsequence, the method then extracts the user's global interest and current reading-context interest. When recommending, both the global interest and the current reading context are considered, so that the recommended documents better match the user's needs and preferences.
Description
Technical field
The present invention relates to a document recommendation method based on word vectors.
Background art
Early document recommendation mainly used content-based recommendation algorithms, which analyze low-level document features through text labels and then place documents with similar content into the recommendation list. For example, one prior work proposed a personalized recommendation model based on MFCC and GMM that extracts text-label features from documents. However, label extraction is very time-consuming, and documents are now updated rapidly, with tens of thousands of new documents released every day, so recommendation based purely on text-label features has gradually been abandoned.
Since the Tapestry system first used collaborative filtering to address information overload, collaborative filtering has been rapidly adopted for recommendation in other fields. A well-known foreign document platform applies collaborative filtering by recording user behavior on the server, finding several "nearest neighbors" with similar interest preferences from those records, and recommending to the target user the documents that the nearest neighbors liked but the target user has not yet browsed. In domestic research, Wang Jun et al. proposed the concept of a hierarchical document recommendation system: on one hand, it performs collaborative-filtering recommendation using document-preference similarity between users; on the other hand, it measures document-content similarity across multiple dimensions such as topic, sentiment, writing style, and wording. Connecting the two aspects exploits the advantages of both and improves recommendation satisfaction.
Unlike recommendation in other fields, users may read documents out of personal interest or to support their work, which distinguishes document recommendation from mainstream e-commerce recommendation. In e-commerce and movie recommendation, users pay a larger economic or time cost (for example, paying to buy an item, or spending two hours watching a movie), so they are more willing to give the system explicit feedback, and collecting explicit ratings is relatively easy. Documents themselves cost users comparatively little, and users rarely rate them explicitly; the system can only record the user's contextual information, such as reading behavior and registration information.
Summary of the invention
Objective of the invention: aiming at the shortcomings of traditional document recommender systems, the present invention proposes a document recommendation method that incorporates the context of the user's reading list into word vectors.
Technical solution: the invention discloses a document recommendation method based on word vectors, comprising the following steps:
Step 1: based on a neural network language model, perform feature extraction on the documents the user reads and their context;
Step 2: based on the user's document sequence features, compute the user's global interest vector and context interest vector;
Step 3: build mathematical models to compute user similarity and a document interest index, and recommend documents for the user to read.
Step 1 includes:
Step 1-1: obtain the user's entire document reading sequence, where each record in the sequence includes the document ID, the reading time, and the document source;
Step 1-2: group the user's entire reading sequence into subsequences according to reading time and document source. A reading-interval time threshold is set (for example, 8 hours): records whose interval does not exceed the threshold and whose document source is identical are assigned to the same subsequence, while records whose interval exceeds the threshold or whose document source differs are assigned to different subsequences;
Step 1-3: use the Word2vec neural network language model (reference: Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv.org) to process the entire reading sequences of all users, obtaining a coarse-grained feature vector for each document; then use the same Word2vec model to process the subsequences of all users, obtaining a fine-grained feature vector for each document, so that documents with similar reading contexts have similar feature vectors.
For the feature vectors obtained in step 1-3, the dimension is adjusted according to the required trade-off between efficiency and accuracy: increase the dimension for more accurate recommendation results, or decrease it for higher computational efficiency (set according to the actual situation).
Step 2 includes:
Step 2-1: average the coarse-grained feature vectors of all documents in the user's entire reading sequence to obtain the user's global interest vector;
Step 2-2: average the fine-grained feature vectors of all documents in the user's reading subsequence to obtain the user's reading-context interest vector.
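Steps 2-1 and 2-2 are plain vector averages. A minimal sketch with toy 2-dimensional vectors (in practice the vectors would come from the Word2vec models above):

```python
import numpy as np

def interest_vector(doc_vectors):
    """Average a list of document feature vectors into one interest vector."""
    return np.mean(np.stack(doc_vectors), axis=0)

# Toy coarse-grained vectors for the documents a user has read:
coarse_vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
global_interest = interest_vector(coarse_vecs)  # -> array([0.5, 0.5])
```

The same function applied to the fine-grained vectors of the most recent subsequence yields the reading-context interest vector.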
Step 3 includes:
Step 3-1: compute the similarity between users from their global interest vectors and entire reading sequences;
Step 3-2: compute the target user's interest index for each document;
Step 3-3: rank all documents by the results of steps 3-1 and 3-2, and recommend the top-N results to the target user.
Step 3-1 includes: computing the similarity between users from their global interest vectors and entire reading sequences, by a formula in which:
μ denotes the target user; ν denotes another user in the database; sim(μ, ν) denotes the similarity between the target user μ and the other user ν; M_μ denotes the set of documents read by user μ; M_ν denotes the set of documents read by user ν; m denotes a document in the intersection of M_μ and M_ν; the cosine similarity of the two users' global interest vectors appears as a term; and λ and θ are weight coefficients (both greater than 0).
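The similarity formula itself does not survive in this text (the original image is missing), so the sketch below only combines the ingredients its definitions name: the overlap of the two users' reading sets and the cosine of their global interest vectors, weighted by λ and θ. The exact combination in the patent may differ.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def user_similarity(M_u, M_v, g_u, g_v, lam=1.0, theta=1.0):
    """Hypothetical sim(u, v): weighted reading-set overlap plus the cosine
    of the two global interest vectors. The combination is an assumption;
    the patent's own formula is not reproduced in the source text."""
    union = M_u | M_v
    overlap = len(M_u & M_v) / len(union) if union else 0.0
    return lam * overlap + theta * cosine(g_u, g_v)
```

With λ = θ = 1 (the values used in the embodiment), two users who read one document in common out of three and have identical global interest vectors score 1/3 + 1.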
Step 3-2 includes: computing the target user's interest in a document, by a formula in which:
p_i(μ, m) denotes the interest of the target user μ in document m; U_{μ,k} denotes the set of the k users most similar to the target user μ; U_m denotes the set of users who have read document m; the cosine similarity between the reading-context interest vector of user μ and the fine-grained feature vector of document m appears as a term; and ω and θ are weight coefficients (both greater than 0).
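As with step 3-1, the formula image for the interest index is missing, so this sketch combines the named ingredients: the similarities of the k nearest neighbours of μ who have read document m, and the cosine between μ's reading-context interest vector and m's fine-grained vector. The combination and the function name are assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def interest_index(neighbour_sims, ctx_vec, doc_vec, omega=1.0, theta=1.0):
    """Hypothetical p_i(u, m).

    neighbour_sims: sim(u, v) for each v in the intersection of U_{u,k}
    (u's k nearest neighbours) and U_m (users who read document m).
    ctx_vec: u's reading-context interest vector.
    doc_vec: document m's fine-grained feature vector.
    """
    return omega * sum(neighbour_sims) + theta * cosine(ctx_vec, doc_vec)
```

Documents read by many close neighbours, or close in context space to what the user is currently reading, score higher.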
The present invention is, for the first time, based on the Word2vec word-vector idea: the Skip-gram neural network language model is used to obtain document features at different granularities from users' complete reading sequences and subsequences, expressed as coarse-grained and fine-grained feature vectors, providing a reliable solution to the difficult problem of document feature extraction. From the document feature vectors in a user's complete reading sequence and most recent subsequence, the user's global interest and reading-context interest are obtained, offering a feasible approach to the difficult problem of extracting and modeling a user's reading context. A recommendation method is proposed that jointly considers the user's global interest and reading-context interest, so that the recommended documents better match the user's current preferences, reducing the user's search cost and improving user satisfaction.
Brief description of the drawings
The present invention is further illustrated below with reference to the drawings and specific embodiments; the above and other advantages of the invention will become more apparent.
Fig. 1 is a diagram of the recommender system architecture of the word-vector-based document recommendation method of the invention.
Fig. 2 is a flow chart of user document-preference prediction in the word-vector-based document recommendation method of the invention.
Specific embodiments
The present invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1 and Fig. 2, the present invention specifically includes the following steps:
Step 1: obtain the historical entire reading sequences of 1000 users, comprising 25000 records, where each record in a sequence includes the document ID, the reading time, and the document source;
Step 2: according to reading time and document source, set the reading-interval time threshold to 8 hours and group each user's entire reading sequence, obtaining 3300 subsequences. Records whose reading-time interval is shorter than 8 hours and whose document source is identical are assigned to the same subsequence; records whose interval is longer than 8 hours or whose document source differs are assigned to different subsequences;
Step 3: use the Word2vec neural network language model to process the entire reading sequences of all users, obtaining a coarse-grained feature vector for each document, and use the same Word2vec model to process the subsequences of all users, obtaining a fine-grained feature vector for each document, so that documents with similar reading contexts have similar feature vectors;
Step 4: according to the required trade-off between efficiency and accuracy, set the dimension of the feature vectors to 16.
Step 5: average the coarse-grained feature vectors of all documents in the user's entire reading sequence to obtain the user's global interest vector;
Step 6: average the fine-grained feature vectors of all documents in the user's most recent reading subsequence to obtain the user's reading-context interest vector.
Step 7: compute the similarity between users from their global interest vectors and entire reading sequences, by a formula in which μ denotes the target user, ν denotes another user in the database, M_μ denotes the set of documents read by user μ, M_ν denotes the set of documents read by user ν, the cosine similarity of the two users' global interest vectors appears as a term, and λ and θ are weight coefficients, both set to 1 here;
Step 8: compute the target user's interest in each document, by a formula in which μ denotes the target user, U_{μ,k} denotes the set of the k users most similar to μ, U_m denotes the set of users who have read document m, the cosine similarity between the reading-context interest vector of user μ and the fine-grained feature vector of document m appears as a term, and ω and θ are weight coefficients, both set to 1 here;
Step 9: for the target user, rank all documents by user-interest degree using the results of step 8, and recommend the 3-5 documents with the highest interest degree to the target user during reading, realizing extended reading and personalized recommendation in the target user's reading experience.
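The ranking of step 9 can be sketched as follows; `score` stands in for any callable implementing the step-8 interest computation, and the toy scores are illustrative.

```python
def recommend(candidates, score, n=5):
    """Return the n highest-scoring candidate documents, best first."""
    return sorted(candidates, key=score, reverse=True)[:n]

# Toy interest values standing in for the step-8 computation:
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.1}
top = recommend(scores, scores.get, n=3)  # -> ["d2", "d3", "d1"]
```

In the embodiment, n would be 3 to 5; only the relative order of the interest values matters for the final list.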
The present invention provides a document recommendation method based on word vectors. There are many ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be implemented with the prior art.
Claims (7)
1. A document recommendation method based on word vectors, characterized by comprising the following steps:
Step 1: based on a neural network language model, perform feature extraction on the documents the user reads and their context;
Step 2: based on the user's document sequence features, compute the user's global interest vector and context interest vector;
Step 3: build mathematical models to compute user similarity and a document interest index, and recommend documents for the user to read.
2. The method according to claim 1, characterized in that step 1 includes:
Step 1-1: obtain the user's entire document reading sequence, where each record in the sequence includes the document ID, the reading time, and the document source;
Step 1-2: group the user's entire reading sequence into subsequences according to reading time and document source; a reading-interval time threshold is set, records whose interval does not exceed the threshold and whose document source is identical are assigned to the same subsequence, and records whose interval exceeds the threshold or whose document source differs are assigned to different subsequences;
Step 1-3: use the Word2vec neural network language model to process the entire reading sequences of all users, obtaining a coarse-grained feature vector for each document, and use the same Word2vec model to process the subsequences of all users, obtaining a fine-grained feature vector for each document, so that documents with similar reading contexts have similar feature vectors.
3. The method according to claim 2, characterized in that, for the feature vectors obtained in step 1-3, the dimension is adjusted according to the required trade-off between efficiency and accuracy: the dimension is increased for more accurate recommendation results and decreased for higher computational efficiency.
4. The method according to claim 3, characterized in that step 2 includes:
Step 2-1: average the coarse-grained feature vectors of all documents in the user's entire reading sequence to obtain the user's global interest vector;
Step 2-2: average the fine-grained feature vectors of all documents in the user's reading subsequence to obtain the user's reading-context interest vector.
5. The method according to claim 4, characterized in that step 3 includes:
Step 3-1: compute the similarity between users from their global interest vectors and entire reading sequences;
Step 3-2: compute the target user's interest index for each document;
Step 3-3: rank all documents by the results of steps 3-1 and 3-2, and recommend the top-N results to the target user.
6. The method according to claim 5, characterized in that step 3-1 includes computing the similarity between users from their global interest vectors and entire reading sequences, by a formula in which: μ denotes the target user; ν denotes another user in the database; sim(μ, ν) denotes the similarity between the target user μ and the other user ν; M_μ denotes the set of documents read by user μ; M_ν denotes the set of documents read by user ν; m denotes a document in the intersection of M_μ and M_ν; the cosine similarity of the two users' global interest vectors appears as a term; and λ and θ are weight coefficients.
7. The method according to claim 6, characterized in that step 3-2 includes computing the target user's interest in a document, by a formula in which: p_i(μ, m) denotes the interest of the target user μ in document m; U_{μ,k} denotes the set of the k users most similar to the target user μ; U_m denotes the set of users who have read document m; the cosine similarity between the reading-context interest vector of user μ and the fine-grained feature vector of document m appears as a term; and ω and θ are weight coefficients.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811415870.XA CN109508421B (en) | 2018-11-26 | 2018-11-26 | Word vector-based document recommendation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811415870.XA CN109508421B (en) | 2018-11-26 | 2018-11-26 | Word vector-based document recommendation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109508421A true CN109508421A (en) | 2019-03-22 |
CN109508421B CN109508421B (en) | 2020-11-13 |
Family
ID=65750530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811415870.XA Active CN109508421B (en) | 2018-11-26 | 2018-11-26 | Word vector-based document recommendation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508421B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929209A (en) * | 2019-12-06 | 2020-03-27 | 北京百度网讯科技有限公司 | Method and device for sending information |
CN114281961A (en) * | 2021-11-15 | 2022-04-05 | 北京智谱华章科技有限公司 | Scientific and technological literature interest assessment method and device based on biodynamics model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929928A (en) * | 2012-09-21 | 2013-02-13 | 北京格致璞科技有限公司 | Multidimensional-similarity-based personalized news recommendation method |
US20130325769A1 (en) * | 2008-12-12 | 2013-12-05 | Atigeo Llc | Providing recommendations using information determined for domains of interest |
CN105279288A (en) * | 2015-12-04 | 2016-01-27 | 深圳大学 | Online content recommending method based on deep neural network |
CN107357793A (en) * | 2016-05-10 | 2017-11-17 | 腾讯科技(深圳)有限公司 | Information recommendation method and device |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130325769A1 (en) * | 2008-12-12 | 2013-12-05 | Atigeo Llc | Providing recommendations using information determined for domains of interest |
CN102929928A (en) * | 2012-09-21 | 2013-02-13 | 北京格致璞科技有限公司 | Multidimensional-similarity-based personalized news recommendation method |
CN105279288A (en) * | 2015-12-04 | 2016-01-27 | 深圳大学 | Online content recommending method based on deep neural network |
CN107357793A (en) * | 2016-05-10 | 2017-11-17 | 腾讯科技(深圳)有限公司 | Information recommendation method and device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929209A (en) * | 2019-12-06 | 2020-03-27 | 北京百度网讯科技有限公司 | Method and device for sending information |
CN110929209B (en) * | 2019-12-06 | 2023-06-20 | 北京百度网讯科技有限公司 | Method and device for transmitting information |
CN114281961A (en) * | 2021-11-15 | 2022-04-05 | 北京智谱华章科技有限公司 | Scientific and technological literature interest assessment method and device based on biodynamics model |
Also Published As
Publication number | Publication date |
---|---|
CN109508421B (en) | 2020-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107944913B (en) | High-potential user purchase intention prediction method based on big data user behavior analysis | |
CN109064285B (en) | Commodity recommendation sequence and commodity recommendation method | |
WO2021119119A1 (en) | System and method for a personalized search and discovery engine | |
CN107180093B (en) | Information searching method and device and timeliness query word identification method and device | |
CN109241203B (en) | Clustering method for user preference and distance weighting by fusing time factors | |
EP3603092A1 (en) | Using machine learning to recommend live-stream content | |
CN108805598B (en) | Similarity information determination method, server and computer-readable storage medium | |
CN104935963A (en) | Video recommendation method based on timing sequence data mining | |
KR101755409B1 (en) | Contents recommendation system and contents recommendation method | |
JP2007122683A (en) | Information processing device, information processing method and program | |
CN109472286A (en) | Books in University Library recommended method based on interest-degree model Yu the type factor | |
CN111310038B (en) | Information recommendation method and device, electronic equipment and computer-readable storage medium | |
JP2008117222A (en) | Information processor, information processing method, and program | |
TW201104466A (en) | Digital data processing method for personalized information retrieval and computer readable storage medium and information retrieval system thereof | |
CN110377840A (en) | A kind of music list recommended method and system based on user's shot and long term preference | |
CN106599047B (en) | Information pushing method and device | |
JP6767342B2 (en) | Search device, search method and search program | |
CN107885852A (en) | A kind of APP based on APP usage records recommends method and system | |
KR101660463B1 (en) | Contents recommendation system and contents recommendation method | |
JP5481295B2 (en) | Object recommendation device, object recommendation method, object recommendation program, and object recommendation system | |
CN109508421A (en) | A kind of literature recommendation method based on term vector | |
JP6928044B2 (en) | Providing equipment, providing method and providing program | |
CN117056575B (en) | Method for data acquisition based on intelligent book recommendation system | |
CN114581165A (en) | Product recommendation method, device, computer storage medium and system | |
KR20110043369A (en) | Association analysis method for music recommendation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: No.1 Lingshan South Road, Qixia District, Nanjing, Jiangsu Province, 210000 Applicant after: THE 28TH RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp. Address before: 210007 No. 1 East Street, alfalfa garden, Jiangsu, Nanjing Applicant before: THE 28TH RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp. |
GR01 | Patent grant | ||