CN106250545A - Multimedia recommendation method and system based on user search content - Google Patents
- Publication number
- CN106250545A (application number CN201610653997.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- similarity
- multimedia
- query vector
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/435—Filtering based on additional data, e.g. user or group profiles
Abstract
The present invention discloses a multimedia recommendation method and system based on user search content. The characteristic items of the user rating matrix are weighted according to an established query-multimedia category matrix to obtain an improved user rating matrix; the user similarity based on user query vectors is then combined with the user similarity based on the user rating matrix to obtain the overall user similarity. By extracting and quantifying the user's degree of preference for specific multimedia types contained in the user's search keyword records, the invention introduces a user query vector set and uses it as the target user's initial scores for multimedia items, thereby addressing the cold-start and feature-sparsity problems of traditional collaborative filtering. In addition, the invention combines the user similarity based on user query vectors with the user similarity based on the user rating matrix to calculate the overall user similarity, addressing the insufficient recommendation precision of traditional collaborative filtering.
Description
Technical field
The present invention relates to the field of content recommendation, and in particular to a multimedia recommendation method and system based on user search content.
Background art
Personalized recommendation systems are now widely used to recommend goods and content such as books, papers, music, and films, and their internal structure has changed greatly. As recommended content has diversified, recommendation techniques have developed rapidly as well; techniques such as collaborative filtering, content-based recommendation, and transfer learning emerged in this context.
Personalized recommendation systems can be divided into three types according to the content they recommend:
The first type is applied on e-commerce websites and recommends goods that match the user's own preferences. It is called an e-commerce personalized recommendation system, or e-commerce recommendation system; large e-commerce sites such as Amazon, Taobao, and JD.com (Jingdong) all use this type;
The second type is applied on popular social networks and recommends to users related social connections, or users, groups, news, and other information in interest circles they may care about; social network portals such as Facebook, Sina Weibo, and Renren all use this type;
The third type is applied on vertical sharing portals and recommends categories and content the user may be interested in; consumer vertical multimedia portals such as Netflix, Mtime, and Douban use this type.
Collaborative filtering recommendation algorithms fall mainly into two kinds, user-based and item-based; their biggest difference is that the scope from which the target's neighbor set is selected is exactly opposite.
User-based collaborative filtering computes the similarity between users to obtain the target user's nearest-neighbor set, predicts the target user's scores for unrated items from the scores given by those neighbors, and feeds the items with the highest predicted scores back to the target user as recommendations. Item-based collaborative filtering instead computes the similarity between items and feeds back, as recommendations, items that are highly similar to those the target user likes.
Although traditional collaborative filtering recommendation algorithms accomplish recommendation to a certain extent, they still have the following problems:
1. When similarity is calculated from the user rating matrix, feature sparsity and cold start are prominent problems;
2. Most current personalized recommendation systems compute similarity mainly from the user's explicit input data, such as gender, age, address, and interest tags, and do not consider the influence of the user's implicit input data on the similarity calculation;
3. Collaborative filtering is not itself adaptive: null values among the user's scoring items cannot be re-weighted by the algorithm's own adjustment, which reduces recommendation precision or even makes recommendation impossible.
Therefore, the prior art still needs to be improved and developed.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a multimedia recommendation method and system based on user search content, intended to solve the cold-start, feature-sparsity, and insufficient-precision problems of existing recommendation methods.
The technical solution of the present invention is as follows:
A multimedia recommendation method based on user search content, comprising:
Step A: collecting the user's search keyword records and rating records, generating a user query vector set from the search keyword records, and generating a user rating matrix from the rating records;
Step B: establishing a multimedia category label dictionary, converting the user query vector set into a query-multimedia category matrix according to the dictionary, and then weighting the characteristic items of the user rating matrix according to the query-multimedia category matrix to obtain an improved user rating matrix;
Step C: traversing each user query vector in the user query vector set to calculate the user similarity based on user query vectors, and calculating the user similarity based on the user rating matrix from the improved user rating matrix;
Step D: calculating the overall user similarity from the user similarity based on user query vectors and the user similarity based on the user rating matrix, and selecting, according to the overall user similarity, the several users most similar to the target user as the target user's neighbor user set;
Step E: predicting the target user's ratings of the items it has not rated from the ratings that users in the neighbor user set gave to those items, sorting by predicted score from high to low, and pushing the several top-ranked multimedia contents to the target user as the recommendation result.
The multimedia recommendation method based on user search content, wherein step B specifically comprises:
B1: defining multimedia categories;
B2: establishing the multimedia category label dictionary;
B3: converting the user query vector set into the query-multimedia category matrix according to the multimedia category label dictionary;
B4: calculating the user's preference for each multimedia category from the categorized query information in the query-multimedia category matrix;
B5: weighting the characteristic items of the user rating matrix according to the preference to obtain the improved user rating matrix.
The multimedia recommendation method based on user search content, wherein, in step C, calculating the user similarity based on user query vectors specifically comprises:
C1: representing the word elements in each user query vector of the user query vector set in 8-character coded form;
C2: traversing all word elements of each user query vector, and recording, for each word element, the total number of nodes in the branch layer where it is located and the number of synonym entries in the paragraph row where it is located;
C3: calculating the similarity between the senses of two word elements from the total number of nodes and the number of synonym entries;
C4: calculating the similarity between the corresponding word elements from the similarity of the senses;
C5: traversing all user query vectors, recording the synonym set of each user query vector and the number of occurrences of each synonym in that set, and, in combination with the similarity between word elements, calculating the similarity between any two user query vectors as the user similarity based on user query vectors, $Sim_s(U_1, U_2)$.
The multimedia recommendation method based on user search content, wherein, in step C, the user similarity based on the user rating matrix is calculated as follows:
[formula not reproduced]
where $(r_{11}, r_{12}, r_{13}, \ldots, r_{1m})$ is the score vector of user $U_1$, $(r_{21}, r_{22}, r_{23}, \ldots, r_{2m})$ is the score vector of user $U_2$, and $\bar{r}_1$ and $\bar{r}_2$ denote the average scores that $U_1$ and $U_2$, respectively, give to multimedia items.
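The quantities defined here (two score vectors and each user's mean score) match the Pearson correlation coefficient commonly used in user-based collaborative filtering. As a hedged reconstruction, assuming that this is the intended formula, and with the summation index $k$ and the mean symbols $\bar{r}_1$, $\bar{r}_2$ introduced only for illustration, it would read:

```latex
Sim_E(U_1, U_2) =
  \frac{\sum_{k=1}^{m} (r_{1k} - \bar{r}_1)(r_{2k} - \bar{r}_2)}
       {\sqrt{\sum_{k=1}^{m} (r_{1k} - \bar{r}_1)^2}\;
        \sqrt{\sum_{k=1}^{m} (r_{2k} - \bar{r}_2)^2}}
```

Under this reading the value lies in $[-1, 1]$, and it is undefined when either user gives every item the same score.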
The multimedia recommendation method based on user search content, wherein, in step D, the overall user similarity is calculated as follows:
$Sim_{user}(U_1, U_2) = \omega \times Sim_E(U_1, U_2) + (1 - \omega) \times Sim_s(U_1, U_2)$, where $\omega$ is an adjustment coefficient and $\omega < 0.5$.
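The weighted combination and the neighbor selection of step D can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary-based inputs, the default `omega=0.4`, and the lower bound `omega >= 0` are assumptions; the source only states that ω is below 0.5.

```python
def overall_similarity(sim_e, sim_s, omega=0.4):
    """Blend rating-based (sim_e) and query-based (sim_s) similarity.

    Assumption: omega is restricted to 0 <= omega < 0.5, so the
    query-based similarity always carries the larger weight.
    """
    if not 0.0 <= omega < 0.5:
        raise ValueError("omega must satisfy 0 <= omega < 0.5")
    return omega * sim_e + (1.0 - omega) * sim_s


def top_k_neighbors(target, sim_e, sim_s, k=2, omega=0.4):
    """Return the k users most similar to `target` by overall similarity.

    sim_e and sim_s map a candidate user id to its similarity with `target`.
    """
    scores = {
        user: overall_similarity(sim_e[user], sim_s[user], omega)
        for user in sim_e
        if user != target and user in sim_s
    }
    # Sort candidate ids by blended score, highest first.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because ω is below 0.5, two users with similar search behaviour can still become neighbors even when their rating vectors overlap very little.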
The multimedia recommendation method based on user search content, wherein, in step E, the predicted score $r_{ij}$ of target user $i$ for item $j$ is calculated according to the following formula:
[formula not reproduced]
where $\bar{r}_i$ denotes the expected score that target user $i$ gives to multimedia items, $Sim_{i,\alpha}$ denotes the similarity between target user $i$ and user $\alpha$ in the neighbor user set, $r_{\alpha j}$ denotes the score of user $\alpha$ for item $j$, and $\bar{r}_\alpha$ denotes the expected score that user $\alpha$ gives to multimedia items.
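The symbols defined for step E (the target user's expected score, the neighbor similarities, the neighbor scores, and the neighbors' expected scores) fit the mean-centered weighted prediction commonly used in user-based collaborative filtering. The sketch below assumes that form, namely the target user's mean plus the similarity-weighted average of each neighbor's deviation from its own mean; the function name, the tuple layout, and the fallback for an empty neighbor set are illustrative assumptions.

```python
def predict_rating(target_mean, neighbors):
    """Mean-centered weighted prediction (assumed form of the step E formula).

    neighbors: list of (similarity, neighbor_rating_for_item, neighbor_mean)
    tuples, one per user in the neighbor set who rated the item.
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    denominator = sum(abs(sim) for sim, _, _ in neighbors)
    if denominator == 0:
        # No usable neighbors: fall back to the target user's own mean.
        return target_mean
    return target_mean + numerator / denominator
```

Centering on each neighbor's mean compensates for users who systematically score high or low.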
A multimedia recommendation system based on user search content, comprising:
Initialization module, configured to collect the user's search keyword records and rating records, generate a user query vector set from the search keyword records, and generate a user rating matrix from the rating records;
Matrix improvement module, configured to establish a multimedia category label dictionary, convert the user query vector set into a query-multimedia category matrix according to the dictionary, and then weight the characteristic items of the user rating matrix according to the query-multimedia category matrix to obtain an improved user rating matrix;
Similarity calculation module, configured to traverse each user query vector in the user query vector set to calculate the user similarity based on user query vectors, and to calculate the user similarity based on the user rating matrix from the improved user rating matrix;
Neighbor user set selection module, configured to calculate the overall user similarity from the user similarity based on user query vectors and the user similarity based on the user rating matrix, and to select, according to the overall user similarity, the several users most similar to the target user as the target user's neighbor user set;
Recommendation module, configured to predict the target user's ratings of the items it has not rated from the ratings that users in the neighbor user set gave to those items, sort by predicted score from high to low, and push the several top-ranked multimedia contents to the target user as the recommendation result.
The multimedia recommendation system based on user search content, wherein the matrix improvement module specifically comprises:
Definition unit, configured to define multimedia categories;
Category label dictionary establishing unit, configured to establish the multimedia category label dictionary;
Matrix conversion unit, configured to convert the user query vector set into the query-multimedia category matrix according to the multimedia category label dictionary;
Preference calculation unit, configured to calculate the user's preference for each multimedia category from the categorized query information in the query-multimedia category matrix;
Matrix improvement unit, configured to weight the characteristic items of the user rating matrix according to the preference to obtain the improved user rating matrix.
The multimedia recommendation system based on user search content, wherein the similarity calculation module specifically comprises:
Coded representation unit, configured to represent the word elements in each user query vector of the user query vector set in 8-character coded form;
Recording unit, configured to traverse all word elements of each user query vector and record, for each word element, the total number of nodes in the branch layer where it is located and the number of synonym entries in the paragraph row where it is located;
Sense calculation unit, configured to calculate the similarity between the senses of two word elements from the total number of nodes and the number of synonym entries;
Word calculation unit, configured to calculate the similarity between the corresponding word elements from the similarity of the senses;
Similarity calculation unit, configured to traverse all user query vectors, record the synonym set of each user query vector and the number of occurrences of each synonym in that set, and, in combination with the similarity between word elements, calculate the similarity between any two user query vectors as the user similarity based on user query vectors, $Sim_s(U_1, U_2)$.
The multimedia recommendation system based on user search content, wherein, in the similarity calculation module, the user similarity based on the user rating matrix is calculated as follows:
[formula not reproduced]
where $(r_{11}, r_{12}, r_{13}, \ldots, r_{1m})$ is the score vector of user $U_1$, $(r_{21}, r_{22}, r_{23}, \ldots, r_{2m})$ is the score vector of user $U_2$, and $\bar{r}_1$ and $\bar{r}_2$ denote the average scores that $U_1$ and $U_2$, respectively, give to multimedia items.
Beneficial effects: by extracting and quantifying the user's degree of preference for specific multimedia types contained in the user's search keyword records, the present invention introduces a user query vector set and uses it as the target user's initial scores for multimedia items, thereby addressing the cold-start and feature-sparsity problems of traditional collaborative filtering. In addition, the invention combines the user similarity based on user query vectors with the user similarity based on the user rating matrix to calculate the overall user similarity, addressing the insufficient recommendation precision of traditional collaborative filtering.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the multimedia recommendation method based on user search content of the present invention.
Fig. 2 is an example of the user rating matrix in the present invention.
Fig. 3 is an example of user search records in the present invention.
Fig. 4 is an example of the user query vector set in the present invention.
Fig. 5 is an excerpt of a film category label dictionary in the present invention.
Fig. 6 is an example of a query-film category matrix in the present invention.
Fig. 7 is an example of the user rating matrix before improvement in the present invention.
Fig. 8 is an example of the user rating matrix after improvement in the present invention.
Fig. 9 is an example of the coding rule table in the present invention.
Fig. 10 is a structural block diagram of a preferred embodiment of the multimedia recommendation system based on user search content of the present invention.
Detailed description
The present invention provides a multimedia recommendation method and system based on user search content. To make the objects, technical solutions, and effects of the invention clearer, the invention is described in more detail below. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
Referring to Fig. 1, which is a flow chart of a preferred embodiment of the multimedia recommendation method based on user search content of the present invention, the method comprises:
Step S1: collecting the user's search keyword records and rating records, generating a user query vector set from the search keyword records, and generating a user rating matrix from the rating records;
Step S2: establishing a multimedia category label dictionary, converting the user query vector set into a query-multimedia category matrix according to the dictionary, and then weighting the characteristic items of the user rating matrix according to the query-multimedia category matrix to obtain an improved user rating matrix;
Step S3: traversing each user query vector in the user query vector set to calculate the user similarity based on user query vectors, and calculating the user similarity based on the user rating matrix from the improved user rating matrix;
Step S4: calculating the overall user similarity from the user similarity based on user query vectors and the user similarity based on the user rating matrix, and selecting, according to the overall user similarity, the several users most similar to the target user as the target user's neighbor user set;
Step S5: predicting the target user's ratings of the items it has not rated from the ratings that users in the neighbor user set gave to those items, sorting by predicted score from high to low, and pushing the several top-ranked multimedia contents to the target user as the recommendation result.
In step S1, the user's search keyword records and rating records are collected. The rating records are the scores the user gives to multimedia content and are explicit input. The embodiments below all use films as the example; the invention can obviously be applied in the same way to other multimedia content such as music and pictures.
The user rating matrix is generated from the users' rating records; an example is shown in Fig. 2. Here the user set is $U = \{u_1, u_2, \ldots, u_N\}$, the film set is $I = \{I_1, I_2, \ldots, I_M\}$, and $r_{ij}$ denotes the satisfaction score of user $u_i$ for film $I_j$; the whole user rating matrix is an $N \times M$ matrix. The value of $r_{ij}$ is restricted to a specific integer interval; for example, $r_{ij}$ can be an integer from 0 to 5, where 0 means the degree of liking is unknown, 1 means least liked, 5 means most liked, and the other values lie in between.
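A minimal sketch of how such a rating matrix could be assembled from rating records, with 0 marking an unknown rating as described above. The function name, the `(user, film, score)` record layout, and the list-of-lists representation are assumptions made for illustration.

```python
def build_rating_matrix(users, films, records):
    """Return an N x M list-of-lists matrix; 0 marks an unknown rating.

    records: iterable of (user_id, film_id, score) with integer scores 1..5.
    """
    user_index = {u: i for i, u in enumerate(users)}
    film_index = {f: j for j, f in enumerate(films)}
    matrix = [[0] * len(films) for _ in users]
    for user, film, score in records:
        if not (isinstance(score, int) and 1 <= score <= 5):
            raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
        matrix[user_index[user]][film_index[film]] = score
    return matrix
```

With many users and films and few rating records, most entries stay 0, which is the sparsity the method sets out to address.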
In addition to the explicit input above, the invention also collects the user's implicit input. Implicit input is the trace information left over the whole course of operation, from the moment the user accesses the system until the operation is complete, such as search keyword records, the location of the IP address, browsing records, and follow and favorite records. Such input indirectly reflects the user's preferences regarding the site's information and also indirectly reflects information about the user. Implicit input is transparent to the user. Because the evaluation information in implicit input is usually non-linear, a traditional collaborative filtering recommendation system cannot process it directly; additional cross-domain techniques are needed to analyze and extract data from this indirect information, which is complicated and time-consuming. This is the direct reason implicit input is not yet generally adopted. The present invention takes a different approach: it extracts the users' search records, generates a user query vector set from them, and uses that set as a reference to address the cold-start and feature-sparsity problems of traditional collaborative filtering.
Specifically, the user search records are the sets of keywords the user types into the search engine provided by the website or smart TV when browsing the website or using internet TV. Fig. 3 shows an example of user search records in a typical personalized film recommendation system.
Assume the user set is $U = \{u_1, u_2, \ldots, u_N\}$, where each user $u_x$ has a corresponding user query vector $Q_x = \{Q_1, Q_2, \ldots, Q_y\}$. The dimension of $Q_x$ differs from user to user. The user query vector set of a film recommendation system over a period of time is shown in Fig. 4.
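One way to build such variable-length query vectors from a search log is sketched below. The function name and the `(user_id, keyword)` log layout are hypothetical, and keeping repeated keywords is an assumption made so that label frequencies can be counted later.

```python
from collections import defaultdict


def build_query_vectors(users, search_log):
    """Group searched keywords by user, preserving the order of entry.

    search_log: iterable of (user_id, keyword) pairs.
    Users with no searches get an empty vector.
    """
    vectors = defaultdict(list)
    for user, keyword in search_log:
        keyword = keyword.strip()
        if keyword:  # skip blank search entries
            vectors[user].append(keyword)
    return {user: vectors.get(user, []) for user in users}
```

Each user's vector has its own length, so the result is a mapping of lists rather than a fixed-width matrix.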
The user query vectors differ in dimension, and some are even empty. On the one hand, user query records have no direct numerical relation to film categories, which makes it challenging to calculate the degree of correlation between query records and item categories. On the other hand, the user similarity calculation of traditional collaborative filtering cannot be used to calculate the similarity of user preferences. The invention therefore combines, in subsequent steps, the user similarity based on the user rating matrix with that based on user query vectors, which can improve recommendation accuracy, as detailed later.
Further, step S2 specifically comprises:
S21: defining multimedia categories;
Taking films as an example, films are divided into 18 categories following the film classification of the MovieLens data set: Action, Adventure, Animation, Children's, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, and Western.
S22: establishing the multimedia category label dictionary;
In this step, film search labels are first crawled from the web. The labels come mainly from the film search sections of Douban and Mtime; Douban's popular labels are updated every other month, and Mtime's popular labels are updated every 20 days.
The film search labels are then de-duplicated and classified. Specifically, the labels that appeared in the popular-label sections of Douban and Mtime over the past year are collected; after the duplicates are removed, about 2,000 label records remain. These labels are then de-duplicated and classified, finally leaving about 1,000 search labels across the 18 film categories.
Finally, the film search labels are expanded with synonyms according to the Tongyici Cilin (Extended Edition), forming the final film category label dictionary of about 1,400 labels; an excerpt is shown in Fig. 5. The Tongyici Cilin (Extended Edition), hereinafter the Cilin, was compiled by the Information Retrieval Laboratory of Harbin Institute of Technology using word-frequency statistics for common words in the People's Daily corpus, and contains nearly 70,000 entries. It organizes all words in a tree-like hierarchy with three main levels: 14 major classes, 97 middle classes, and 1,400 minor classes. Within each minor class, words are further divided by semantic relatedness into multiple groups, and each group contains several rows of words.
S23: converting the user query vector set into the query-multimedia category matrix according to the multimedia category label dictionary;
For example, the frequency of the film labels appearing in each user query vector is counted against the film category label dictionary, which converts the user query vector set into a query-film category matrix.
For instance, if a user query vector is (religion, crime, human nature, love, France, horror film, DreamWorks, John Woo, adventure), the label counts of its query keywords for the Action, Fantasy, Horror, Romance, and War types (only 5 of the 18 types are excerpted here) are 2, 1, 2, 2, and 3, respectively. An example of a query-film category matrix is shown in Fig. 6.
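The counting in S23 can be sketched as follows. The dictionary layout (each label mapped to the set of categories it belongs to) is an assumption; it is suggested by the example above, where nine keywords yield ten counts across only five of the 18 types, but the patent's actual dictionary is not reproduced here, and the toy entries in the usage note are hypothetical.

```python
from collections import Counter


def category_counts(query_vector, label_dictionary, categories):
    """Count, per category, how many query keywords carry a label of that category.

    label_dictionary: maps a label to the set of categories it belongs to
    (assumed layout). Keywords missing from the dictionary are ignored.
    Returns one row of the query-category matrix, ordered like `categories`.
    """
    counts = Counter()
    for keyword in query_vector:
        for category in label_dictionary.get(keyword, ()):
            counts[category] += 1
    return [counts[c] for c in categories]
```

For instance, with a toy dictionary `{"crime": {"Crime", "Action"}, "love": {"Romance"}}`, the query vector `["crime", "love", "love"]` gives one count each for Crime and Action and two for Romance.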
S24: calculating the user's preference for each multimedia category from the categorized query information in the query-multimedia category matrix;
For example, the user's preference for the 18 film categories can be calculated from the user's categorized query information in the query-category matrix.
Assume the query-category matrix is $T = \{T_1, T_2, \ldots, T_p\}$ and the set of 18 film categories is $M$. The preference $L(u_i, MT_j)$ of user $u_i$ for film type $MT_j$, i.e. the feature weight, can be calculated by the following formula:
[formula not reproduced]
where $r_{ij}$ is the frequency of labels of film type $MT_j$ contained in the user query vector $T_i$ of user $u_i$; $|M|$ is the number of categories in the film category set; the value of $L(u_i, MT_j)$ is limited to $[0, 5]$ in increments of 0.5; and $\lambda$ is an adjustment coefficient, usually 0.5.
S25: weighting the characteristic items of the user rating matrix according to the preference to obtain the improved user rating matrix.
The preference calculated above is used as the user's score for the unrated film items in the original user rating matrix; that is, the characteristic items that were originally null are re-weighted, giving the improved user rating matrix.
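The fill-in step can be sketched as below, under stated assumptions: each film is mapped to a single category, the preference for that category is used directly as the replacement score, and a preference of 0 is used when none is available. The function name and data layout are hypothetical, and the preference values are taken as given because the preference formula is not reproduced in this text.

```python
def improve_rating_matrix(ratings, preferences, film_categories):
    """Replace unknown ratings (0) with the user's category preference.

    ratings: N x M list of lists, 0 = unknown.
    preferences: list of dicts, preferences[i][category] in [0, 5].
    film_categories: film_categories[j] is the category of film j
    (assumption: one category per film).
    Returns a new matrix; the input is left unchanged.
    """
    improved = []
    for i, row in enumerate(ratings):
        new_row = []
        for j, score in enumerate(row):
            if score == 0:
                new_row.append(preferences[i].get(film_categories[j], 0))
            else:
                new_row.append(score)  # explicit ratings are kept as they are
        improved.append(new_row)
    return improved
```

Explicit ratings always take precedence, so the search-derived preference only supplies an initial score where the user has not rated the film.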
Take the user rating matrix of Fig. 7 as an example. The film set rated by these 10 users consists of 10 films: Alien, The Dark Knight, Infernal Affairs, Police Story, The Pursuit of Happyness, No Man's Land, Roman Holiday, Saving Private Ryan, Spirited Away, and You Are the Apple of My Eye. The original user rating matrix is very sparse.
The user preferences reflected by the user query vectors are calculated using the film category label dictionary and formula 1, and the original scoring items are re-assigned feature weights, giving the improved user rating matrix shown in Fig. 8.
In step S3, two kinds of user similarity are calculated: one based on user query vectors, the other based on the user rating matrix.
The user similarity based on the user rating matrix, $Sim_E(U_1, U_2)$, is calculated as follows:
[formula not reproduced]
In formula 2 above, $(r_{11}, r_{12}, r_{13}, \ldots, r_{1m})$ is the score vector of user $U_1$, $(r_{21}, r_{22}, r_{23}, \ldots, r_{2m})$ is the score vector of user $U_2$, and $\bar{r}_1$ and $\bar{r}_2$ denote the average scores that $U_1$ and $U_2$, respectively, give to multimedia items. Taking the films in Fig. 2 as an example, $\bar{r}_1$ denotes the average score of user $U_1$ for films $I_1, I_2, I_3, \ldots, I_m$.
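A rating-based similarity built from two score vectors and their means can be sketched with the Pearson correlation coefficient. That this is the form of formula 2 is an assumption based on the symbols defined above; the function name and the zero-variance fallback are illustrative choices.

```python
import math


def pearson_similarity(r1, r2):
    """Pearson correlation between two equal-length score vectors.

    Returns 0.0 when either vector has no variance (assumed fallback),
    since the correlation is undefined in that case.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("score vectors must be non-empty and of equal length")
    mean1 = sum(r1) / len(r1)
    mean2 = sum(r2) / len(r2)
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(r1, r2))
    norm1 = math.sqrt(sum((a - mean1) ** 2 for a in r1))
    norm2 = math.sqrt(sum((b - mean2) ** 2 for b in r2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return cov / (norm1 * norm2)
```

Applied to rows of the improved rating matrix, the vectors contain fewer zeros than the original rows, which is the point of the weighting in step S2.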
The method for calculating user similarity based on user query vectors specifically comprises:
S31: representing the word elements in each user query vector of the user query vector set in 8-character coded form;
According to the Tongyici Cilin (Extended Edition), every element of a user query vector is represented as an 8-character code, and elements that do not appear in the Cilin are removed from their vector.
In the Cilin, every row of words is represented by an 8-character code. As shown in the coding rule table of Fig. 9, the major class is one uppercase letter, the middle class is one lowercase letter, the minor class is two decimal digits, the word group within the minor class is one uppercase letter, and the row within the group is two decimal digits. The 8th character is one of three symbols indicating how the words in the row are related: "=" means the words in the row are equal in meaning, i.e. synonyms; "#" means the words in the row are similar, i.e. related words; "@" means the word has neither related words nor synonyms, i.e. it is an isolated word. For example, the code "Ee29C01" denotes the row containing the words "frivolous cursory floating light extreme frivolity insolent act rashly flighty and impatient impractical".
Because words in the Cilin are organized as a hierarchical tree, the similarity of two word elements can be expressed by the distance between the two word nodes, and that distance can be described with the words' 8-character codes. Whether two words are in the same branch at a given layer is judged from their 8-character codes, comparing layer by layer starting from the first; for example, the codes "Ga01A04=" and "Ga01A05=" belong to the fifth-layer branch of the Cilin.
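The layer-by-layer comparison of two codes can be sketched as follows, using the field widths given by the coding rule (1, 1, 2, 1, and 2 characters for the five layers, plus the relation symbol). The function names and the `None` return value for identical rows are assumptions for illustration.

```python
# Character ranges of the five layers inside an 8-character code:
# major class, middle class, minor class, word group, row.
LAYER_SLICES = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 7)]


def split_code(code):
    """Split an 8-character code into its five layer fields."""
    if len(code) != 8 or code[7] not in "=#@":
        raise ValueError(f"not a valid 8-character code: {code!r}")
    return [code[start:end] for start, end in LAYER_SLICES]


def first_differing_layer(code_a, code_b):
    """Return the first layer (1..5) at which two codes differ.

    Returns None when all five layer fields match, i.e. same row.
    """
    fields = zip(split_code(code_a), split_code(code_b))
    for layer, (a, b) in enumerate(fields, start=1):
        if a != b:
            return layer
    return None
```

For the two codes in the example above, `first_differing_layer("Ga01A04=", "Ga01A05=")` returns 5: the codes agree on the first four layers and differ only in the row field.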
S32: traversing all word elements of each user query vector, and recording, for each word element, the total number of nodes $n$ in the branch layer where it is located and the number of synonym entries $m$ in the paragraph row where it is located;
S33: calculating the similarity between the senses of two word elements from the total number of nodes $n$ and the number of synonym entries $m$;
In the Cilin, a word element often has several meanings, so it may have several code entries, i.e. senses. The similarity of words can therefore be obtained from the similarity of their senses. Assume the two senses are $w_1$ and $w_2$ and their similarity is $Sim_{sense}(w_1, w_2)$. The Cilin-based sense similarity is calculated as follows:
(1) If the two senses are not in the same tree, then
$Sim_{sense}(w_1, w_2) = f$
(2) If the two senses are in the same first-layer branch, then
[formula not reproduced]
(3) If the two senses are in the same second-layer branch, then
[formula not reproduced]
(4) If the two senses are in the same third-layer branch, then
[formula not reproduced]
(5) If the two senses are in the same fourth-layer branch, then
[formula not reproduced]
(6) If the two senses are in the same fifth-layer branch, then
[formula not reproduced]
In the formulas above, $a$, $b$, $c$, $d$, $e$, and $f$ are similarity adjustment coefficients with the empirical values $a = 0.5$, $b = 0.75$, $c = 0.82$, $d = 0.90$, $e = 0.96$, $f = 0.05$; $n_1$ to $n_5$ are the total numbers of nodes in each branch layer; and $m$ is the number of synonym entries in the paragraph row where the word element is located. One group of terms in the formulas [terms not reproduced] calculates the node density of the same tree node, and another term [not reproduced] calculates the sense density of the same paragraph row.
S34: according to the sense similarities, calculate the similarity between the corresponding word elements;
The similarity of two word elements is the arithmetic mean of the similarities between their senses.
Assume the two word elements are W1 and W2, each represented by its set of senses together with the item count of that set. The word element similarity based on the Chinese thesaurus is then calculated as follows: [formula not reproduced in this text]
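Since the formula itself does not survive in this text, the following is only a sketch of the stated idea, an arithmetic mean over sense similarities. It assumes the mean is taken over every pair of senses from the two words; the function names and the toy `sense_similarity` callback are hypothetical and stand in for the layer-dependent sense formula.

```python
from itertools import product
from typing import Callable, Sequence


def word_similarity(
    senses_1: Sequence[str],
    senses_2: Sequence[str],
    sense_similarity: Callable[[str, str], float],
) -> float:
    """Arithmetic mean of the sense similarity over all sense pairs (assumed pairing)."""
    pairs = list(product(senses_1, senses_2))
    if not pairs:
        return 0.0  # a word without any coded sense is treated as unrelated
    return sum(sense_similarity(a, b) for a, b in pairs) / len(pairs)


def toy_sense_similarity(code_a: str, code_b: str) -> float:
    """Placeholder only: 1.0 for identical codes, otherwise the coefficient f = 0.05."""
    return 1.0 if code_a == code_b else 0.05


if __name__ == "__main__":
    print(word_similarity(["Aa01A01="], ["Aa01A01=", "Bb02B02="], toy_sense_similarity))
```

Passing the sense similarity in as a callback keeps the averaging step independent of whichever per-layer formula is actually used.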
S35: traverse all user query vectors, record the synonym set of each user query vector and the number of times each synonym in that set occurs, and, combined with the similarity between word elements, calculate the similarity between any two user query vectors, which serves as the user similarity based on user query vectors, SimS(U1, U2).
Specifically, traverse all user query vectors, record the synonym set of each user query vector and the number of occurrences of each synonym in that set, and calculate the similarity between any two user query vectors.
When two user query vectors share synonyms, the similarity between them is proportional to the number of synonyms that occur in both vectors; when they share no synonyms, the similarity is inversely proportional to the number of synonyms that occur separately in the two vectors.
For two user query vectors whose dimensions are n and m respectively, the similarity is calculated as follows:
(1) when the two vectors have synonyms in common: [formula not reproduced in this text]
(2) when the two vectors have no synonyms in common: [formula not reproduced in this text]
The quantities used in these formulas are: the synonym sets common to both vectors; the synonym sets in each vector whose count is greater than 1; the number of word elements that a synonym set contains; and the number of times each word element in a synonym set occurs. α and β are the corresponding adjustment parameters, usually set to 0.5.
In step S4, let the user similarity calculated from the user rating matrix be SimE(U1, U2) and the user similarity calculated from the user query vectors be SimS(U1, U2). The user overall similarity Simuser(U1, U2) is then calculated as follows:
Simuser(U1, U2) = ω × SimE(U1, U2) + (1 - ω) × SimS(U1, U2)
ω is an adjustment coefficient that generally takes a value below 0.5, i.e. ω < 0.5; here ω = 0.35.
Then, according to the user overall similarity Simuser(U1, U2), the several users most similar to the target user are chosen as the neighbour user set of the target user;
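The weighted combination and the neighbour selection can be sketched as below. The weighting follows the formula in the text with ω = 0.35; the dictionary-based inputs, the function names and the alphabetical tie-break are illustrative assumptions.

```python
from typing import Dict, List


def overall_similarity(sim_e: float, sim_s: float, omega: float = 0.35) -> float:
    """Simuser = omega * SimE + (1 - omega) * SimS, with omega < 0.5."""
    return omega * sim_e + (1.0 - omega) * sim_s


def nearest_neighbours(
    sim_e_to_target: Dict[str, float],
    sim_s_to_target: Dict[str, float],
    k: int,
    omega: float = 0.35,
) -> List[str]:
    """Return the k users with the highest overall similarity to the target user."""
    # Only users for whom both similarities are known can be scored.
    common = sim_e_to_target.keys() & sim_s_to_target.keys()
    scores = {
        user: overall_similarity(sim_e_to_target[user], sim_s_to_target[user], omega)
        for user in common
    }
    # Sort by descending score; ties are broken by user id for determinism.
    return sorted(scores, key=lambda user: (-scores[user], user))[:k]


if __name__ == "__main__":
    sim_e = {"u1": 0.9, "u2": 0.2, "u3": 0.5}
    sim_s = {"u1": 0.1, "u2": 0.8, "u3": 0.5}
    print(nearest_neighbours(sim_e, sim_s, k=2))
```

With ω below 0.5, the query-vector similarity carries more weight than the rating-matrix similarity, so in this toy data u2 outranks u1 despite u1's higher rating-based similarity.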
In step S5, once the neighbour user set of the target user has been obtained, the ratings that users in the neighbour user set gave to items the target user has not rated are used to predict the target user's ratings for those items. The items are sorted by predicted rating from high to low, and the top several films are pushed to the user as the recommendation result.
For example, let the neighbour user set of target user i be Un = {u1, u2, …, uk}. The predicted rating of target user i for item j is calculated according to the following formula: [formula not reproduced in this text]
The formula uses the expected rating that target user i gives to multimedia items (e.g. all films); Simi,α, the similarity between target user i and a user α in the neighbour user set; rαj, the rating of user α for item j (i.e. film j); and the expected rating that user α gives to multimedia items.
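The prediction formula is missing from this text. The quantities it names (each user's expected rating, the neighbour similarities and the neighbour ratings) match the widely used mean-centred weighted average of collaborative filtering, so the sketch below assumes that form; the normalisation by the sum of absolute similarities and all names are assumptions rather than the patent's confirmed formula.

```python
from typing import Iterable, Tuple


def predict_rating(
    target_mean: float,
    neighbours: Iterable[Tuple[float, float, float]],
) -> float:
    """Predict a rating from (similarity, neighbour_rating, neighbour_mean) triples.

    Assumed form: target_mean + sum(sim * (rating - mean)) / sum(|sim|).
    """
    triples = list(neighbours)
    numerator = sum(sim * (rating - mean) for sim, rating, mean in triples)
    denominator = sum(abs(sim) for sim, _, _ in triples)
    if denominator == 0:
        return target_mean  # no usable neighbour: fall back to the user's own mean
    return target_mean + numerator / denominator


if __name__ == "__main__":
    # One neighbour rated the film above their own mean, the other exactly at it.
    print(predict_rating(3.0, [(0.8, 5.0, 4.0), (0.2, 4.0, 4.0)]))
```

Centring each neighbour's rating on that neighbour's own mean compensates for users who rate systematically high or low.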
Based on the above method, the present invention also provides a preferred embodiment of a multimedia recommendation system based on user search content, as shown in Figure 10, comprising:
An initialization module 100, for collecting users' search keyword records and rating records, generating a user query vector set from the search keyword records, and generating a user rating matrix from the rating records;
A matrix improvement module 200, for building a multimedia category label dictionary, converting the user query vector set into a query-multimedia category matrix according to the multimedia category label dictionary, and then weighting the feature items of the user rating matrix according to the query-multimedia category matrix to obtain an improved user rating matrix;
A similarity calculation module 300, for traversing each user query vector in the user query vector set to calculate the user similarity based on user query vectors, and for calculating the user similarity based on the user rating matrix from the improved user rating matrix;
A neighbour user set selection module 400, for calculating the user overall similarity from the user similarity based on user query vectors and the user similarity based on the user rating matrix, and for choosing, according to the user overall similarity, the several users most similar to the target user as the neighbour user set of the target user;
A recommendation module 500, for predicting the target user's ratings of unrated items from the ratings that users in the neighbour user set gave to those items, sorting the items by rating from high to low, and pushing the top-ranked multimedia content to the target user as the recommendation result.
Further, the matrix improvement module 200 specifically comprises:
A definition unit, for defining multimedia categories;
A category label dictionary building unit, for building the multimedia category label dictionary;
A matrix conversion unit, for converting the user query vector set into the query-multimedia category matrix according to the multimedia category label dictionary;
A preference calculation unit, for calculating the user's preference for different categories of multimedia from the classified query information in the query-multimedia category matrix;
A matrix improvement unit, for weighting the feature items of the user rating matrix according to the preference to obtain the improved user rating matrix.
Further, the similarity calculation module 300 specifically comprises:
A code representation unit, for representing the word elements in each user query vector of the user query vector set in 8-character coded form;
A recording unit, for traversing all word elements of each user query vector and recording the total number of nodes in the branch layer where each word element is located and the number of synonym items in its paragraph line;
A sense calculation unit, for calculating the similarity between the senses of two word elements according to the node total and the synonym item count;
A word calculation unit, for calculating the similarity between the corresponding word elements according to the sense similarities;
A similarity calculation unit, for traversing all user query vectors, recording the synonym set of each user query vector and the number of times each synonym in that set occurs, and, combined with the similarity between word elements, calculating the similarity between any two user query vectors, which serves as the user similarity based on user query vectors, SimS(U1, U2).
Further, in the similarity calculation module 300, the user similarity based on the user rating matrix is calculated as follows: [formula not reproduced in this text]
where (r11, r12, r13, …, r1m) is the rating vector of user U1, (r21, r22, r23, …, r2m) is the rating vector of user U2, and the two mean terms denote the average ratings that users U1 and U2 respectively give to multimedia items.
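The rating-based similarity formula is missing from this text. Because it is defined over two rating vectors and each user's average rating, the sketch below assumes the Pearson correlation coefficient, a common choice in collaborative filtering; this is an assumption, not a formula confirmed by the source. It also assumes both vectors cover the same m items and that each mean is taken over the whole vector.

```python
from math import sqrt
from typing import Sequence


def pearson_similarity(ratings_1: Sequence[float], ratings_2: Sequence[float]) -> float:
    """Pearson correlation of two equal-length rating vectors (assumed form of SimE)."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("rating vectors must be non-empty and of equal length")
    mean_1 = sum(ratings_1) / len(ratings_1)
    mean_2 = sum(ratings_2) / len(ratings_2)
    covariance = sum((a - mean_1) * (b - mean_2) for a, b in zip(ratings_1, ratings_2))
    spread_1 = sqrt(sum((a - mean_1) ** 2 for a in ratings_1))
    spread_2 = sqrt(sum((b - mean_2) ** 2 for b in ratings_2))
    if spread_1 == 0 or spread_2 == 0:
        return 0.0  # a user who rates everything identically carries no signal
    return covariance / (spread_1 * spread_2)


if __name__ == "__main__":
    print(pearson_similarity([1, 2, 3], [2, 4, 6]))
```

In the described method the input vectors would come from the improved (weighted) user rating matrix rather than from the raw ratings.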
The technical details of the above modules and units have been described in detail in the method above and are therefore not repeated here.
In summary, the present invention extracts and quantifies the information contained in users' search keyword records about how much they like specific multimedia types, introduces a user query vector set, and uses it as the target user's initial ratings of multimedia items, thereby addressing the cold-start and feature-sparsity problems of traditional collaborative filtering. In addition, the present invention combines the user similarity based on user query vectors with the user similarity based on the user rating matrix to calculate a user overall similarity, which addresses the insufficient recommendation precision of traditional collaborative filtering.
It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or variations based on the above description, and all such improvements and variations shall fall within the scope of protection of the appended claims of the present invention.
Claims (10)
1. A multimedia recommendation method based on user search content, characterized by comprising:
Step A: collecting users' search keyword records and rating records, generating a user query vector set from the search keyword records, and generating a user rating matrix from the rating records;
Step B: building a multimedia category label dictionary, converting the user query vector set into a query-multimedia category matrix according to the multimedia category label dictionary, and then weighting the feature items of the user rating matrix according to the query-multimedia category matrix to obtain an improved user rating matrix;
Step C: traversing each user query vector in the user query vector set to calculate the user similarity based on user query vectors, and calculating the user similarity based on the user rating matrix from the improved user rating matrix;
Step D: calculating the user overall similarity from the user similarity based on user query vectors and the user similarity based on the user rating matrix, and choosing, according to the user overall similarity, the several users most similar to the target user as the neighbour user set of the target user;
Step E: predicting the target user's ratings of unrated items from the ratings that users in the neighbour user set gave to those items, sorting the items by rating from high to low, and pushing the top-ranked multimedia content to the target user as the recommendation result.
2. The multimedia recommendation method based on user search content according to claim 1, characterized in that step B specifically comprises:
B1: defining multimedia categories;
B2: building a multimedia category label dictionary;
B3: converting the user query vector set into a query-multimedia category matrix according to the multimedia category label dictionary;
B4: calculating the user's preference for different categories of multimedia from the classified query information in the query-multimedia category matrix;
B5: weighting the feature items of the user rating matrix according to the preference to obtain the improved user rating matrix.
3. The multimedia recommendation method based on user search content according to claim 1, characterized in that, in step C, the step of calculating the user similarity based on user query vectors specifically comprises:
C1: representing the word elements in each user query vector of the user query vector set in 8-character coded form;
C2: traversing all word elements of each user query vector, and recording the total number of nodes in the branch layer where each word element is located and the number of synonym items in its paragraph line;
C3: calculating the similarity between the senses of two word elements according to the node total and the synonym item count;
C4: calculating the similarity between the corresponding word elements according to the sense similarities;
C5: traversing all user query vectors, recording the synonym set of each user query vector and the number of times each synonym in that set occurs, and, combined with the similarity between word elements, calculating the similarity between any two user query vectors, which serves as the user similarity based on user query vectors, SimS(U1, U2).
4. The multimedia recommendation method based on user search content according to claim 3, characterized in that, in step C, the user similarity based on the user rating matrix is calculated as follows: [formula not reproduced in this text]
where (r11, r12, r13, …, r1m) is the rating vector of user U1, (r21, r22, r23, …, r2m) is the rating vector of user U2, and the two mean terms denote the average ratings that users U1 and U2 respectively give to multimedia items.
5. The multimedia recommendation method based on user search content according to claim 4, characterized in that, in step D, the user overall similarity is calculated as follows:
Simuser(U1, U2) = ω × SimE(U1, U2) + (1 - ω) × SimS(U1, U2), where ω is an adjustment coefficient and ω < 0.5.
6. The multimedia recommendation method based on user search content according to claim 5, characterized in that, in step E, the predicted rating rij of target user i for item j is calculated according to the following formula: [formula not reproduced in this text]
The formula uses the expected rating that target user i gives to multimedia items; Simi,α, the similarity between target user i and a user α in the neighbour user set; rαj, the rating of user α for item j; and the expected rating that user α gives to multimedia items.
7. A multimedia recommendation system based on user search content, characterized by comprising:
An initialization module, for collecting users' search keyword records and rating records, generating a user query vector set from the search keyword records, and generating a user rating matrix from the rating records;
A matrix improvement module, for building a multimedia category label dictionary, converting the user query vector set into a query-multimedia category matrix according to the multimedia category label dictionary, and then weighting the feature items of the user rating matrix according to the query-multimedia category matrix to obtain an improved user rating matrix;
A similarity calculation module, for traversing each user query vector in the user query vector set to calculate the user similarity based on user query vectors, and for calculating the user similarity based on the user rating matrix from the improved user rating matrix;
A neighbour user set selection module, for calculating the user overall similarity from the user similarity based on user query vectors and the user similarity based on the user rating matrix, and for choosing, according to the user overall similarity, the several users most similar to the target user as the neighbour user set of the target user;
A recommendation module, for predicting the target user's ratings of unrated items from the ratings that users in the neighbour user set gave to those items, sorting the items by rating from high to low, and pushing the top-ranked multimedia content to the target user as the recommendation result.
8. The multimedia recommendation system based on user search content according to claim 7, characterized in that the matrix improvement module specifically comprises:
A definition unit, for defining multimedia categories;
A category label dictionary building unit, for building the multimedia category label dictionary;
A matrix conversion unit, for converting the user query vector set into the query-multimedia category matrix according to the multimedia category label dictionary;
A preference calculation unit, for calculating the user's preference for different categories of multimedia from the classified query information in the query-multimedia category matrix;
A matrix improvement unit, for weighting the feature items of the user rating matrix according to the preference to obtain the improved user rating matrix.
9. The multimedia recommendation system based on user search content according to claim 7, characterized in that the similarity calculation module specifically comprises:
A code representation unit, for representing the word elements in each user query vector of the user query vector set in 8-character coded form;
A recording unit, for traversing all word elements of each user query vector and recording the total number of nodes in the branch layer where each word element is located and the number of synonym items in its paragraph line;
A sense calculation unit, for calculating the similarity between the senses of two word elements according to the node total and the synonym item count;
A word calculation unit, for calculating the similarity between the corresponding word elements according to the sense similarities;
A similarity calculation unit, for traversing all user query vectors, recording the synonym set of each user query vector and the number of times each synonym in that set occurs, and, combined with the similarity between word elements, calculating the similarity between any two user query vectors, which serves as the user similarity based on user query vectors, SimS(U1, U2).
10. The multimedia recommendation system based on user search content according to claim 7, characterized in that, in the similarity calculation module, the user similarity based on the user rating matrix is calculated as follows: [formula not reproduced in this text]
where (r11, r12, r13, …, r1m) is the rating vector of user U1, (r21, r22, r23, …, r2m) is the rating vector of user U2, and the two mean terms denote the average ratings that users U1 and U2 respectively give to multimedia items.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610653997.XA CN106250545A (en) | 2016-08-10 | 2016-08-10 | A kind of multimedia recommendation method and system searching for content based on user |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610653997.XA CN106250545A (en) | 2016-08-10 | 2016-08-10 | A kind of multimedia recommendation method and system searching for content based on user |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106250545A (en) | 2016-12-21 |
Family
ID=58078223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610653997.XA Pending CN106250545A (en) | 2016-08-10 | 2016-08-10 | A kind of multimedia recommendation method and system searching for content based on user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250545A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991199A (en) * | 2017-06-07 | 2017-07-28 | 上海理工大学 | The commending system score in predicting of probability is inclined to recommending method based on user behavior |
CN107391531A (en) * | 2017-04-11 | 2017-11-24 | 阿里巴巴集团控股有限公司 | A kind of vegetable recommends method and apparatus |
CN107423320A (en) * | 2017-03-30 | 2017-12-01 | 青岛大学 | A kind of medical domain under big data framework is from media platform data push method |
CN107507049A (en) * | 2017-06-30 | 2017-12-22 | 昆明理工大学 | Method is recommended in a kind of online service towards inconsistent user's interpretational criteria |
CN108182264A (en) * | 2018-01-09 | 2018-06-19 | 武汉大学 | A kind of ranking based on cross-cutting ranking recommended models recommends method |
CN108647724A (en) * | 2018-05-11 | 2018-10-12 | 国网电子商务有限公司 | A kind of user's recommendation method and device based on simulated annealing |
CN108897887A (en) * | 2018-07-10 | 2018-11-27 | 华南师范大学 | A kind of teaching resource recommended method of knowledge based map and user's similarity |
CN109544306A (en) * | 2018-11-30 | 2019-03-29 | 苏州大学 | A kind of cross-cutting recommended method and device based on user behavior sequence signature |
CN110147463A (en) * | 2019-04-03 | 2019-08-20 | 华南理工大学 | A kind of music method for pushing, system, device and storage medium |
CN110326253A (en) * | 2016-12-30 | 2019-10-11 | 罗伯特·博世有限公司 | For carrying out the method and system of fuzzy keyword searching to encryption data |
CN110442977A (en) * | 2019-08-08 | 2019-11-12 | 广州华建工智慧科技有限公司 | Mobile terminal BIM model intelligent buffer method based on construction process network recommendation |
CN111159493A (en) * | 2019-12-25 | 2020-05-15 | 乐山师范学院 | Network data similarity calculation method and system based on feature weight |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063433A (en) * | 2009-11-16 | 2011-05-18 | 华为技术有限公司 | Method and device for recommending related items |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063433A (en) * | 2009-11-16 | 2011-05-18 | 华为技术有限公司 | Method and device for recommending related items |
Non-Patent Citations (1)
Title |
---|
植伟良: "基于搜索的协同过滤算法在电影推荐系统中的研究与应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110326253A (en) * | 2016-12-30 | 2019-10-11 | 罗伯特·博世有限公司 | For carrying out the method and system of fuzzy keyword searching to encryption data |
CN110326253B (en) * | 2016-12-30 | 2021-11-09 | 罗伯特·博世有限公司 | Method and system for fuzzy keyword search of encrypted data |
CN107423320A (en) * | 2017-03-30 | 2017-12-01 | 青岛大学 | A kind of medical domain under big data framework is from media platform data push method |
CN107423320B (en) * | 2017-03-30 | 2023-06-09 | 青岛大学 | Medical field self-media platform data pushing method under big data architecture |
CN107391531A (en) * | 2017-04-11 | 2017-11-24 | 阿里巴巴集团控股有限公司 | A kind of vegetable recommends method and apparatus |
CN106991199A (en) * | 2017-06-07 | 2017-07-28 | 上海理工大学 | The commending system score in predicting of probability is inclined to recommending method based on user behavior |
CN106991199B (en) * | 2017-06-07 | 2020-07-14 | 上海理工大学 | User behavior tendency probability-based recommendation system score prediction and recommendation method |
CN107507049A (en) * | 2017-06-30 | 2017-12-22 | 昆明理工大学 | Method is recommended in a kind of online service towards inconsistent user's interpretational criteria |
CN108182264A (en) * | 2018-01-09 | 2018-06-19 | 武汉大学 | A kind of ranking based on cross-cutting ranking recommended models recommends method |
CN108182264B (en) * | 2018-01-09 | 2022-04-01 | 武汉大学 | Ranking recommendation method based on cross-domain ranking recommendation model |
CN108647724A (en) * | 2018-05-11 | 2018-10-12 | 国网电子商务有限公司 | A kind of user's recommendation method and device based on simulated annealing |
CN108897887B (en) * | 2018-07-10 | 2020-10-16 | 华南师范大学 | Teaching resource recommendation method based on knowledge graph and user similarity |
CN108897887A (en) * | 2018-07-10 | 2018-11-27 | 华南师范大学 | A kind of teaching resource recommended method of knowledge based map and user's similarity |
CN109544306A (en) * | 2018-11-30 | 2019-03-29 | 苏州大学 | A kind of cross-cutting recommended method and device based on user behavior sequence signature |
CN109544306B (en) * | 2018-11-30 | 2021-09-21 | 苏州大学 | Cross-domain recommendation method and device based on user behavior sequence characteristics |
CN110147463A (en) * | 2019-04-03 | 2019-08-20 | 华南理工大学 | A kind of music method for pushing, system, device and storage medium |
CN110442977A (en) * | 2019-08-08 | 2019-11-12 | 广州华建工智慧科技有限公司 | Mobile terminal BIM model intelligent buffer method based on construction process network recommendation |
CN110442977B (en) * | 2019-08-08 | 2023-09-29 | 广州华建工智慧科技有限公司 | Mobile terminal BIM model intelligent caching method based on building construction procedure network recommendation |
CN111159493A (en) * | 2019-12-25 | 2020-05-15 | 乐山师范学院 | Network data similarity calculation method and system based on feature weight |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161221 |