CN106528633B

CN106528633B - Method for improving the social attention of videos based on keyword recommendation

Info

Publication number: CN106528633B
Application number: CN201610884840.8A
Authority: CN
Inventors: 周仁杰; 万健; 夏冬晨; 张纪林; 殷昱煜; 张伟; 任祖杰; 贾刚勇
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2016-10-11
Filing date: 2016-10-11
Publication date: 2019-07-02
Anticipated expiration: 2036-10-11
Also published as: CN106528633A

Abstract

The invention discloses a method for improving the social attention received by a video based on keyword recommendation. The method uses semantic relatedness and deep learning to recommend keywords for a video and thereby raise its social attention. First, starting from the user's initial keywords, it uses the semantic relatedness between keywords to find several semantic keywords that are most closely related in meaning to the initial keywords. Next, it mines entity keywords by analyzing the video content with deep learning. Finally, it ranks these two groups of keywords by defined criteria and recommends the most relevant keywords to the user. The recommended keywords balance relevance to the video content against the potential to attract social attention, which raises the video's social attention; the method is both efficient and practical. The invention can be used in online social media analysis, data mining, and video tag recommendation.

Description

Method for improving the social attention of videos based on keyword recommendation

Technical field

The invention belongs to the technical fields of online social media analysis, data mining, and video tag recommendation, and specifically relates to a method for improving the social attention of videos based on keyword recommendation.

Background technique

In traditional Internet applications, search engines are the main tool by which users discover web content, so existing methods for increasing the social attention of web content are designed mainly around search engines. In social media, however, and especially on multimedia sharing websites such as YouTube, Flickr, and Youku, recommender systems are a second important source of social attention besides search engines. To raise the social attention of social media content more effectively, the potential of both search engines and recommender systems must therefore be exploited together.

Search engines give users a large body of content to discover, but as the volume of information on the Internet grows rapidly and users expect more from search, their limitations become apparent: low coverage, inaccurate results, and irrelevant results. Recommender systems can return highly relevant results, but the range of their recommendations is also limited, and different recommender systems can produce very different recommendations.

Deep learning, as an emerging technology, can also be applied to improving the social attention of videos. Its main use for video is extracting information about the video content, and it helps ensure that the extracted content information is accurate.

Summary of the invention

To address the problems above, the invention discloses a method for improving the social attention of videos based on keyword recommendation. The keywords it recommends account both for their relevance to the video content and for their potential to attract social attention.

The technical solution adopted by the invention to solve the technical problem is as follows:

A method for improving the social attention of videos based on keyword recommendation, implemented by the following steps:

Step 1 --- Obtain the initial keywords of the video:

For a given video, extract K initial keywords relevant to the video from the initial title the user provides when uploading it.

Step 2 --- WordNet semantic expansion of the initial keywords:

Look up semantically similar keywords in WordNet for each initial keyword and expand the initial keywords into a preliminary WordNet semantic keyword set.

Step 3 --- Further expansion via major video sharing websites:

Use the preliminary semantic keyword set as queries on major video sharing websites, extract keywords that can attract more social attention, and expand the set into the final semantic keyword set.

Step 4 --- Extract the video entity keyword set:

Mine the video content with deep learning to form the video entity keyword set.

Step 5 --- Rank the keyword sets:

Taking both the relevance and the social attention of keywords into account, rank the semantic keyword set and the entity keyword set by two criteria, each keyword's frequency of occurrence and its average relevance to the initial keyword set, and determine the keyword set finally provided to the user.

The invention has the following advantages:

1. The invention semantically expands the initial keywords using the WordNet lexical database. Because WordNet already organizes its entries well by meaning, the expanded semantic keyword set stays semantically related to the initial video title, which improves semantic quality and also increases keyword diversity.

2. The invention further expands the semantic keyword set through major video sharing websites. Based on the observation that videos on the same or similar topics can usually be found on multiple websites, it combines two capabilities of those sites, their search engines and their recommender systems, to expand the semantic keyword set. The resulting set is related to the video content and also increases the diversity and social attention of the keywords.

3. The invention analyzes and recognizes the video content with deep learning, so it can collect the entity information that best matches the actual content of the video, improving the authenticity and accuracy of the keywords finally provided to the user.

4. The invention ranks the keyword sets by two criteria, keyword frequency of occurrence and average relevance to the initial keywords, thereby measuring both each keyword's relevance to the initial keyword set and its social attention.

5. The invention can be used in online social media analysis and data mining, and in particular in video tag recommendation.

Description of the drawings

Fig. 1 is the overall framework of keyword recommendation in the invention.

Fig. 2 is the flow chart of keyword ranking in the invention.

Specific embodiment

The invention is further described below through a concrete implementation, with reference to the accompanying drawings.

The implementation process of the invention follows the steps shown in Fig. 1:

Step 1 --- Obtain the initial keywords of the video:

For a given video, the user enters on the upload page the initial title that he or she considers appropriate for the video. K keywords are extracted from this initial title to form the initial title keyword set X.
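The patent does not say how the K keywords are chosen from the title. A minimal sketch, assuming an English title, a small hypothetical stopword list, and "first K distinct non-stopwords" as the selection rule (a title in a language written without spaces, such as Chinese, would need a word segmenter instead of the regular expression):

```python
import re

# Hypothetical stopword list; the patent does not specify how title words are filtered.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to", "with"}


def initial_title_keywords(title: str, k: int) -> list[str]:
    """Return up to k distinct keywords from a title, in order of appearance."""
    keywords: list[str] = []
    for token in re.findall(r"[a-z0-9]+", title.lower()):
        if len(keywords) >= k:
            break
        if token in STOPWORDS or token in keywords:
            continue  # skip function words and repeated words
        keywords.append(token)
    return keywords


# Usage: K = 4 initial title keywords form the set X
X = initial_title_keywords("Cooking the perfect steak on a cast iron pan", k=4)
print(X)  # ['cooking', 'perfect', 'steak', 'cast']
```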

Step 2 --- WordNet semantic expansion of the initial keywords:

Each keyword in the initial keyword set is fed into the WordNet lexical database, which returns several entries semantically related to that keyword. The 2-3 most semantically related entries are selected for each keyword, and together with the initial keywords they form the preliminary semantic keyword set based on WordNet expansion. Keywords expanded in this way remain semantically related to the initial video title, which improves semantic quality and increases keyword diversity.
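The expansion can be sketched as below; this is an illustration under assumptions, not the patent's implementation. `related_terms` is a hypothetical callable returning candidate entries for a word, best first, and `per_keyword` stands for the "2-3 most related" choice. The optional `wordnet_related` helper shows one way to back it with NLTK's WordNet interface (`wordnet.synsets`, `Synset.lemma_names`), using WordNet's sense order only as a rough proxy for relatedness; it needs the WordNet corpus to be downloaded.

```python
from typing import Callable, Iterable, Iterator


def expand_with_wordnet(
    initial: list[str],
    related_terms: Callable[[str], Iterable[str]],
    per_keyword: int = 3,
) -> list[str]:
    """Initial keywords plus up to `per_keyword` new related entries per keyword."""
    expanded = list(initial)
    for word in initial:
        added = 0
        for term in related_terms(word):
            if added >= per_keyword:
                break
            term = term.replace("_", " ").lower()  # WordNet lemma names use underscores
            if term in expanded:
                continue  # already present, including the keyword itself
            expanded.append(term)
            added += 1
    return expanded


def wordnet_related(word: str) -> Iterator[str]:
    """Optional backend: lemma names of every WordNet sense of `word`, in sense order.
    Requires nltk and its WordNet corpus (nltk.download("wordnet"))."""
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets(word):
        yield from synset.lemma_names()


# Usage with a toy lookup table standing in for WordNet
toy = {"steak": ["beefsteak", "steak", "beef"], "pan": ["pan", "cooking_pan", "goat_god"]}
print(expand_with_wordnet(["steak", "pan"], lambda w: toy.get(w, []), per_keyword=2))
# ['steak', 'pan', 'beefsteak', 'beef', 'cooking pan', 'goat god']
```

The toy output also shows why sense order is only a proxy: "goat god" (the deity Pan) is a valid WordNet entry for "pan" but irrelevant to a cooking video.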

Step 3 --- Further expansion via major video sharing websites:

Videos on the same or similar topics can usually be found on multiple video sharing websites, so information about such videos can be collected from several sites. We use two capabilities of video sharing websites, the search engine and the recommender system, so expanding the video keywords through major video sharing websites consists of the following two steps:

1) Search engine search

The preliminary WordNet-expanded semantic keyword set is divided into 2-3 groups of keywords. Each group is searched with the site search engine of each major video sharing website, the top 10 videos in each site's results are extracted, and the keywords in these videos' titles are added to the semantic keyword set.

2) Recommender system recommendation

For the top-ranked videos found by the search engine in the first step, we collect the related videos that each website's recommender system recommends for them. Likewise, the keywords in the titles of these related videos are added to the semantic keyword set.

Through these two steps we make full use of the search capability of the video sharing websites' search engines and the recommendation capability of their recommender systems. The semantic keyword set expanded with these two capabilities is related to the video content and also increases the diversity and social attention of the keywords.
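A sketch of the two-step site expansion, with every site-specific piece (`search`, `related`, `title_of`, `tokenize`) passed in as a hypothetical callable, since the patent names no site API. Title words are appended with repetition, because Step 5 later counts how often each keyword occurs; how the 2-3 keyword groups are formed is left to the caller.

```python
from typing import Callable, Sequence

# Hypothetical site adapters; no real video-site API is implied.
SearchFn = Callable[[str, str], Sequence[str]]   # (site, query) -> ranked video IDs
RelatedFn = Callable[[str, str], Sequence[str]]  # (site, video ID) -> recommended video IDs
TitleFn = Callable[[str, str], str]              # (site, video ID) -> video title


def expand_with_sites(
    semantic: list[str],
    groups: list[list[str]],
    sites: list[str],
    search: SearchFn,
    related: RelatedFn,
    title_of: TitleFn,
    tokenize: Callable[[str], list[str]],
    top_n: int = 10,
) -> list[str]:
    """Append title words of top search results and of their recommended videos.
    Words are kept with repetition so that occurrences can be counted later."""
    collected = list(semantic)
    for site in sites:
        for group in groups:
            top_videos = list(search(site, " ".join(group)))[:top_n]
            for video in top_videos:
                collected.extend(tokenize(title_of(site, video)))  # 1) search engine
                for rec in related(site, video):                   # 2) recommender
                    collected.extend(tokenize(title_of(site, rec)))
    return collected


# Usage with fake adapters for one site
titles = {"v1": "grill steak", "v2": "steak sauce", "v3": "bbq ribs"}
result = expand_with_sites(
    ["steak"], [["steak"]], ["siteA"],
    search=lambda site, q: ["v1", "v2"],
    related=lambda site, v: ["v3"] if v == "v1" else [],
    title_of=lambda site, v: titles[v],
    tokenize=str.split,
)
print(result)  # ['steak', 'grill', 'steak', 'bbq', 'ribs', 'steak', 'sauce']
```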

Step 4 --- Extract the video entity keyword set:

Key frames are extracted from the video at fixed intervals according to its duration, forming a key frame set. Each key frame is fed into the deep learning framework Caffe with a model trained on ImageNet, and the entity recognition result output for that frame is added to the video entity keyword set. By analyzing and recognizing the video content with deep learning, we can collect the entity information that best matches the actual content of the video, which to some extent improves the authenticity and accuracy of the keywords finally provided to the user.
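Fixed-interval key frame sampling and label collection can be sketched as follows. `classify` is a hypothetical stand-in for an ImageNet-trained classifier (for example a Caffe model) that returns labels sorted by confidence; the sampling interval and `top_k` are assumed parameters, not values from the patent.

```python
import math
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def keyframe_times(duration_s: float, interval_s: float) -> list[float]:
    """Timestamps (seconds) of key frames sampled at a fixed interval."""
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration and interval must be positive")
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


def entity_keywords(
    frames: Sequence[Frame],
    classify: Callable[[Frame], Sequence[str]],
    top_k: int = 1,
) -> list[str]:
    """Collect the top-k predicted labels of every key frame, with repetition."""
    keywords: list[str] = []
    for frame in frames:
        keywords.extend(classify(frame)[:top_k])  # labels assumed sorted by confidence
    return keywords


# Usage with a fake classifier
print(keyframe_times(10, 3))  # [0, 3, 6, 9]
labels = {"f1": ["dog", "frisbee"], "f2": ["dog", "tennis_ball"]}
print(entity_keywords(["f1", "f2"], lambda f: labels[f]))  # ['dog', 'dog']
```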

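Step 5 --- Rank the keyword sets:

The NGD (Normalized Google Distance) between a keyword t and an initial title keyword X_i is computed as follows:

$$\mathrm{NGD}(t, X_i) = \frac{\max\{\log h(t), \log h(X_i)\} - \log h(t, X_i)}{\log N - \min\{\log h(t), \log h(X_i)\}} \qquad (1)$$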

where h(t) and h(X_i) are the numbers of results returned by the Google search engine for keyword t and for keyword X_i of the initial title keyword set X, respectively; h(t, X_i) is the number of results returned when both keywords are searched together; and N is the number of web pages indexed by Google (the number of pages Google could return for a search with no query condition). A distance close to 0 means the two keywords are strongly related in meaning; the larger the distance (approaching infinity), the less related they are.
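Formula (1) translates directly into code. A minimal sketch operating on hit counts already obtained from a search engine (fetching the counts is out of scope); returning infinity when a count is zero is an assumption that matches the reading of a large distance as "unrelated".

```python
import math


def ngd(h_t: int, h_x: int, h_tx: int, n_pages: int) -> float:
    """Normalized Google Distance, formula (1), from search hit counts."""
    if min(h_t, h_x, h_tx) <= 0:
        return math.inf  # assumption: no (co-)occurrence means maximally unrelated
    if n_pages <= min(h_t, h_x):
        raise ValueError("n_pages must exceed the smaller single-term hit count")
    log_t, log_x = math.log(h_t), math.log(h_x)
    numerator = max(log_t, log_x) - math.log(h_tx)
    denominator = math.log(n_pages) - min(log_t, log_x)
    return numerator / denominator


print(ngd(100, 10, 10, 1000))  # about 0.5: X_i appears only on pages that also contain t
print(ngd(50, 50, 50, 1000))   # 0.0: the two keywords always occur together
```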

The TF-SIM ranking score is computed by formula (2):

where T_t is the number of occurrences of keyword t, X is the initial title keyword set, and n is the number of keywords in X.
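The average relevance used in ranking is the mean NGD of keyword t over the n initial keywords, (1/n) times the sum of NGD(t, X_i). Formula (2) itself is not reproduced in this text, so the score below is an assumption for illustration only: the occurrence count T_t divided by (1 + average NGD), which rises with frequency and falls with distance as the ranking criterion requires. It is not the patent's TF-SIM definition.

```python
from typing import Sequence


def average_ngd(distances: Sequence[float]) -> float:
    """Mean NGD of keyword t to the n initial keywords: (1/n) * sum_i NGD(t, X_i)."""
    if not distances:
        raise ValueError("the initial keyword set X must not be empty")
    return sum(distances) / len(distances)


def assumed_tf_sim(frequency: int, distances: Sequence[float]) -> float:
    """ASSUMED stand-in for formula (2), not the patent's definition:
    grows with the occurrence count T_t, shrinks as the average NGD grows."""
    return frequency / (1.0 + average_ngd(distances))


print(assumed_tf_sim(6, [0.5, 0.5]))  # 4.0
print(assumed_tf_sim(6, [1.0, 2.0]))  # 2.4: same frequency, less related, lower score
```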

The numbers of semantic and entity keywords are allocated as follows:

$$T_n = T_s + \delta T_s \qquad (3)$$

where T_n is the number of keywords to recommend to the user, T_s is the number of keywords taken from the semantic keyword set, δT_s is the number of keywords taken from the entity keyword set, and δ is set empirically to 0.5.
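A worked example of formula (3): with δ = 0.5 and T_n = 9 keywords to recommend, T_s + 0.5 T_s = 9 gives T_s = 6 semantic keywords and δT_s = 3 entity keywords. The patent does not say how to round when T_n / (1 + δ) is not an integer; the sketch below assumes rounding T_s to the nearest integer and giving the remainder to the entity set.

```python
def split_quota(t_n: int, delta: float = 0.5) -> tuple[int, int]:
    """Split T_n into (T_s, entity count) with T_n = T_s + delta * T_s, formula (3).
    Rounding T_s to the nearest integer is an assumption; the patent does not say."""
    t_s = round(t_n / (1 + delta))
    return t_s, t_n - t_s


print(split_quota(9))   # (6, 3)
print(split_quota(10))  # (7, 3)
```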

As shown in Fig. 2, ranking the keyword sets by relevance consists of the following four steps:

1) Keyword frequency calculation: sort the keywords in the keyword set by frequency of occurrence and record each keyword's number of occurrences, yielding a keyword set without duplicates.

2) NGD distance calculation: for each keyword in the duplicate-free set, compute its average relevance to the initial title keyword set using formula (1).

3) TF-SIM (similarity value) ranking: substitute the keyword frequencies and the average relevance to the initial keyword set obtained in the first two steps into formula (2), and rank the keyword set accordingly.

4) Final keyword extraction: apply formula (3) to obtain the keyword set finally recommended to the user.
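Putting the four steps together gives the compact ranking pipeline sketched below. `distance` is a hypothetical callable returning NGD(t, X_i) (for example computed from search hit counts with formula (1)), the score uses an assumed stand-in for formula (2) rather than the patent's definition, and skipping entity keywords already chosen from the semantic set is an added assumption.

```python
from collections import Counter
from typing import Callable, Sequence

Distance = Callable[[str, str], float]  # hypothetical: returns NGD(t, X_i)


def rank(keywords: Sequence[str], initial: Sequence[str], distance: Distance) -> list[tuple[str, float]]:
    """Steps 1-3: count occurrences, average NGD to X, score, sort descending."""
    freq = Counter(keywords)  # step 1: duplicate-free keywords with their counts
    scored = []
    for t, count in freq.items():
        avg = sum(distance(t, x) for x in initial) / len(initial)  # step 2
        scored.append((t, count / (1.0 + avg)))  # step 3: ASSUMED form of formula (2)
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def recommend(
    semantic: Sequence[str],
    entity: Sequence[str],
    initial: Sequence[str],
    distance: Distance,
    t_n: int,
    delta: float = 0.5,
) -> list[str]:
    """Step 4: T_s semantic keywords plus T_n - T_s entity keywords, formula (3)."""
    t_s = round(t_n / (1 + delta))
    top_sem = [t for t, _ in rank(semantic, initial, distance)[:t_s]]
    # Assumption: skip entity keywords that were already picked from the semantic set.
    top_ent = [t for t, _ in rank(entity, initial, distance) if t not in top_sem]
    return top_sem + top_ent[: t_n - t_s]


def toy_distance(t: str, x: str) -> float:
    """Toy NGD: 0 for an exact match with an initial keyword, 1 otherwise."""
    return 0.0 if t == x else 1.0


semantic = ["steak", "steak", "sauce", "grill", "bbq", "bbq", "bbq"]
entity = ["dog", "steak", "steak"]
print(recommend(semantic, entity, ["steak", "grill"], toy_distance, t_n=3))
# ['bbq', 'steak', 'dog']
```

In this toy run "bbq" wins on frequency alone while "steak" ranks high on relevance; the patent's own formula (2) may weigh the two factors differently.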

The keyword ranking criterion should take both relevance and social attention into account, and formula (2) measures two factors: the keyword's frequency of occurrence and its average relevance to the initial keyword set. The first factor, frequency of occurrence, measures the keyword's potential to attract social attention. The second factor, relevance to the initial keyword set, measures how relevant the keyword is to the video. The two factors constrain each other: a keyword that occurs often but has low relevance necessarily receives a lower score, and vice versa. Combining the two factors determines the best keywords to recommend to the user.

The above embodiment does not limit the invention, and the invention is not limited to it; any implementation that meets the requirements of the invention falls within its scope of protection.

Claims

1. A method for improving the social attention of videos based on keyword recommendation, characterized in that the method is implemented by the following steps:

Step 1. Obtain the initial keywords of the video: for a given video, extract K initial keywords relevant to the video from the initial title keywords the user provides when uploading it, forming the initial keyword set X;

Step 2. WordNet semantic expansion of the initial keywords: look up semantically similar keywords in WordNet for each of the K initial keywords obtained above and, combined with the initial keyword set, expand them into the preliminary WordNet semantic keyword set;

Step 3. Further expansion via major video sharing websites: use the preliminary WordNet semantic keyword set as queries on major video sharing websites, extract keywords that can attract more social attention and, combined with the preliminary WordNet semantic keyword set, expand it into the final WordNet semantic keyword set;

Step 4. Extract the video entity keyword set: mine the video content with deep learning to form the video entity keyword set;

Step 5. Rank the keyword sets: taking both the relevance and the social attention of keywords into account, rank the semantic and entity keyword sets by each keyword's frequency of occurrence and its average relevance to the initial keyword set, and determine the keyword set finally provided to the user.

2. The method for improving the social attention of videos based on keyword recommendation according to claim 1, characterized in that in step 3, based on the observation that information about videos on the same or similar topics can be found on multiple websites, the preliminary WordNet semantic keywords obtained in step 2 are used as queries on major video websites, and the sites' own search engines and recommender systems are used to extract keywords that can attract more social attention and expand them into the final semantic keyword set.

3. The method for improving the social attention of videos based on keyword recommendation according to claim 1, characterized in that in step 4, key frames of the video are extracted at fixed intervals and, using the ImageNet image set as the training data, the video content is mined with deep learning to form the video entity keyword set.

4. The method for improving the social attention of videos based on keyword recommendation according to claim 1, characterized in that step 5 proceeds as follows:

5.1 Keyword frequency calculation: the keywords in the final WordNet semantic keyword set and in the video entity keyword set are each re-sorted by frequency of occurrence and consolidated into new duplicate-free keyword sets, and the number of occurrences of each keyword is recorded;

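5.2 NGD distance calculation: for the duplicate-free keyword sets obtained in step 5.1, compute the NGD distance to the initial keyword set obtained in step 1 using formula (1):

$$\mathrm{NGD}(t, X_i) = \frac{\max\{\log h(t), \log h(X_i)\} - \log h(t, X_i)}{\log N - \min\{\log h(t), \log h(X_i)\}} \qquad (1)$$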

where h(t) and h(X_i) are the numbers of results returned by search engine G for keyword t of the duplicate-free keyword set obtained in step 5.1 and for keyword X_i of the initial keyword set X, respectively; h(t, X_i) is the number of results returned when both keywords are searched together; and N is the number of web pages that search engine G can index;

5.3 TF-SIM ranking calculation: substitute the keyword frequencies and the NGD distances between each keyword and the initial keyword set X, obtained in the first two steps, into formula (2), and re-sort the keywords by the resulting TF-SIM similarity value to form a new keyword set;

where T_t is the number of occurrences of keyword t, X is the initial keyword set, and n is the number of keywords in X;

5.4 Final keyword extraction: use formula (3) to compute how many of the keywords recommended to the user come from the final WordNet semantic keyword set and how many from the video entity keyword set, and obtain the required most relevant keywords:

$$T_n = T_s + \delta T_s \qquad (3)$$

where T_n is the number of keywords to recommend to the user, T_s is the number of keywords taken from the semantic keyword set, δT_s is the number of keywords taken from the entity keyword set, and δ is an empirical value.