CN105654125A - Method for calculating video similarity - Google Patents

Method for calculating video similarity

Info

Publication number
CN105654125A
CN105654125A (application CN201511008475.6A)
Authority
CN
China
Prior art keywords
video
formula
synopsis
word segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511008475.6A
Other languages
Chinese (zh)
Inventor
邢建平
田欣玉
宋宪明
刘绪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201511008475.6A priority Critical patent/CN105654125A/en
Publication of CN105654125A publication Critical patent/CN105654125A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for calculating video similarity. The method comprises the steps of: (1) extracting the preliminary text features of a video A; (2) extracting the latent features of video A with an LDA topic model; (3) calculating the text feature vector vA of video A, calculating the text feature vector vB of a video B by the same steps, and computing the similarity between video A and video B. The method builds a user feature portrait by analyzing the user's historical data; for efficiency, the portrait is computed by an offline processing device, so the user features can be obtained periodically. The portrait is finally incorporated into related-video recommendation, achieving personalized recommendation. The method corrects the traditional similar-video calculation according to user comments, improves the conversion rate of related videos, improves the user experience, and creates considerable economic benefit for video providers.

Description

A method for calculating video similarity
Technical field
The present invention relates to a method for calculating video similarity, and belongs to the field of computer data mining.
Background technology
With the rapid development of Internet technology in the big-data era, people can reach more and more video resources, but also spend more and more time finding the videos they like. Video portal websites all offer a related-video recommendation service, which presents additional video resources to the user. Video recommendation technology has been widely applied in online video systems, and related-video recommendation has become one of the main ways users discover videos. Specifically, after a user enters the detail page of a video, or after a video finishes playing, the system shows a list of videos related to that video. This raises the user's click-through rate on videos and, to a certain degree, the user's paying conversion rate. Related-video calculation is an indispensable part of improving personalized service.
Usually, a related-video recommendation ranks candidate videos by the tags they share with the target video; some methods rank by the number of matched tags, others use a weighted tag-matching algorithm. Existing similar-video calculation methods are mostly based on the videos themselves and are not weighted along the user dimension. Analysis of Hisense TV user log data shows that the conversion rate of related videos is below 10%; the analysis indicates that the similar-video calculation used by the online system is rather simplistic, and that video topics are not weighted by user comments, resulting in a relatively low conversion rate for similar videos.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a method for calculating video similarity.
The present invention builds a user feature portrait by analyzing the user's historical data (viewing behavior, comments, etc.). To be efficient, the user feature portrait is built in advance by an offline processing device, so the user features can be obtained periodically. The portrait is finally incorporated into related-video recommendation, achieving the purpose of personalized recommendation.
The present invention corrects the traditional similar-video calculation method according to user comments; while improving the related-video conversion rate and the user experience, it also brings considerable economic benefit to video providers.
Explanation of terms
Text feature: the basic unit used to represent a text.
The technical scheme of the invention is as follows:
A method for calculating video similarity, comprising the following concrete steps:
(1) Extract the preliminary text features of video A
① Perform Chinese word segmentation on the synopsis of video A;
② Calculate the frequency of each word segment obtained in step ①, as shown in formula (I):

    β_{a,d} = count(a, d) / count(d)    (I)

In formula (I), β_{a,d} is the frequency of word segment a in the synopsis d of video A, count(a, d) is the number of times word segment a occurs in the synopsis d, and count(d) is the total number of word segments in the synopsis d;
③ Calculate the inverse document frequency β_{a,C} of word segment a over the synopses C of all videos in the whole database, as shown in formula (II):

    β_{a,C} = log( n / count(a, C) )    (II)

In formula (II), n is the total number of video synopses in the database, and count(a, C) is the number of synopses in which word segment a occurs;
Step ③ penalizes words that occur frequently across the synopses of all videos in the database: the higher a word's overall frequency, the less it contributes to characterizing the synopsis of any particular video. For example, a common function word occurs many times across all video synopses in the database, so its contribution to any single synopsis is small.
④ Calculate the weight β_a of word segment a in the synopsis of video A, as shown in formula (III):

    β_a = β_{a,d} * β_{a,C}    (III)

⑤ Compute the preliminary text features of video A: β_A = {a: β_a, b: β_b, ……}, where {a, b, ……} are all word segments of video A and {β_a, β_b, ……} are the corresponding weights;
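Steps ①–⑤ amount to a TF-IDF weighting over the segmented synopsis. The following is a minimal pure-Python sketch of formulas (I)–(III); the function and variable names are illustrative, not taken from the patent, and word segmentation is assumed to have been done already (e.g. by any Chinese segmenter).

```python
import math
from collections import Counter

def tfidf_weights(synopsis_tokens, corpus_token_sets):
    """Compute beta_a = beta_{a,d} * beta_{a,C} for each word segment
    (formulas I-III). `synopsis_tokens` is the segmented synopsis of one
    video; `corpus_token_sets` holds one set of word segments per video
    synopsis in the database."""
    counts = Counter(synopsis_tokens)
    total = len(synopsis_tokens)          # count(d)
    n = len(corpus_token_sets)            # number of synopses in the database
    weights = {}
    for token, c in counts.items():
        tf = c / total                                        # formula (I)
        df = sum(1 for s in corpus_token_sets if token in s)  # count(a, C)
        idf = math.log(n / df)                                # formula (II)
        weights[token] = tf * idf                             # formula (III)
    return weights
```

A word segment that appears in every synopsis gets idf = log(1) = 0 and thus zero weight, which is exactly the penalty described in step ③.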
(2) Extract the latent features of video A with an LDA topic model
⑥ Perform Chinese word segmentation on the synopsis of video A;
⑦ Place all word segments obtained in step ⑥ into a corpus;
⑧ Input the corpus obtained in step ⑦ into the LDA topic model with a designated number of topics; the output is the degree of association V_tv of video A with each designated topic, and the degree of association V_at of each word segment with each designated topic, as illustrated by Table 1 and Table 2;
Table 1
Table 2
⑨ Calculate the weight α_a of word segment a in the synopsis of video A, as shown in formula (IV):

    α_a = V_at * V_tv    (IV)

⑩ Compute the latent features of video A: α_A = {a: α_a, b: α_b, ……}, where {a, b, ……} are all word segments of video A and {α_a, α_b, ……} are the corresponding weights;
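Once an LDA model has produced the per-topic association degrees of the video (Table 1) and of each word segment (Table 2), formula (IV) combines them into a latent weight per word segment. The sketch below assumes one plausible reading of formula (IV) — a sum of V_at * V_tv over the designated topics — since the patent text does not spell out how multiple topics are aggregated; the names are illustrative and the topic matrices are taken as given (e.g. from fitting any off-the-shelf LDA implementation).

```python
def latent_weights(token_topic, video_topic):
    """alpha_a = sum over topics t of V_{a,t} * V_{t,v} (one reading of
    formula IV). `token_topic` maps each word segment to its list of
    per-topic association degrees (Table 2); `video_topic` is the video's
    per-topic association degrees (Table 1), same topic order."""
    return {
        token: sum(v_at * v_tv for v_at, v_tv in zip(topics, video_topic))
        for token, topics in token_topic.items()
    }
```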
(3) Calculate the text feature vector v_A of video A, as shown in formula (V):

    v_A = λ·β_A + (1 − λ)·α_A    (V)

In formula (V), λ is the value at which the conversion rate of similar videos is maximal;
(4) Calculate the text feature vector v_B of video B by steps (1)-(3), and calculate the similarity between video A and video B, as shown in formula (VI):

    sim(v_A, v_B) = cos(v⃗_A, v⃗_B) = (v⃗_A · v⃗_B) / (|v⃗_A| * |v⃗_B|)    (VI)
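The fusion and similarity steps can be sketched as follows. Note that formula (V) is printed ambiguously in this extraction; a weighted sum λ·β_A + (1−λ)·α_A is assumed here, since λ is described as a tuning value. Both feature vectors are represented as sparse word-segment → weight dictionaries; the function names are illustrative.

```python
import math

def fuse(tfidf, latent, lam):
    """v_A = lam * beta_A + (1 - lam) * alpha_A (formula V, read as a
    weighted sum). A word segment missing from one dictionary is taken
    to have weight 0 there."""
    keys = set(tfidf) | set(latent)
    return {k: lam * tfidf.get(k, 0.0) + (1 - lam) * latent.get(k, 0.0)
            for k in keys}

def cosine(u, v):
    """sim(v_A, v_B) = (v_A . v_B) / (|v_A| * |v_B|) (formula VI), the
    cosine of the angle between the two sparse vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With these two helpers, the similarity of two videos is cosine(fuse(βA, αA, λ), fuse(βB, αB, λ)); videos sharing no weighted word segments score 0 and proportional vectors score 1.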
The invention has the following benefits:
1. The present invention builds a user feature portrait by analyzing the user's historical data (viewing behavior, comments, etc.). To be efficient, the portrait is built in advance by an offline processing device, so the user features can be obtained periodically. The portrait is finally incorporated into related-video recommendation, achieving personalized recommendation.
2. The present invention corrects the traditional similar-video calculation method according to user comments; while improving the related-video conversion rate and the user experience, it also brings considerable economic benefit to video providers.
Brief description of the drawings
Fig. 1 is the flow block diagram of the method for calculating video similarity of the present invention;
Fig. 2 is the flow chart of extracting the latent features of a video with the LDA topic model of the present invention.
Detailed description of the invention
The present invention is further described below in conjunction with the drawings and an embodiment, but is not limited thereto.
Embodiment
A method for calculating video similarity, whose concrete steps (1)-(4) are identical to those described in the Summary of the Invention above.
The flow block diagram of the method for calculating video similarity is shown in Fig. 1;
The flow chart of extracting the latent features of a video with the LDA topic model is shown in Fig. 2.

Claims (1)

1. A method for calculating video similarity, characterized in that the concrete steps comprise:
(1) Extract the preliminary text features of video A
① Perform Chinese word segmentation on the synopsis of video A;
② Calculate the frequency of each word segment obtained in step ①, as shown in formula (I):

    β_{a,d} = count(a, d) / count(d)    (I)

In formula (I), β_{a,d} is the frequency of word segment a in the synopsis d of video A, count(a, d) is the number of times word segment a occurs in the synopsis d, and count(d) is the total number of word segments in the synopsis d;
③ Calculate the inverse document frequency β_{a,C} of word segment a over the synopses C of all videos in the whole database, as shown in formula (II):

    β_{a,C} = log( n / count(a, C) )    (II)

In formula (II), n is the total number of video synopses in the database, and count(a, C) is the number of synopses in which word segment a occurs;
④ Calculate the weight β_a of word segment a in the synopsis of video A, as shown in formula (III):

    β_a = β_{a,d} * β_{a,C}    (III)

⑤ Compute the preliminary text features of video A: β_A = {a: β_a, b: β_b, ……}, where {a, b, ……} are all word segments of video A and {β_a, β_b, ……} are the corresponding weights;
(2) Extract the latent features of video A with an LDA topic model
⑥ Perform Chinese word segmentation on the synopsis of video A;
⑦ Place all word segments obtained in step ⑥ into a corpus;
⑧ Input the corpus obtained in step ⑦ into the LDA topic model with a designated number of topics; the output is the degree of association V_tv of video A with each designated topic, and the degree of association V_at of each word segment with each designated topic;
⑨ Calculate the weight α_a of word segment a in the synopsis of video A, as shown in formula (IV):

    α_a = V_at * V_tv    (IV)

⑩ Compute the latent features of video A: α_A = {a: α_a, b: α_b, ……}, where {a, b, ……} are all word segments of video A and {α_a, α_b, ……} are the corresponding weights;
(3) Calculate the text feature vector v_A of video A, as shown in formula (V):

    v_A = λ·β_A + (1 − λ)·α_A    (V)

In formula (V), λ is the value at which the conversion rate of similar videos is maximal;
(4) Calculate the text feature vector v_B of video B by steps (1)-(3), and calculate the similarity between video A and video B, as shown in formula (VI):

    sim(v_A, v_B) = cos(v⃗_A, v⃗_B) = (v⃗_A · v⃗_B) / (|v⃗_A| * |v⃗_B|)    (VI).
CN201511008475.6A 2015-12-29 2015-12-29 Method for calculating video similarity Pending CN105654125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511008475.6A CN105654125A (en) 2015-12-29 2015-12-29 Method for calculating video similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511008475.6A CN105654125A (en) 2015-12-29 2015-12-29 Method for calculating video similarity

Publications (1)

Publication Number Publication Date
CN105654125A true CN105654125A (en) 2016-06-08

Family

ID=56477121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511008475.6A Pending CN105654125A (en) 2015-12-29 2015-12-29 Method for calculating video similarity

Country Status (1)

Country Link
CN (1) CN105654125A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN106599182A (en) * 2016-12-13 2017-04-26 飞狐信息技术(天津)有限公司 Feature engineering recommendation method and device based on spark streaming real-time streams and video website
CN111133453A (en) * 2017-08-04 2020-05-08 诺基亚技术有限公司 Artificial neural network
CN111897999A (en) * 2020-07-27 2020-11-06 九江学院 LDA-based deep learning model construction method for video recommendation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110270845A1 (en) * 2010-04-29 2011-11-03 International Business Machines Corporation Ranking Information Content Based on Performance Data of Prior Users of the Information Content
US20110302124A1 (en) * 2010-06-08 2011-12-08 Microsoft Corporation Mining Topic-Related Aspects From User Generated Content
CN102640152A (en) * 2009-12-09 2012-08-15 国际商业机器公司 Method of searching for document data files based on keywords, and computer system and computer program thereof
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
US20140129210A1 (en) * 2012-11-06 2014-05-08 Palo Alto Research Center Incorporated System And Method For Extracting And Reusing Metadata To Analyze Message Content

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102640152A (en) * 2009-12-09 2012-08-15 国际商业机器公司 Method of searching for document data files based on keywords, and computer system and computer program thereof
US20110270845A1 (en) * 2010-04-29 2011-11-03 International Business Machines Corporation Ranking Information Content Based on Performance Data of Prior Users of the Information Content
US20110302124A1 (en) * 2010-06-08 2011-12-08 Microsoft Corporation Mining Topic-Related Aspects From User Generated Content
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
US20140129210A1 (en) * 2012-11-06 2014-05-08 Palo Alto Research Center Incorporated System And Method For Extracting And Reusing Metadata To Analyze Message Content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈晓美 (Chen Xiaomei): 网络评论观点知识发现研究 ("Research on Knowledge Discovery of Opinions in Online Comments"), China Doctoral Dissertations Full-text Database, Information Science and Technology series *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN106599182A (en) * 2016-12-13 2017-04-26 飞狐信息技术(天津)有限公司 Feature engineering recommendation method and device based on spark streaming real-time streams and video website
CN106599182B (en) * 2016-12-13 2019-10-11 飞狐信息技术(天津)有限公司 Feature Engineering recommended method and device, video website based on spark streaming real-time streams
CN111133453A (en) * 2017-08-04 2020-05-08 诺基亚技术有限公司 Artificial neural network
CN111897999A (en) * 2020-07-27 2020-11-06 九江学院 LDA-based deep learning model construction method for video recommendation
CN111897999B (en) * 2020-07-27 2023-06-16 九江学院 Deep learning model construction method for video recommendation and based on LDA

Similar Documents

Publication Publication Date Title
CN109299994B (en) Recommendation method, device, equipment and readable storage medium
CN104951518B (en) One kind recommends method based on the newer context of dynamic increment
CN106095749A (en) A kind of text key word extracting method based on degree of depth study
CN108334489B (en) Text core word recognition method and device
CN104199874A (en) Webpage recommendation method based on user browsing behaviors
CN104850617B (en) Short text processing method and processing device
CN105243087A (en) IT (Information Technology) information aggregation reading personalized recommendation method
CN105654125A (en) Method for calculating video similarity
CN110851731B (en) Collaborative filtering recommendation method for user attribute coupling similarity and interest semantic similarity
CN109145180B (en) Enterprise hot event mining method based on incremental clustering
CN110851700B (en) Probability matrix decomposition cold start recommendation method integrating attributes and semantics
CN106600213B (en) Intelligent management system and method for personal resume
Nhlabano et al. Impact of text pre-processing on the performance of sentiment analysis models for social media data
CN103970801A (en) Method and device for recognizing microblog advertisement blog articles
CN105701182A (en) Information pushing method and apparatus
CN112084320A (en) Test question recommendation method and device and intelligent equipment
CN108664558A (en) A kind of Web TV personalized ventilation system method towards large-scale consumer
CN107688621B (en) Method and system for optimizing file
CN103714120A (en) System for extracting interesting topics from url (uniform resource locator) access records of users
CN113627797B (en) Method, device, computer equipment and storage medium for generating staff member portrait
CN102306178A (en) Video recommendation method and device
CN108932247A (en) A kind of method and device optimizing text search
CN108830735B (en) Online interpersonal relationship analysis method and system
US20220222715A1 (en) System and method for detecting and analyzing discussion points from written reviews
CN110597982A (en) Short text topic clustering algorithm based on word co-occurrence network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160608