CN105654125A - Method for calculating video similarity - Google Patents

Method for calculating video similarity

Info

Publication number
CN105654125A
CN105654125A (application CN201511008475.6A)
Authority
CN
China
Prior art keywords
video
formula
synopsis
word segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511008475.6A
Other languages
Chinese (zh)
Inventor
邢建平
田欣玉
宋宪明
刘绪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201511008475.6A priority Critical patent/CN105654125A/en
Publication of CN105654125A publication Critical patent/CN105654125A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for calculating video similarity. The method comprises the steps of: (1) extracting the preliminary text features of a video A; (2) extracting the latent features of video A with an LDA topic model; (3) calculating the text feature vector vA of video A, calculating the text feature vector vB of a video B by the same steps, and computing the similarity between video A and video B. The method builds a user feature portrait by analyzing the user's historical data; for efficiency, the portrait is computed by an offline processing device, so the user features can be obtained periodically. The portrait is finally incorporated into related-video recommendation, achieving personalized recommendation. The method corrects the traditional similar-video calculation according to user comments, improves the conversion rate of related videos, improves the user experience, and creates considerable economic benefit for video providers.

Description

A method for calculating video similarity
Technical field
The present invention relates to a method for calculating video similarity, and belongs to the field of computer data mining.
Background technology
With the rapid development of Internet technology in the big-data era, people can reach more and more video resources, but also spend more and more time finding the videos they like. Video portal websites all offer a related-video recommendation service, which presents additional video resources to the user. Video recommendation technology has been widely applied in online video systems, and related-video recommendation has become one of the main ways users discover videos. Specifically, after a user enters the detail page of a video, or after a video finishes playing, the system shows a list of videos related to that video. This raises the user's click-through rate on videos and, to a certain degree, the user's paying conversion rate. Related-video calculation is an indispensable part of improving personalized service.
Usually, a related-video recommendation ranks candidate videos by the tags they share with the target video; some methods rank by the number of matched tags, others use a weighted tag-matching algorithm. Existing similar-video calculation methods are mostly based on the videos themselves and are not weighted along the user dimension. Analysis of Hisense TV user log data shows that the conversion rate of related videos is below 10%; the analysis indicates that the similar-video calculation used by the online system is rather simplistic, and that video topics are not weighted by user comments, resulting in a relatively low conversion rate for similar videos.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a method for calculating video similarity.
The present invention builds a user feature portrait by analyzing the user's historical data (viewing behavior, comments, etc.). To be efficient, the user feature portrait is built in advance by an offline processing device, so the user features can be obtained periodically. The portrait is finally incorporated into related-video recommendation, achieving the purpose of personalized recommendation.
The present invention corrects the traditional similar-video calculation method according to user comments; while improving the related-video conversion rate and the user experience, it also brings considerable economic benefit to video providers.
Explanation of terms
Text feature: the basic unit used to represent a text.
The technical scheme of the invention is as follows:
A method for calculating video similarity, comprising the following concrete steps:
(1) Extract the preliminary text features of video A
① Perform Chinese word segmentation on the synopsis of video A;
② Calculate the frequency of each word segment obtained in step ①, as shown in formula (I):

    β_{a,d} = count(a, d) / count(d)    (I)

In formula (I), β_{a,d} is the frequency of word segment a in the synopsis d of video A, count(a, d) is the number of times word segment a occurs in the synopsis d, and count(d) is the total number of word segments in the synopsis d;
③ Calculate the inverse document frequency β_{a,C} of word segment a over the synopses C of all videos in the whole database, as shown in formula (II):

    β_{a,C} = log( n / count(a, C) )    (II)

In formula (II), n is the total number of video synopses in the database, and count(a, C) is the number of synopses in which word segment a occurs;
Step ③ penalizes words that occur frequently across the synopses of all videos in the database: the higher a word's overall frequency, the less it contributes to characterizing the synopsis of any particular video. For example, a common function word occurs many times across all video synopses in the database, so its contribution to any single synopsis is small.
④ Calculate the weight β_a of word segment a in the synopsis of video A, as shown in formula (III):

    β_a = β_{a,d} * β_{a,C}    (III)

⑤ Compute the preliminary text features of video A: β_A = {a: β_a, b: β_b, ……}, where {a, b, ……} are all word segments of video A and {β_a, β_b, ……} are the corresponding weights;
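Steps ①–⑤ amount to a TF-IDF weighting over the segmented synopsis. The following is a minimal pure-Python sketch of formulas (I)–(III); the function and variable names are illustrative, not taken from the patent, and word segmentation is assumed to have been done already (e.g. by any Chinese segmenter).

```python
import math
from collections import Counter

def tfidf_weights(synopsis_tokens, corpus_token_sets):
    """Compute beta_a = beta_{a,d} * beta_{a,C} for each word segment
    (formulas I-III). `synopsis_tokens` is the segmented synopsis of one
    video; `corpus_token_sets` holds one set of word segments per video
    synopsis in the database."""
    counts = Counter(synopsis_tokens)
    total = len(synopsis_tokens)          # count(d)
    n = len(corpus_token_sets)            # number of synopses in the database
    weights = {}
    for token, c in counts.items():
        tf = c / total                                        # formula (I)
        df = sum(1 for s in corpus_token_sets if token in s)  # count(a, C)
        idf = math.log(n / df)                                # formula (II)
        weights[token] = tf * idf                             # formula (III)
    return weights
```

A word segment that appears in every synopsis gets idf = log(1) = 0 and thus zero weight, which is exactly the penalty described in step ③.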
(2) Extract the latent features of video A with an LDA topic model
⑥ Perform Chinese word segmentation on the synopsis of video A;
⑦ Place all word segments obtained in step ⑥ into a corpus;
⑧ Input the corpus obtained in step ⑦ into the LDA topic model with a designated number of topics; the output is the degree of association V_tv of video A with each designated topic, and the degree of association V_at of each word segment with each designated topic, as illustrated by Table 1 and Table 2;
Table 1
Table 2
⑨ Calculate the weight α_a of word segment a in the synopsis of video A, as shown in formula (IV):

    α_a = V_at * V_tv    (IV)

⑩ Compute the latent features of video A: α_A = {a: α_a, b: α_b, ……}, where {a, b, ……} are all word segments of video A and {α_a, α_b, ……} are the corresponding weights;
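Once an LDA model has produced the per-topic association degrees of the video (Table 1) and of each word segment (Table 2), formula (IV) combines them into a latent weight per word segment. The sketch below assumes one plausible reading of formula (IV) — a sum of V_at * V_tv over the designated topics — since the patent text does not spell out how multiple topics are aggregated; the names are illustrative and the topic matrices are taken as given (e.g. from fitting any off-the-shelf LDA implementation).

```python
def latent_weights(token_topic, video_topic):
    """alpha_a = sum over topics t of V_{a,t} * V_{t,v} (one reading of
    formula IV). `token_topic` maps each word segment to its list of
    per-topic association degrees (Table 2); `video_topic` is the video's
    per-topic association degrees (Table 1), same topic order."""
    return {
        token: sum(v_at * v_tv for v_at, v_tv in zip(topics, video_topic))
        for token, topics in token_topic.items()
    }
```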
(3) Calculate the text feature vector v_A of video A, as shown in formula (V):

    v_A = λ·β_A + (1 − λ)·α_A    (V)

In formula (V), λ is the value at which the conversion rate of similar videos is maximal;
(4) Calculate the text feature vector v_B of video B by steps (1)-(3), and calculate the similarity between video A and video B, as shown in formula (VI):

    sim(v_A, v_B) = cos(v⃗_A, v⃗_B) = (v⃗_A · v⃗_B) / (|v⃗_A| * |v⃗_B|)    (VI)
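The fusion and similarity steps can be sketched as follows. Note that formula (V) is printed ambiguously in this extraction; a weighted sum λ·β_A + (1−λ)·α_A is assumed here, since λ is described as a tuning value. Both feature vectors are represented as sparse word-segment → weight dictionaries; the function names are illustrative.

```python
import math

def fuse(tfidf, latent, lam):
    """v_A = lam * beta_A + (1 - lam) * alpha_A (formula V, read as a
    weighted sum). A word segment missing from one dictionary is taken
    to have weight 0 there."""
    keys = set(tfidf) | set(latent)
    return {k: lam * tfidf.get(k, 0.0) + (1 - lam) * latent.get(k, 0.0)
            for k in keys}

def cosine(u, v):
    """sim(v_A, v_B) = (v_A . v_B) / (|v_A| * |v_B|) (formula VI), the
    cosine of the angle between the two sparse vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With these two helpers, the similarity of two videos is cosine(fuse(βA, αA, λ), fuse(βB, αB, λ)); videos sharing no weighted word segments score 0 and proportional vectors score 1.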
The invention has the following benefits:
1. The present invention builds a user feature portrait by analyzing the user's historical data (viewing behavior, comments, etc.). To be efficient, the portrait is built in advance by an offline processing device, so the user features can be obtained periodically. The portrait is finally incorporated into related-video recommendation, achieving personalized recommendation.
2. The present invention corrects the traditional similar-video calculation method according to user comments; while improving the related-video conversion rate and the user experience, it also brings considerable economic benefit to video providers.
Brief description of the drawings
Fig. 1 is the flow block diagram of the method for calculating video similarity of the present invention;
Fig. 2 is the flow chart of extracting the latent features of a video with the LDA topic model of the present invention.
Detailed description of the invention
The present invention is further described below in conjunction with the drawings and an embodiment, but is not limited thereto.
Embodiment
A method for calculating video similarity, whose concrete steps (1)-(4) are identical to those described in the Summary of the Invention above.
The flow block diagram of the method for calculating video similarity is shown in Fig. 1;
The flow chart of extracting the latent features of a video with the LDA topic model is shown in Fig. 2.

Claims (1)

1. A method for calculating video similarity, characterized in that the concrete steps comprise:
(1) Extract the preliminary text features of video A
① Perform Chinese word segmentation on the synopsis of video A;
② Calculate the frequency of each word segment obtained in step ①, as shown in formula (I):

    β_{a,d} = count(a, d) / count(d)    (I)

In formula (I), β_{a,d} is the frequency of word segment a in the synopsis d of video A, count(a, d) is the number of times word segment a occurs in the synopsis d, and count(d) is the total number of word segments in the synopsis d;
③ Calculate the inverse document frequency β_{a,C} of word segment a over the synopses C of all videos in the whole database, as shown in formula (II):

    β_{a,C} = log( n / count(a, C) )    (II)

In formula (II), n is the total number of video synopses in the database, and count(a, C) is the number of synopses in which word segment a occurs;
④ Calculate the weight β_a of word segment a in the synopsis of video A, as shown in formula (III):

    β_a = β_{a,d} * β_{a,C}    (III)

⑤ Compute the preliminary text features of video A: β_A = {a: β_a, b: β_b, ……}, where {a, b, ……} are all word segments of video A and {β_a, β_b, ……} are the corresponding weights;
(2) Extract the latent features of video A with an LDA topic model
⑥ Perform Chinese word segmentation on the synopsis of video A;
⑦ Place all word segments obtained in step ⑥ into a corpus;
⑧ Input the corpus obtained in step ⑦ into the LDA topic model with a designated number of topics; the output is the degree of association V_tv of video A with each designated topic, and the degree of association V_at of each word segment with each designated topic;
⑨ Calculate the weight α_a of word segment a in the synopsis of video A, as shown in formula (IV):

    α_a = V_at * V_tv    (IV)

⑩ Compute the latent features of video A: α_A = {a: α_a, b: α_b, ……}, where {a, b, ……} are all word segments of video A and {α_a, α_b, ……} are the corresponding weights;
(3) Calculate the text feature vector v_A of video A, as shown in formula (V):

    v_A = λ·β_A + (1 − λ)·α_A    (V)

In formula (V), λ is the value at which the conversion rate of similar videos is maximal;
(4) Calculate the text feature vector v_B of video B by steps (1)-(3), and calculate the similarity between video A and video B, as shown in formula (VI):

    sim(v_A, v_B) = cos(v⃗_A, v⃗_B) = (v⃗_A · v⃗_B) / (|v⃗_A| * |v⃗_B|)    (VI).
CN201511008475.6A 2015-12-29 2015-12-29 Method for calculating video similarity Pending CN105654125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511008475.6A CN105654125A (en) 2015-12-29 2015-12-29 Method for calculating video similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511008475.6A CN105654125A (en) 2015-12-29 2015-12-29 Method for calculating video similarity

Publications (1)

Publication Number Publication Date
CN105654125A true CN105654125A (en) 2016-06-08

Family

ID=56477121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511008475.6A Pending CN105654125A (en) 2015-12-29 2015-12-29 Method for calculating video similarity

Country Status (1)

Country Link
CN (1) CN105654125A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN106599182A (en) * 2016-12-13 2017-04-26 飞狐信息技术(天津)有限公司 Feature engineering recommendation method and device based on spark streaming real-time streams and video website
CN111133453A (en) * 2017-08-04 2020-05-08 诺基亚技术有限公司 Artificial neural network
CN111897999A (en) * 2020-07-27 2020-11-06 九江学院 LDA-based deep learning model construction method for video recommendation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110270845A1 (en) * 2010-04-29 2011-11-03 International Business Machines Corporation Ranking Information Content Based on Performance Data of Prior Users of the Information Content
US20110302124A1 (en) * 2010-06-08 2011-12-08 Microsoft Corporation Mining Topic-Related Aspects From User Generated Content
CN102640152A (en) * 2009-12-09 2012-08-15 国际商业机器公司 Method of searching for document data files based on keywords, and computer system and computer program thereof
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
US20140129210A1 (en) * 2012-11-06 2014-05-08 Palo Alto Research Center Incorporated System And Method For Extracting And Reusing Metadata To Analyze Message Content

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102640152A (en) * 2009-12-09 2012-08-15 国际商业机器公司 Method of searching for document data files based on keywords, and computer system and computer program thereof
US20110270845A1 (en) * 2010-04-29 2011-11-03 International Business Machines Corporation Ranking Information Content Based on Performance Data of Prior Users of the Information Content
US20110302124A1 (en) * 2010-06-08 2011-12-08 Microsoft Corporation Mining Topic-Related Aspects From User Generated Content
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
US20140129210A1 (en) * 2012-11-06 2014-05-08 Palo Alto Research Center Incorporated System And Method For Extracting And Reusing Metadata To Analyze Message Content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈晓美 (Chen Xiaomei): 网络评论观点知识发现研究 ("Research on Knowledge Discovery of Opinions in Online Comments"), China Doctoral Dissertations Full-text Database, Information Science and Technology series *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN106599182A (en) * 2016-12-13 2017-04-26 飞狐信息技术(天津)有限公司 Feature engineering recommendation method and device based on spark streaming real-time streams and video website
CN106599182B (en) * 2016-12-13 2019-10-11 飞狐信息技术(天津)有限公司 Feature Engineering recommended method and device, video website based on spark streaming real-time streams
CN111133453A (en) * 2017-08-04 2020-05-08 诺基亚技术有限公司 Artificial neural network
CN111897999A (en) * 2020-07-27 2020-11-06 九江学院 LDA-based deep learning model construction method for video recommendation
CN111897999B (en) * 2020-07-27 2023-06-16 九江学院 Deep learning model construction method for video recommendation and based on LDA

Similar Documents

Publication Publication Date Title
CN109299994B (en) Recommendation method, device, equipment and readable storage medium
CN104951518B (en) One kind recommends method based on the newer context of dynamic increment
CN106095749A (en) A kind of text key word extracting method based on degree of depth study
CN108334489B (en) Text core word recognition method and device
CN104199874A (en) Webpage recommendation method based on user browsing behaviors
CN104850617B (en) Short text processing method and processing device
CN105243087A (en) IT (Information Technology) information aggregation reading personalized recommendation method
CN105654125A (en) Method for calculating video similarity
CN110851731B (en) Collaborative filtering recommendation method for user attribute coupling similarity and interest semantic similarity
CN109145180B (en) Enterprise hot event mining method based on incremental clustering
CN110851700B (en) Probability matrix decomposition cold start recommendation method integrating attributes and semantics
CN106600213B (en) Intelligent management system and method for personal resume
Nhlabano et al. Impact of text pre-processing on the performance of sentiment analysis models for social media data
CN103970801A (en) Method and device for recognizing microblog advertisement blog articles
CN105701182A (en) Information pushing method and apparatus
CN112084320A (en) Test question recommendation method and device and intelligent equipment
CN108664558A (en) A kind of Web TV personalized ventilation system method towards large-scale consumer
CN107688621B (en) Method and system for optimizing file
CN103714120A (en) System for extracting interesting topics from url (uniform resource locator) access records of users
CN113627797B (en) Method, device, computer equipment and storage medium for generating staff member portrait
CN102306178A (en) Video recommendation method and device
CN108932247A (en) A kind of method and device optimizing text search
CN108830735B (en) Online interpersonal relationship analysis method and system
US20220222715A1 (en) System and method for detecting and analyzing discussion points from written reviews
CN110597982A (en) Short text topic clustering algorithm based on word co-occurrence network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160608