Embodiment
The present invention is described in detail below with reference to Fig. 1. The Look-alike-based advertising audience screening method of the present invention comprises the following steps:
Step 1: map each video tag of the video media to an X-dimensional tag vector, then sum all tag vectors of a video and take the average to obtain an X-dimensional video vector for each video.
Video media generally hold video tags numbering in the millions. Google's deep learning tool Word2Vec is used to map each video tag (a word, summarized by the video's author, that reflects the theme of the video) to an X-dimensional vector, so that similarity in the vector space can represent the semantic similarity between video tags. The parameter X generally takes a value between 10 and 200: too large a value may incur the curse of dimensionality and excessive computational complexity, while too small a value may fail to express the complete semantic space. A suitable value can be determined through repeated testing. In the present embodiment, X is 20.
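The notion of similarity in the vector space can be illustrated with a minimal sketch. It assumes cosine similarity as the measure, which the description does not specify, and uses hypothetical 4-dimensional tag vectors in place of real Word2Vec output (the embodiment uses X = 20). The tag names are placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


# Hypothetical tag vectors, not produced by a trained model.
tag_vectors = {
    "phone_brand_a": [0.9, 0.1, 0.0, 0.2],
    "phone_brand_b": [0.8, 0.2, 0.1, 0.3],
    "tourism": [0.0, 0.9, 0.8, 0.1],
}

sim_related = cosine_similarity(
    tag_vectors["phone_brand_a"], tag_vectors["phone_brand_b"]
)
sim_unrelated = cosine_similarity(
    tag_vectors["phone_brand_a"], tag_vectors["tourism"]
)
```

With these illustrative values, the two phone-brand tags score much higher than the phone-brand and tourism pair, which is the property the tag vectors are expected to have.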
Each video may have one or more video tags, which represent the content or highlights of the video. In general, the tags on one video are semantically similar, so the vectors of all tags on a video can be aggregated into a single vector by averaging: the values in each dimension of all tag vectors of the video are summed and then divided by the number of tags, which yields the X-dimensional video vector of each video.
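The averaging described above can be sketched as follows. The function name and the sample 3-dimensional vectors are hypothetical and serve only as illustration.

```python
def video_vector(tag_vecs):
    """Average a non-empty list of equal-length tag vectors, dimension by dimension."""
    if not tag_vecs:
        raise ValueError("a video needs at least one tag vector")
    n = len(tag_vecs)
    dim = len(tag_vecs[0])
    return [sum(vec[d] for vec in tag_vecs) / n for d in range(dim)]


# Two hypothetical tag vectors belonging to one video.
tags_of_video = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
vec = video_vector(tags_of_video)  # [2.0, 1.0, 1.0]
```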
Step 2: cluster the videos to obtain a clustering result of similar videos.
The videos in each cluster are similar, and the content or themes of their tags are close. Because the volume of video data is huge and the clustering process requires similarity computation on the video vectors, the K-Means algorithm in the MLlib component of the distributed computing platform Spark is used to perform the clustering. The number of clusters K is chosen according to the circumstances. In the present embodiment, K is 10000.
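The embodiment performs this step with the K-Means implementation in Spark MLlib. Purely to illustrate what the algorithm does, the following is a small single-machine sketch of K-Means (Lloyd's iteration) in plain Python. It is a stand-in, not the Spark implementation; the function names, the fixed initial centroids and the 2-dimensional sample vectors are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=20):
    """Return (assignments, centroids) after running Lloyd's iteration."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical video vectors forming two obvious groups.
video_vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
assignments, centroids = kmeans(
    video_vectors, [video_vectors[0], video_vectors[2]]
)
```

Here the first two videos end up in cluster 0 and the last two in cluster 1. At the scale described in the embodiment (K = 10000 over a very large video set), a distributed implementation is the practical choice.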
Step 3: convert the similar-video clustering result into a similar-user clustering result.
Users often leave behavior logs such as "watch", "subscribe", "comment" and "upvote" on the videos they like, and these behavior logs form a bridge between users and videos. By collecting these behavior logs, the similar-video clustering result is converted into a similar-user clustering result, in which the users in each cluster have similar interests and close viewing habits.
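One possible way to carry out this conversion is sketched below. It assumes that a user joins every cluster containing a video the user has interacted with; the description does not fix this rule, and the log format (user, video, action) and all identifiers are hypothetical.

```python
from collections import defaultdict


def user_clusters_from_logs(video_to_cluster, behavior_logs):
    """Map each video cluster id to the set of users who interacted with it."""
    clusters = defaultdict(set)
    for user_id, video_id, _action in behavior_logs:
        cluster_id = video_to_cluster.get(video_id)
        if cluster_id is not None:  # skip videos that were never clustered
            clusters[cluster_id].add(user_id)
    return dict(clusters)


# Hypothetical output of the video clustering step.
video_to_cluster = {"v1": 0, "v2": 0, "v3": 1}

# Hypothetical behavior logs: (user id, video id, action).
behavior_logs = [
    ("u1", "v1", "watch"),
    ("u2", "v2", "comment"),
    ("u3", "v3", "subscribe"),
    ("u1", "v3", "upvote"),
    ("u4", "v9", "watch"),  # v9 has no cluster, so this entry is ignored
]

user_clusters = user_clusters_from_logs(video_to_cluster, behavior_logs)
```

Under this rule a user may appear in several clusters, as user u1 does here. A stricter variant could assign each user only to the cluster with the most interactions.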
Step 4: extract the clusters to which the seed users belong and sort the users by similarity to determine their ranking.
After the seed users are obtained from the advertiser, the N clusters containing the seed users are located in the similar-user clustering result. The users of these N clusters are extracted and sorted by similarity; the top-ranked users are determined to be potential customers who meet the requirements, and advertisements are delivered to them.
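The extraction and ranking can be sketched as follows. The description does not state how similarity is computed for ranking, so this sketch makes several assumptions: each user has a vector (for example, the mean of the vectors of videos the user interacted with), candidates are ranked by cosine similarity to the mean seed vector, and the seed users themselves are excluded from the result. All names and data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_seeds(seed_users, user_clusters, user_vectors, top_n):
    """Return up to top_n non-seed users from the seed clusters, most similar first."""
    seeds = set(seed_users)
    # Collect every user in a cluster that contains at least one seed user.
    candidates = set()
    for members in user_clusters.values():
        if seeds & members:
            candidates |= members
    candidates -= seeds  # assumption: seeds are already known to the advertiser
    # Assumed reference point: the mean vector of the seed users.
    seed_vecs = [user_vectors[u] for u in sorted(seeds)]
    seed_mean = [sum(col) / len(seed_vecs) for col in zip(*seed_vecs)]
    ranked = sorted(
        candidates,
        key=lambda u: cosine_similarity(user_vectors[u], seed_mean),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical user clusters and user vectors.
user_clusters = {0: {"u1", "u2", "u3"}, 1: {"u4", "u5"}, 2: {"u6"}}
user_vectors = {
    "u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.5, 0.5],
    "u4": [0.0, 1.0], "u5": [0.1, 0.9], "u6": [1.0, 0.0],
}

audience = expand_seeds(["u1"], user_clusters, user_vectors, top_n=2)
```

In this toy data only cluster 0 contains the seed user u1, so the candidates are u2 and u3, and u2 ranks first because its vector is closer in direction to the seed vector. User u6 is not considered even though its vector matches the seed, because it lies outside the seed clusters.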
The present invention is illustrated below by specific examples.
Example 1: expanding and screening a 3C audience seed
An advertiser plans to deliver advertisements to the 3C audience and provides a small number of 3C seed cookies. The 3C audience is a relatively high-end group interested in technology, communications, IT and electronic products, and it is the audience this advertiser wishes to target.
In the technology channel of the video media, the tags of most videos relate to computers, communications, electronic products and the like, such as Microsoft, Xiaomi, Smartisan phone, iphone6, Nexus6 and robot. Step 1: the video tags are vectorized with word2vec, and the values in each dimension of all tag vectors of each video are summed and averaged to obtain the X-dimensional video vector of each video. Tags of close categories have high vector similarity; for example, the vector similarity between Xiaomi and Smartisan phone is much greater than that between Xiaomi and tourism. Step 2: the videos are clustered, so that videos of close category or theme fall into one class, which yields the clustering result of similar videos. Step 3: the similar-video clusters are converted into similar-user clusters through the behavior logs of users in the video media; for example, users who mostly follow laptop computers fall into one class, and users who like automobiles fall into another. Step 4: the clusters to which the 3C seed users belong are located, all users in those clusters are sorted in descending order of similarity, and the advertising audience similar to the 3C seed audience is obtained from the ranking result.
Example 2: expanding and screening a travel-enthusiast seed
An advertiser plans to deliver advertisements to travel enthusiasts and provides a small number of travel-enthusiast seed cookies. Travel enthusiasts are people who love travel and pursue quality of life. The advertiser wishes to target more people similar to this audience in order to achieve its marketing or brand-effect goals.
In the tourism channel of the video media, the tags of most videos relate to tourism, life abroad and the like, such as travel, overseas trips, summer retreats, sightseeing and Jiuzhaigou. Step 1: the video tags are vectorized with word2vec, and the values in each dimension of all tag vectors of each video are summed and averaged to obtain the X-dimensional video vector of each video. All tags in the video media are thus represented as vectors of a fixed dimension, and tags of close categories have high vector similarity; for example, the vector similarity between Jiuzhaigou and Zhangjiajie is much greater than that between Jiuzhaigou and animation. Step 2: the videos are clustered, so that videos of close category or theme fall into one class; for example, videos about scenic spots and historical sites are grouped together. Step 3: the similar-video clusters are converted into similar-user clusters through the behavior logs of users in the video media; for example, users who follow tourist attractions fall into one class, and users who like animation fall into another. Step 4: the clusters to which the travel-enthusiast seed users belong are located, all users in those clusters are sorted in descending order of similarity, and the advertising audience similar to the travel seed audience is obtained from the ranking result.
The preferred embodiments of the present invention having been described in detail, those of ordinary skill in the art will clearly understand that various changes and modifications can be made without departing from the scope and spirit of the appended claims, and that the present invention is not limited to the implementations of the exemplary embodiments given in the specification.