CN104751354A - Advertisement cluster screening method - Google Patents


Info

Publication number
CN104751354A
Authority
CN
China
Prior art keywords
video
user
similar
vector
cluster
Legal status
Granted
Application number
CN201510172689.0A
Other languages
Chinese (zh)
Other versions
CN104751354B (en)
Inventor
雷龙艳
章岑
朱凯泉
房晓宇
江建博
潘柏宇
卢述奇
Current Assignee
Alibaba China Co Ltd
Original Assignee
Heyi Information Technology Beijing Co ltd
Priority date
2015-04-13
Filing date
2015-04-13
Publication date
2015-07-01
2015-04-13: Application filed by Heyi Information Technology Beijing Co ltd
2015-04-13: Priority to CN201510172689.0A
2015-07-01: Publication of CN104751354A
Application granted
2018-06-26: Publication of CN104751354B
Status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an advertising audience screening method. First, similar videos are clustered based on the abundant video and video-tag resources. Second, the video clustering result is converted into a user clustering result according to users' behavioral habits toward the videos, such as watching and subscribing, so that the user clusters corresponding to the seeds can be found. Finally, the users in those clusters are extracted and ranked to find potential customers who meet the requirements. The method overcomes the weakness of user information on video media, makes full use of the abundant video and video-tag resources, and replaces direct search for similar users with clustering of videos and video tags.

Description

Advertising audience screening method
Technical field
The present invention relates to an advertising audience screening method.
Background
In advertising audience targeting, one existing approach starts from seed-user information supplied by the advertiser and combines it with the richer data held by the advertising platform to find potential customers who behave similarly. However, the number of original seed users an advertiser can provide is relatively small, far too few to meet the advertiser's need to reach potential users, so ads cannot be delivered on the basis of the seed users alone.
This is especially true for ad delivery on video media. Guest users can watch videos without registering, and even registered users provide relatively little profile information, so the user information available to a video media platform is weaker than that available to shopping websites such as Taobao and JD.com. As a result, using the advertiser's seed users directly to find similar users yields poor precision. How to deliver ads effectively on video media has therefore become a pressing problem.
Summary of the invention
We observe that the greatest strength of video media is its rich video resources: videos carry abundant and accurate video tags, and users' viewing habits are reflected in those tags. The invention therefore exploits these strengths of the video media platform. Rather than searching for similar users directly from the seed users, it first clusters similar videos based on the rich video and video-tag resources; it then converts the video clustering result into a user clustering result through users' behavioral habits toward videos, such as watching and subscribing, which makes it possible to find the user clusters corresponding to the seeds; finally, it extracts and ranks the users in those clusters to find potential customers who meet the requirements.
The proposed method avoids the weakness of video media user profiles and makes full use of the rich video and tag resources of video media, replacing direct search for similar users with clustering of videos and tags. The invention also uses users' behavior logs on the video media platform to build a bridge between users and videos or tags; through this relationship, the videos or tag categories a user likes yield a mapping from similar videos to similar users.
Brief description of the drawings
The invention is further described with reference to the accompanying drawing, in which:
Fig. 1 is a flowchart of the method of the invention.
Embodiments
The invention is described in detail below with reference to Fig. 1. The Look-alike-based advertising audience screening method of the invention comprises the following steps.
First step: map the video tags of the video media platform to X-dimensional tag vectors, then sum and average all tag vectors of each video to obtain an X-dimensional video vector for that video.
Video media platforms typically hold video-tag resources on the order of a million tags. Google's deep learning tool Word2Vec is used to map each video tag (a word the video's author uses to summarize the video and that reflects its theme) to an X-dimensional vector, and similarity in this vector space represents the semantic similarity of the tags. X generally lies between 10 and 200: too large a value may cause the curse of dimensionality and excessive computational complexity, while too small a value may be unable to express the full semantic space, so the specific value is chosen as the best-performing one after repeated tests. In this embodiment, X = 20.
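To make "similarity in the vector space" concrete, the sketch below compares tag vectors with cosine similarity. It is an illustration under stated assumptions: the patent does not name a similarity measure (cosine is a common choice for Word2Vec embeddings), and the 4-dimensional vectors and their values are hypothetical stand-ins for trained Word2Vec output (the embodiment uses X = 20). The tag names come from the 3C example in this description.

```python
import math

# Hypothetical 4-dimensional tag vectors standing in for trained Word2Vec output
# (the embodiment uses X = 20; 4 dimensions keep the example readable).
TAG_VECTORS = {
    "Xiaomi": [0.9, 0.8, 0.1, 0.0],
    "Smartisan phone": [0.8, 0.9, 0.2, 0.1],
    "tourism": [0.1, 0.0, 0.9, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    near = cosine_similarity(TAG_VECTORS["Xiaomi"], TAG_VECTORS["Smartisan phone"])
    far = cosine_similarity(TAG_VECTORS["Xiaomi"], TAG_VECTORS["tourism"])
    print(f"Xiaomi vs Smartisan phone: {near:.3f}")  # prints 0.987
    print(f"Xiaomi vs tourism:         {far:.3f}")   # prints 0.123
```

Cosine similarity ignores vector length, so two tags pointing in the same semantic direction score high even if their embeddings differ in magnitude.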
Each video can have one or more video tags, which represent its related content or highlights. In general, the tags on a single video tend to be semantically similar, so all tag vectors of a video can be aggregated into one vector by averaging: the values in each dimension of all tag vectors of the video are summed and then divided by the number of tags, yielding the video's X-dimensional video vector.
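The dimension-wise averaging described above can be sketched as follows. The tag vectors are hypothetical 3-dimensional values (the embodiment uses X = 20), and two behaviors are assumptions not stated in the patent: tags without a learned vector are skipped, and a video with no usable tags gets a zero vector.

```python
# Hypothetical 3-dimensional tag vectors (the embodiment uses X = 20).
TAG_VECTORS = {
    "iphone6": [1.0, 0.0, 0.5],
    "Nexus6": [0.0, 1.0, 0.5],
}


def video_vector(tags, tag_vectors, dim):
    """Sum a video's tag vectors dimension by dimension, then divide by the tag count."""
    # Assumption: tags with no learned vector are skipped rather than raising an error.
    known = [tag_vectors[t] for t in tags if t in tag_vectors]
    if not known:
        return [0.0] * dim  # assumption: a video with no usable tags gets a zero vector
    totals = [0.0] * dim
    for vec in known:
        for i, value in enumerate(vec):
            totals[i] += value
    return [total / len(known) for total in totals]


if __name__ == "__main__":
    print(video_vector(["iphone6", "Nexus6", "unknown tag"], TAG_VECTORS, 3))
    # prints [0.5, 0.5, 0.5]
```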
Second step: cluster the videos to obtain a similar-video clustering result.
Videos in the same cluster are similar: their tags, content, or themes are close. Because the volume of video data is huge and clustering requires computing similarities between video vectors, the K-Means algorithm in the MLlib component of the distributed computing platform Spark is used for this clustering. The number of clusters K depends on the circumstances; in this embodiment, K = 10000.
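For intuition about what the clustering step computes, here is a minimal single-machine K-Means (Lloyd's algorithm) over video vectors. This is a stand-in, not Spark MLlib's API or implementation: MLlib distributes the work and uses its own initialization, while this sketch starts from k randomly sampled videos. The toy data and k = 2 are hypothetical (the embodiment uses K = 10000).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=20, seed=0):
    """Plain Lloyd's-algorithm K-Means; returns (cluster index per vector, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # k distinct starting points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each video joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no video changed cluster
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its member videos.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    # Hypothetical 2-dimensional video vectors: three "tech" videos, two "travel" videos.
    videos = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
    labels, _ = kmeans(videos, k=2)
    print(labels)
```

At production scale, the corresponding Spark estimator would be MLlib's KMeans (for example `pyspark.ml.clustering.KMeans` with `k` set to the desired number of clusters), which runs the same assign-and-update idea across a cluster of machines.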
Third step: convert the similar-video clustering result into a similar-user clustering result.
Users leave behavior logs such as "watch", "subscribe", "comment", and "upvote" on the videos they like, and these logs build the bridge between users and videos. By collecting these behavior logs, the similar-video clustering result is converted into a similar-user clustering result, in which the users in each cluster have similar interests and close viewing habits.
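The patent does not specify how the logs turn video clusters into user clusters. The sketch below shows one possible rule under explicit assumptions: each action type gets a hypothetical weight, a user's weighted interactions are summed per video cluster, and the user is assigned to the single cluster with the highest score. A soft assignment placing users in several clusters would be an equally plausible reading.

```python
from collections import defaultdict

# Hypothetical weights; the patent names these actions but does not weight them.
ACTION_WEIGHTS = {"watch": 1.0, "subscribe": 3.0, "comment": 2.0, "upvote": 2.0}


def video_to_user_clusters(behavior_log, video_cluster):
    """Assign each user to the video cluster they engage with most.

    behavior_log: iterable of (user_id, video_id, action) records.
    video_cluster: dict video_id -> cluster id from the second step.
    Returns dict cluster id -> set of user ids.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for user, video, action in behavior_log:
        cluster = video_cluster.get(video)
        if cluster is None:
            continue  # the video was not clustered; ignore the record
        scores[user][cluster] += ACTION_WEIGHTS.get(action, 0.0)
    user_clusters = defaultdict(set)
    for user, per_cluster in scores.items():
        best = max(per_cluster, key=per_cluster.get)
        user_clusters[best].add(user)
    return dict(user_clusters)


if __name__ == "__main__":
    clusters = {"v1": 0, "v2": 0, "v3": 1}
    log = [
        ("alice", "v1", "watch"),
        ("alice", "v2", "subscribe"),
        ("bob", "v3", "watch"),
        ("bob", "v3", "comment"),
        ("bob", "v1", "watch"),
        ("carol", "v9", "watch"),  # v9 was never clustered
    ]
    print(video_to_user_clusters(log, clusters))  # prints {0: {'alice'}, 1: {'bob'}}
```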
Fourth step: starting from the seed users, extract their clusters from the clustering result and rank the users by similarity, thereby determining user rankings.
After the seed users are obtained from the advertiser, the N clusters containing the seed users are found in the similar-user clustering result. The users of these N clusters are extracted and ranked by similarity; the top-ranked users are taken as the potential customers who meet the requirements, and ads are delivered to them.
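The fourth step can be sketched as follows. The patent does not define the similarity used for ranking, so this sketch assumes each user has an interest vector (hypothetically, the mean of the video vectors the user interacted with) and ranks candidates by cosine similarity to the centroid of the seed users' vectors. Seeds themselves are excluded from the output, which is also an assumption.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_lookalikes(seed_users, user_cluster, user_vectors, top_m):
    """Rank non-seed users from the seed users' clusters by similarity to the seed centroid.

    seed_users: set of seed user ids supplied by the advertiser.
    user_cluster: dict user id -> user cluster id (from the third step).
    user_vectors: dict user id -> interest vector (hypothetical representation).
    """
    # The N clusters that contain at least one seed user.
    seed_clusters = {user_cluster[u] for u in seed_users if u in user_cluster}
    seed_vecs = [user_vectors[u] for u in seed_users if u in user_vectors]
    if not seed_vecs:
        return []
    centroid = mean_vector(seed_vecs)
    candidates = [
        u for u, c in user_cluster.items()
        if c in seed_clusters and u not in seed_users and u in user_vectors
    ]
    # Descending similarity: the highest-ranked users form the look-alike audience.
    candidates.sort(key=lambda u: cosine_similarity(user_vectors[u], centroid), reverse=True)
    return candidates[:top_m]
```

For example, with seeds `{"s1", "s2"}` in cluster 0, a user in cluster 0 whose vector points the same way as the seeds ranks first, while a user in another cluster is never considered, even if their vector is identical to a seed's. Restricting candidates to the seed clusters is what keeps the ranking step cheap when there are many clusters.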
The invention is illustrated below with concrete examples.
Example 1: seed-based expansion and screening of a 3C audience
An advertiser plans to deliver ads to a 3C (computers, communications, consumer electronics) audience and provides a small number of 3C seed cookies. The 3C audience is a relatively high-end group interested in science and technology, communications, IT electronics, and similar products, and it is the target this advertiser wishes to reach. In the technology channel of the video media platform, the tags of most videos relate to computers, communications, electronics, and so on, for example Microsoft, Xiaomi, Smartisan phone, iphone6, Nexus6, and robot. First step: vectorize the video tags with word2vec, then sum and average the values in each dimension of all tag vectors of each video to obtain its X-dimensional video vector. Tags of similar categories have high vector similarity; for example, the similarity between the vectors for Xiaomi and Smartisan phone is much greater than that between Xiaomi and tourism. Second step: cluster the videos, grouping videos with similar categories or themes into one class, to obtain the similar-video clustering result. Third step: using users' behavior logs on the video media platform, convert the similar-video clusters into similar-user clusters; for example, users who mostly follow laptop computers are grouped into one class, and users who like cars into another. Fourth step: starting from the 3C seed users, find the clusters they belong to, sort all users in those clusters in descending order of similarity, and obtain from the ranking the advertising audience similar to the 3C seed audience.
Example 2: seed-based expansion and screening of a travel-enthusiast audience
An advertiser plans to deliver ads to travel enthusiasts and provides a small number of travel-enthusiast seed cookies. This audience loves travel and pursues a high quality of life, and the advertiser wishes to reach more people like them to achieve marketing or brand goals.
In the travel channel of the video media platform, the tags of most videos relate to travel, life abroad, and the like, for example travel, overseas travel, summer retreats, visits, and Jiuzhaigou. First step: vectorize the video tags with word2vec, then sum and average the values in each dimension of all tag vectors of each video to obtain its X-dimensional video vector. All tags on the video media platform are represented as vectors of fixed dimension, and tags of similar categories have high vector similarity; for example, the similarity between Jiuzhaigou and Zhangjiajie is much greater than that between Jiuzhaigou and animation. Second step: cluster the videos, grouping videos with similar categories or themes into one class; for example, videos about scenic spots and historic sites are grouped together. Third step: using users' behavior logs on the video media platform, convert the similar-video clusters into similar-user clusters; for example, users who follow tourist attractions form one class, and users who like animation form another. Fourth step: starting from the travel-enthusiast seed users, find the clusters they belong to, sort all users in those clusters in descending order of similarity, and obtain from the ranking the advertising audience similar to the travel seed audience.
Having described the preferred embodiments of the invention in detail, those of ordinary skill in the art will clearly understand that various changes and modifications can be made without departing from the scope and spirit of the appended claims, and that the invention is not limited to the example embodiments given in the specification.

Claims (6)

1. An advertising audience screening method, characterized in that it comprises:
First step: mapping the video tags of a video media platform to X-dimensional tag vectors, and then summing and averaging all tag vectors of each video to obtain an X-dimensional video vector for each video;
Second step: clustering the videos to obtain a similar-video clustering result;
Third step: converting the similar-video clustering result into a similar-user clustering result;
Fourth step: starting from the seed users, extracting their clusters from the clustering result and ranking by similarity, thereby determining user rankings.
2. The method of claim 1, wherein in the first step Google's deep learning tool Word2Vec is used to map each video tag to an X-dimensional vector.
3. The method of claim 1, wherein in the first step the dimension X of the tag vectors is generally 10 to 200.
4. The method of claim 3, wherein in the first step the dimension X of the tag vectors may be 20.
5. The method of claim 1, wherein in the second step the clustering of the videos, which requires computing similarities between video vectors, is performed with the K-Means algorithm in the MLlib component of the distributed computing platform Spark.
6. The method of claim 1, wherein in the third step the similar-video clustering result is converted into the similar-user clustering result by collecting behavior logs, the behavior logs including "watch", "subscribe", "comment", and "upvote".
CN201510172689.0A 2015-04-13 2015-04-13 A kind of advertisement crowd screening technique Expired - Fee Related CN104751354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510172689.0A CN104751354B (en) 2015-04-13 2015-04-13 A kind of advertisement crowd screening technique

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510172689.0A CN104751354B (en) 2015-04-13 2015-04-13 A kind of advertisement crowd screening technique

Publications (2)

Publication Number Publication Date
CN104751354A 2015-07-01
CN104751354B 2018-06-26

Family

ID=53590984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510172689.0A Expired - Fee Related CN104751354B (en) 2015-04-13 2015-04-13 A kind of advertisement crowd screening technique

Country Status (1)

Country Link
CN (1) CN104751354B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002189924A (en) * 2000-12-21 2002-07-05 Bits Wave Online:Kk Information distributing method, information distribution relay system, and information distribution system
CN103838885A (en) * 2014-03-31 2014-06-04 苏州大学 Advertisement-putting-oriented potential user searching and user model ordering method
CN104462378A (en) * 2014-12-09 2015-03-25 北京国双科技有限公司 Data processing method and device for text recognition


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105427129A (en) * 2015-11-12 2016-03-23 腾讯科技(深圳)有限公司 Information delivery method and system
CN105427129B (en) * 2015-11-12 2020-09-04 腾讯科技(深圳)有限公司 Information delivery method and system
CN108062555A (en) * 2016-11-08 2018-05-22 南京理工大学 Monitoring data early warning system based on Spark streamings cluster
CN108122123A (en) * 2016-11-29 2018-06-05 华为技术有限公司 A kind of method and device for extending potential user
CN108122123B (en) * 2016-11-29 2021-08-20 华为技术有限公司 Method and device for expanding potential users
WO2018113370A1 (en) * 2016-12-21 2018-06-28 华为技术有限公司 Method, device, and system for increasing users
CN108230001A (en) * 2016-12-21 2018-06-29 华为技术有限公司 The method, apparatus and system of extending user
CN108415913A (en) * 2017-02-09 2018-08-17 周孟 Crowd's orientation method based on uncertain neighbours
CN107220852A (en) * 2017-05-26 2017-09-29 北京小度信息科技有限公司 Method, device and server for determining target recommended user
CN107886354A (en) * 2017-10-31 2018-04-06 广州云移信息科技有限公司 A kind of method and system for determining marketing target colony
CN109903086A (en) * 2019-02-14 2019-06-18 北京奇艺世纪科技有限公司 A kind of similar crowd's extended method, device and electronic equipment
CN109903086B (en) * 2019-02-14 2020-12-18 北京奇艺世纪科技有限公司 Similar crowd expansion method and device and electronic equipment
CN112967100A (en) * 2021-04-02 2021-06-15 杭州网易云音乐科技有限公司 Similar population expansion method, device, computing equipment and medium
CN112967100B (en) * 2021-04-02 2024-03-15 杭州网易云音乐科技有限公司 Similar crowd expansion method, device, computing equipment and medium

Also Published As

Publication number Publication date
CN104751354B (en) 2018-06-26


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 6th floor, Steel International Plaza, No. 8 Haidian Street, Haidian District, Beijing 100080

Patentee after: YOUKU INFORMATION TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 6th floor, Steel International Plaza, No. 8 Haidian Street, Haidian District, Beijing 100080

Patentee before: HEYI INFORMATION TECHNOLOGY (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200422

Address after: Room 508, 5th floor, Building 4, No. 699 Wangshang Road, Changhe Street, Binjiang District, Hangzhou, Zhejiang Province 310052

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 6th floor, Steel International Plaza, No. 8 Haidian Street, Haidian District, Beijing 100080

Patentee before: YOUKU INFORMATION TECHNOLOGY (BEIJING) Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180626

Termination date: 20210413