WO2016074492A1 - Data mining method and apparatus based on social platform - Google Patents

Data mining method and apparatus based on social platform

Info

Publication number
WO2016074492A1
Authority
WO
WIPO (PCT)
Prior art keywords
registered user
interest
tag
user
attention
Application number
PCT/CN2015/083804
Other languages
English (en)
French (fr)
Inventor
张一鸣
陈韬
曹欢欢
罗立新
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority to US15/525,870 (US10360230B2)
Priority to EP15859244.4A (EP3220289A4)
Priority to BR112017009666A (BR112017009666A2)
Priority to JP2017525373A (JP6438135B2)
Priority to CA2966757A (CA2966757C)
Priority to MX2017006054A (MX2017006054A)
Publication of WO2016074492A1 (Chinese)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • The present invention relates to the field of computers, and in particular to a data mining method and apparatus based on a social platform.
  • The cold start problem of recommendation systems is a major challenge in applying products such as information clients.
  • The cold start problem refers to the situation in which the system lacks sufficient data about a new user to capture the user's interests and effectively recommend content.
  • Among the many solutions to this problem, a widely used method is to encourage users to log in to the recommendation system with a social network service (SNS) account, such as a Weibo, Tencent QQ, or Renren.com account.
  • The recommendation system can then use the user's information on the social network platform (for example, attention relationships, friend relationships, interest tags, and posted content) to initialize the user's interest model and thereby make effective recommendations.
  • A main object of the present invention is to provide a data mining method and apparatus based on a social platform, so as to solve the problem in the prior art that targeted information cannot be provided to newly registered users because they have no historical browsing records.
  • A data mining method based on a social platform includes: acquiring an interest tag dictionary of a registered user on the information client; acquiring a first object in the social platform with which the registered user on the information client has an attention relationship, and reading the relationship information between the registered user and the first object; determining, according to the first object, a first attention set corresponding to the registered user; constructing an interest model according to the interest tag dictionary of the registered user and the first attention set, wherein the interest model represents the correspondence between registered users having the same first attention set and interest tags; acquiring a second object in the social platform with which a newly registered user has an attention relationship, and reading the relationship information between the newly registered user and the second object; determining, according to the second object, a second attention set of the newly registered user; and matching the second attention set with the interest model, and determining the recommended interest tags of the newly registered user according to the interest model.
  • A data mining device based on a social platform includes: a first obtaining module, configured to acquire an interest tag dictionary of a registered user on the information client; a second obtaining module, configured to acquire a first object in the social platform with which the registered user on the information client has an attention relationship, and read the relationship information between the registered user and the first object; a first determining module, configured to determine, according to the first object, a first attention set corresponding to the registered user; a first processing module, configured to construct an interest model according to the interest tag dictionary of the registered user and the first attention set, wherein the interest model represents the correspondence between registered users having the same first attention set and interest tags; a third obtaining module, configured to acquire a second object in the social platform with which a newly registered user on the information client has an attention relationship, and read the relationship information between the newly registered user and the second object; a second determining module, configured to determine a second attention set of the newly registered user according to the second object; and a second processing module, configured to match the second attention set with the interest model and determine the recommended interest tags of the newly registered user according to the interest model.
  • In the embodiments of the present invention, an interest tag dictionary of a registered user on the information client is acquired; a first object in the social platform with which the registered user has an attention relationship is acquired, and the relationship information between the registered user and the first object is read; a first attention set corresponding to the registered user is determined according to the first object; an interest model is constructed according to the interest tag dictionary of the registered user and the first attention set, wherein the interest model represents the correspondence between registered users having the same first attention set and interest tags; a second object in the social platform with which a newly registered user on the information client has an attention relationship is acquired, and the relationship information between the newly registered user and the second object is read; a second attention set of the newly registered user is determined according to the second object; and the second attention set is matched with the interest model to determine the recommended interest tags of the newly registered user. This solves the problem in the prior art that targeted information cannot be provided to newly registered users because they have no historical browsing records.
  • FIG. 2 is a flowchart of a preferred social platform-based data mining method according to the first embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of generating registered user sets by matching registered users through attention sets on Weibo.
  • FIG. 4 is a schematic structural diagram of a social platform-based data mining apparatus according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of a preferred social platform-based data mining apparatus according to Embodiment 2 of the present invention.
  • FIG. 6 is a schematic structural diagram of a preferred social platform-based data mining apparatus according to Embodiment 2 of the present invention.
  • An embodiment of the present invention provides a data mining method based on a social platform.
  • FIG. 1 is a flowchart of a social platform-based data mining method according to the first embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step S11: Obtain an interest tag dictionary of each registered user on the information client.
  • The interest tag dictionary of each registered user is obtained by collecting and analyzing the registered users' historical browsing behavior.
  • Step S13: Acquire a first object in the social platform with which a registered user on the information client has an attention relationship, and read the relationship information between the registered user and the first object.
  • The objects with which the registered user has an attention relationship are determined by reading the registered user's relationship information on the social platform.
  • The attention relationship may be, for example, a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
  • Step S15: Determine the first attention set corresponding to the registered user according to the first objects with which the registered user has an attention relationship.
  • The first attention set of each registered user is determined by collating the first objects with which that registered user has an attention relationship.
  • Step S17: Construct an interest model according to the interest tag dictionaries of the registered users and the first attention sets, wherein the interest model represents the correspondence between registered users having the same first attention set and interest tags.
  • In step S17, the attention set of each registered user is analyzed, and registered users are divided into multiple registered user sets according to their first attention sets, each registered user set corresponding to one first attention set. A user set tag dictionary corresponding to each first attention set is then generated from the interest tag dictionaries of the registered users in that registered user set, thereby determining the correspondence between first attention sets and interest tags.
  • Step S19: Acquire a second object in the social platform with which a newly registered user has an attention relationship, and read the relationship information between the newly registered user and the second object.
  • The second objects with which the newly registered user has an attention relationship are determined by reading the newly registered user's relationship information on the social platform.
  • The attention relationship may be, for example, a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
  • Step S21: Determine the second attention set of the newly registered user according to the second objects with which the newly registered user has an attention relationship.
  • In step S21, the second objects with which the newly registered user has an attention relationship are collated to determine the newly registered user's second attention set.
  • Step S23: Match the second attention set with the interest model, and determine the recommended interest tags of the newly registered user according to the interest model.
  • The second attention set of the newly registered user is matched against the multiple first attention sets in the interest model to find the first attention set that matches it, and the interest tags of the newly registered user are then determined from that first attention set.
  • Through steps S11 to S23, registered users having the same first attention set on the social platform are grouped to obtain the registered user set corresponding to each first attention set, and a user set tag dictionary corresponding to each registered user set is obtained from the interest tag dictionaries of the registered users on the information client.
  • An interest model representing the correspondence between first attention sets and user set tag dictionaries is thereby constructed.
  • Attention relationships on the social platform reflect the similarity of users' interests.
  • Taking Weibo as an example of the social platform, the Weibo attention list of each registered user on the information client is filtered: followed accounts whose follower count exceeds a certain value, or the top-ranked followed accounts by follower count, are selected to form the first attention set.
  • The Weibo attention lists of all registered users are filtered in the same manner to obtain the first attention set of each registered user, and registered users having the same first attention set are grouped into several registered user sets, each with a different first attention set.
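The Weibo filtering and grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `follower_counts` lookup, and the choice between a `min_followers` threshold and a `top_k` cut are hypothetical; the description only says that followed accounts are selected by follower count.

```python
from typing import Dict, List, Optional, Set


def filter_attention_list(
    followed: List[str],
    follower_counts: Dict[str, int],
    min_followers: Optional[int] = None,
    top_k: Optional[int] = None,
) -> Set[str]:
    """Filter one user's attention list into an attention set.

    Keeps followed accounts with more than `min_followers` followers, or,
    if `top_k` is given instead, the `top_k` accounts with the most followers.
    """
    if min_followers is not None:
        return {a for a in followed if follower_counts.get(a, 0) > min_followers}
    if top_k is not None:
        ranked = sorted(followed, key=lambda a: follower_counts.get(a, 0), reverse=True)
        return set(ranked[:top_k])
    return set(followed)  # no filter configured: keep the whole list


def group_by_attention_set(
    attention_sets: Dict[str, Set[str]],
) -> Dict[frozenset, List[str]]:
    """Group registered users whose filtered attention sets are identical."""
    groups: Dict[frozenset, List[str]] = {}
    for user, attention in attention_sets.items():
        # frozenset makes the attention set hashable so it can key the dict
        groups.setdefault(frozenset(attention), []).append(user)
    return groups


counts = {"wangxing": 5_000_000, "tech_daily": 800_000, "my_cousin": 120}
first_sets = {
    "u1": filter_attention_list(["wangxing", "my_cousin", "tech_daily"], counts, min_followers=100_000),
    "u2": filter_attention_list(["tech_daily", "wangxing"], counts, min_followers=100_000),
}
groups = group_by_attention_set(first_sets)  # u1 and u2 land in the same set
```

Keying groups by `frozenset` treats two attention sets as equal regardless of the order in which accounts were followed, which matches the "same first attention set" wording.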
  • A user set tag dictionary corresponding to each registered user set is obtained by aggregating the interest tag dictionaries of the registered users in that set.
  • The new user's attention list is filtered in the same manner, and the resulting second attention set is matched against the first attention sets of the registered user sets to determine the registered user set to which the new user belongs; the user set tag dictionary of that registered user set provides the recommended interest tags of the newly registered user.
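The interest model described above can be pictured as a lookup table from a first attention set to a user set tag dictionary. The sketch below assumes that member tag weights are merged by plain summation and that a new user matches a registered user set only when the attention sets are identical; `build_interest_model` and `recommend_tags` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Set

TagDict = Dict[str, float]


def build_interest_model(
    attention_sets: Dict[str, Set[str]],
    tag_dicts: Dict[str, TagDict],
) -> Dict[frozenset, TagDict]:
    """Map each distinct first attention set to a user set tag dictionary
    by summing the tag weights of the registered users who share it."""
    model: Dict[frozenset, Counter] = {}
    for user, attention in attention_sets.items():
        merged = model.setdefault(frozenset(attention), Counter())
        merged.update(tag_dicts.get(user, {}))  # Counter.update adds weights
    return {key: dict(counter) for key, counter in model.items()}


def recommend_tags(model: Dict[frozenset, TagDict], new_attention: Set[str]) -> TagDict:
    """Return the tag dictionary of the registered user set whose attention
    set equals the new user's (second) attention set, or {} if none does."""
    return model.get(frozenset(new_attention), {})


model = build_interest_model(
    {"u1": {"a", "c"}, "u2": {"c", "a"}, "u3": {"b"}},
    {"u1": {"football": 1.0}, "u2": {"football": 2.0, "tech": 1.0}, "u3": {"cooking": 3.0}},
)
print(recommend_tags(model, {"a", "c"}))  # {'football': 3.0, 'tech': 1.0}
```

Plain summation is the simplest possible merge; as the description itself notes for Weibo, it lets widely shared interests dominate, which motivates a noise-filtered variant.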
  • The present invention thus solves the problem in the prior art that targeted information cannot be provided to newly registered users because they have no historical browsing records, and provides targeted information to newly registered users based on their attention relationships on the social platform.
  • Before the interest tag dictionary of the registered user on the information client is obtained in step S11, the method further includes:
  • Step S101: Obtain recommendation information.
  • Step S103: Extract interest tags of the recommendation information from its content.
  • Step S105: Acquire historical behavior data of the registered user, the historical behavior data recording the registered user's operations on the recommendation information.
  • Step S107: Determine the tag weight values of the interest tags according to the historical behavior data.
  • Step S109: Determine the interest tag dictionary corresponding to the registered user according to the tag weight values.
  • Through steps S101 to S109, the content of all recommendation information in the information client is analyzed, and interest tags are extracted for each piece of recommendation information according to its content.
  • The operations of each registered user are recorded; according to the user's operations on the recommendation information, the interest tags of that information are weighted, and the tag weight values for that user are calculated.
  • If the weight value of an interest tag is greater than a threshold, the tag is added to the interest tag dictionary of that user.
  • The recommendation service in the information client tags each piece of recommendation information it recommends, for example with content categories (technology, football, basketball, etc.), target audiences (tech geeks, outdoor enthusiasts, teenagers, etc.), and content keywords (iPhone, tank contest, essence Kunststoff, etc.).
  • Interest tags are sometimes edited manually and sometimes identified by algorithms that analyze the recommendation information automatically.
  • The tag weight values of the interest tags may be calculated as $V = \sum_i T_i \cdot w_i$, where:
  • $T_i$ represents the interest tag vector of the $i$-th user action;
  • $w_i$ represents the weight of the $i$-th user action.
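The weighted sum above, followed by the threshold test from step S109, can be sketched as below. The action types, their weights in `ACTION_WEIGHTS`, and the threshold value are hypothetical; the description does not specify them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TagVector = Dict[str, float]

# Hypothetical weights w_i per action type; not specified in the patent.
ACTION_WEIGHTS: Dict[str, float] = {"click": 1.0, "share": 3.0, "dislike": -2.0}


def tag_weight_vector(actions: List[Tuple[str, TagVector]]) -> TagVector:
    """Compute V = sum_i T_i * w_i over a user's actions.

    Each action is (action_type, T_i), where T_i maps the interest tags of
    the acted-on recommendation information to their weights.
    """
    v: Dict[str, float] = defaultdict(float)
    for action_type, tag_vector in actions:
        w = ACTION_WEIGHTS.get(action_type, 0.0)  # unknown actions contribute nothing
        for tag, t in tag_vector.items():
            v[tag] += t * w
    return dict(v)


def interest_tag_dictionary(v: TagVector, threshold: float) -> TagVector:
    """Keep only tags whose weight is greater than the threshold."""
    return {tag: weight for tag, weight in v.items() if weight > threshold}


actions = [
    ("click", {"football": 1.0, "tech": 0.5}),
    ("share", {"football": 1.0}),
    ("dislike", {"tech": 1.0}),
]
v = tag_weight_vector(actions)
print(v)                                # {'football': 4.0, 'tech': -1.5}
print(interest_tag_dictionary(v, 1.0))  # {'football': 4.0}
```

Allowing a negative weight for a "dislike"-style action is one illustrative way to let explicit negative feedback push a tag below the threshold.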
  • In step S17, constructing the interest model according to the registered users' interest tag dictionaries and the first attention sets includes:
  • Step S171: Filter the first attention set to obtain a third attention set corresponding to the registered user, wherein the screening method includes at least one of the following: a data screening method, an index screening method, a condition screening method, and an information screening method.
  • Step S173: Match the registered users by their third attention sets to generate registered user sets, each registered user set containing registered users having the same third attention set.
  • Step S175: Generate a user set tag dictionary corresponding to each registered user set according to the interest tag dictionaries of the registered users in that set.
  • In steps S171 to S175, the first attention set of each registered user is filtered first, for example according to attention count, number of friends, and/or activity level: inactive users and users with few friends are removed from the first attention set, producing the filtered third attention set.
  • The registered users are then matched by their third attention sets, and registered users whose third attention sets are identical, or match to a degree greater than a preset threshold, are placed in the same registered user set.
  • There may be many registered user sets.
  • The third attention set may also be defined manually, in which case registered users are grouped into different registered user sets according to the manually defined third attention sets.
  • A user set tag dictionary corresponding to each registered user set is generated from the interest tag dictionaries of the registered users in that set.
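The description does not define the "matching degree" between third attention sets. The sketch below assumes Jaccard similarity (intersection size over union size) and a greedy, single-pass assignment in which each registered user joins the first existing registered user set whose representative attention set is identical or more similar than the threshold; both choices and all names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed matching degree: size of intersection / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class RegisteredUserSet:
    attention_set: Set[str]  # representative: the first member's third attention set
    users: List[str] = field(default_factory=list)


def group_by_similarity(
    third_sets: Dict[str, Set[str]], threshold: float
) -> List[RegisteredUserSet]:
    groups: List[RegisteredUserSet] = []
    for user, attention in third_sets.items():
        for group in groups:
            if attention == group.attention_set or jaccard(attention, group.attention_set) > threshold:
                group.users.append(user)
                break
        else:  # no existing set matched: start a new registered user set
            groups.append(RegisteredUserSet(set(attention), [user]))
    return groups


sets = {"u1": {"a", "b", "c"}, "u2": {"a", "b", "c", "d"}, "u3": {"x", "y"}}
print([g.users for g in group_by_similarity(sets, 0.6)])  # [['u1', 'u2'], ['u3']]
```

Because the assignment is greedy, the result depends on input order; a deployment might prefer a proper clustering algorithm, which the description leaves open.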
  • FIG. 3 is a schematic diagram of the process of generating registered user sets by matching registered users through attention sets on Weibo.
  • Follower count is used as the filter condition, and accounts with few followers are removed from each attention list.
  • A third attention set is generated from the filtered attention list.
  • The registered users are then classified, and registered users sharing a common third attention set are placed in the same registered user set, so that users with similar interests are grouped together.
  • In step S175, generating the user set tag dictionary corresponding to the registered user set according to the interest tag dictionaries of the registered users in that set includes:
  • Step S1751: Acquire a first user number, the number of registered users on the information client, and a second user number, the number of registered users in the registered user set.
  • Step S1753: Calculate the weight distribution average of each interest tag according to the tag weight values and the first user number.
  • Step S1755: Calculate the set weight average of each interest tag in the user set tag dictionary according to the tag weight values of the registered users in the registered user set and the second user number.
  • Step S1757: Calculate the registered user set weight value of each interest tag in the user set tag dictionary according to the weight distribution average and the set weight average.
  • Step S1759: Compare the registered user set weight value of each interest tag in the user set tag dictionary with a preset noise threshold in turn.
  • If the registered user set weight value is greater than or equal to the noise threshold, the corresponding interest tag is retained in the user set tag dictionary.
  • If the registered user set weight value is smaller than the noise threshold, the corresponding interest tag is deleted from the user set tag dictionary.
  • Taking Weibo as an example of the social platform:
  • The interest tag dictionaries of individual users may be merged to obtain a group interest model.
  • The simplest way is to add the users' tag vectors directly.
  • However, the result is very noisy. In many fields there are high-profile Weibo accounts with huge followings; many people follow such an account only because it is prominent, so following it does not reflect their own interests. If the users' interest tag vectors are simply added, meaningful signals are easily drowned out by general interests.
  • For example, consider the Weibo users who follow Wang Xing (founder of Meituan.com).
  • Let $N$ be the number of all registered users and $V_n$ the interest tag weight distribution of the $n$-th user; the weight distribution average over all users is then $V_{base} = \frac{1}{N}\sum_{n=1}^{N} V_n$.
  • $V'[i] = V[i] / V_{base}[i]$, where:
  • $V'[i]$ represents the registered user set weight value of interest tag $i$;
  • $V[i]$ represents the set weight average of interest tag $i$;
  • $V_{base}[i]$ represents the weight distribution average of all users on interest tag $i$.
  • The registered user set weight value V'[i] is compared with a preset noise threshold: when V'[i] is smaller than the noise threshold, interest tag i is regarded as a noise tag and is deleted from the current user set tag dictionary; when V'[i] is greater than or equal to the noise threshold, interest tag i is regarded as a non-noise tag and is retained in the current user set tag dictionary.
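Steps S1751 to S1759 and the ratio $V'[i] = V[i] / V_{base}[i]$ can be combined into the following sketch. It assumes non-negative tag weights, skips tags whose overall average is zero, and stores V'[i] as the retained tag's value; the description only says that tags at or above the noise threshold are retained. The function names are hypothetical.

```python
from typing import Dict, List

TagDict = Dict[str, float]


def average_distribution(dicts: List[TagDict]) -> TagDict:
    """Average tag weight over the given users (missing tags count as 0)."""
    totals: Dict[str, float] = {}
    for d in dicts:
        for tag, weight in d.items():
            totals[tag] = totals.get(tag, 0.0) + weight
    n = len(dicts)
    return {tag: total / n for tag, total in totals.items()} if n else {}


def user_set_tag_dictionary(
    all_users: List[TagDict], set_users: List[TagDict], noise_threshold: float
) -> TagDict:
    v_base = average_distribution(all_users)  # S1753: divide by first user number N
    v_set = average_distribution(set_users)   # S1755: divide by second user number
    result: TagDict = {}
    for tag, v in v_set.items():
        base = v_base.get(tag, 0.0)
        if base <= 0.0:
            continue  # assumption: skip tags with no positive overall average
        ratio = v / base  # S1757: V'[i] = V[i] / V_base[i]
        if ratio >= noise_threshold:  # S1759: keep non-noise tags
            result[tag] = ratio
    return result


all_users = [
    {"football": 2.0, "tech": 1.0},
    {"tech": 1.0},
    {"cooking": 3.0},
    {"football": 2.0, "tech": 2.0},
]
fans = [all_users[0], all_users[3]]  # one registered user set
print(user_set_tag_dictionary(all_users, fans, 1.8))  # {'football': 2.0}
```

Dividing by the all-user average is what suppresses "general interest" tags: a tag that is as popular inside the set as everywhere else gets a ratio near 1 and falls below a threshold set above 1.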
  • In step S23, matching the second attention set with the interest model and determining the recommended interest tags of the newly registered user according to the interest model includes:
  • Step S231: Filter the second attention set to obtain a fourth attention set corresponding to the newly registered user, wherein the screening method includes at least one of the following: a data screening method, an index screening method, a condition screening method, and an information screening method.
  • Step S233: Match the fourth attention set with the third attention sets to determine the registered user set corresponding to the newly registered user.
  • Step S235: Determine the recommended interest tags of the newly registered user according to the user set tag dictionary of the registered user set corresponding to the newly registered user.
  • In steps S231 to S235, the second attention set of the newly registered user is filtered first, for example according to attention count, number of friends, and/or activity level: inactive users and users with few friends are removed from the second attention set, producing the filtered fourth attention set.
  • The filtering method may be the same as that used in step S171 or a different one; any filtering method that serves to optimize the second attention set may be used.
  • The recommended interest tags for the new user are determined according to the user set tag dictionary of the registered user set to which the new user belongs.
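Step S233 matches the fourth attention set against the registered user sets' third attention sets. Assuming a Jaccard-style matching degree purely for illustration (the description does not fix the measure), the user set tag dictionary of the best-matching set can be returned as the recommended interest tags. `match_user_set` and `min_similarity` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TagDict = Dict[str, float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed matching degree: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_user_set(
    fourth_set: Set[str],
    user_sets: List[Tuple[Set[str], TagDict]],
    min_similarity: float = 0.0,
) -> TagDict:
    """Return the user set tag dictionary of the registered user set whose
    third attention set best matches the new user's fourth attention set.
    Returns {} when no set scores above min_similarity."""
    best_tags: TagDict = {}
    best_score = min_similarity
    for third_set, tag_dict in user_sets:
        score = jaccard(fourth_set, third_set)
        if score > best_score:  # strictly better than the best so far
            best_tags, best_score = tag_dict, score
    return best_tags


user_sets = [({"a", "b"}, {"football": 2.0}), ({"x", "y", "z"}, {"cooking": 1.0})]
print(match_user_set({"a", "b", "c"}, user_sets))  # {'football': 2.0}
```

Returning an empty dictionary when nothing matches leaves the caller free to fall back to a default, non-personalized recommendation.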
  • The method further includes:
  • Step S24: Push recommendation information to the newly registered user according to the recommended interest tags.
  • In step S24, recommendation information matching the interest tags determined for the newly registered user through the above steps is pushed to that user.
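One simple way to realize step S24 is sketched below, assuming that each candidate piece of recommendation information carries interest tags and that items are ranked by the summed weights of the recommended tags they contain. This scoring rule and the name `push_recommendations` are illustrative assumptions, not stated in the description.

```python
from typing import Dict, List, Set


def push_recommendations(
    recommended_tags: Dict[str, float],
    items: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    """Rank items by the total weight of recommended tags they carry and
    return the IDs of the top k items that match at least one tag."""
    scored = []
    for item_id, tags in items.items():
        score = sum(recommended_tags.get(tag, 0.0) for tag in tags)
        if score > 0:
            scored.append((score, item_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # high score first, then ID
    return [item_id for _, item_id in scored[:k]]


items = {
    "n1": {"football"},
    "n2": {"football", "tech"},
    "n3": {"cooking"},
    "n4": {"tech"},
}
print(push_recommendations({"football": 2.0, "tech": 1.0}, items, k=2))  # ['n2', 'n1']
```

Sorting on the item ID as a secondary key keeps the output deterministic when two items tie on score.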
  • The present invention effectively combines public social network data with private data of the recommendation service to jointly recommend content to users. Combining the two types of data yields more accurate personalized recommendations than using only public social network data or only private recommendation service data.
  • The proposed fusion method can also apply the fusion of the two kinds of data to new users (the in-site user interest model mined from in-site data is transferred to newly registered off-site users through social relationships), which traditional methods cannot achieve.
  • One feature of the present invention is that the more users the recommendation service provider has, the better the method works. The user base of such a provider covers a relatively large part of the social network's user base, so it rarely happens that, for a given social account, most of its friends or followers are not users of the site, which would make it impossible to mine the interests of that group. This is a significant competitive advantage for a product such as Toutiao, which has hundreds of millions of users, and a technical barrier for smaller recommendation products.
  • An embodiment of the present invention further provides a data mining device based on a social platform.
  • The device includes: a first obtaining module 30, a second obtaining module 32, a first determining module 34, a first processing module 36, a third obtaining module 38, a second determining module 40, and a second processing module 42.
  • The first obtaining module 30 is configured to obtain the interest tag dictionary of each registered user on the information client.
  • Specifically, the first obtaining module 30 obtains the interest tag dictionary of each registered user by collecting and analyzing the registered users' historical browsing behavior.
  • The second obtaining module 32 is configured to acquire a first object in the social platform with which a registered user on the information client has an attention relationship, and read the relationship information between the registered user and the first object.
  • Specifically, the second obtaining module 32 determines the objects with which the registered user has an attention relationship by reading the registered user's relationship information on the social platform.
  • The attention relationship may be, for example, a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
  • The first determining module 34 is configured to determine the first attention set corresponding to the registered user according to the first objects with which the registered user has an attention relationship.
  • Specifically, the first determining module 34 collates the first objects with which each registered user has an attention relationship to determine the first attention set of each registered user.
  • The first processing module 36 is configured to construct an interest model according to the interest tag dictionaries of the registered users and the first attention sets, wherein the interest model represents the correspondence between registered users having the same first attention set and interest tags.
  • Specifically, the first processing module 36 analyzes the attention set of each registered user and divides registered users into multiple registered user sets according to their first attention sets, each corresponding to one first attention set; it then generates the user set tag dictionary corresponding to each first attention set from the interest tag dictionaries of the registered users in that registered user set, thereby determining the correspondence between first attention sets and interest tags.
  • The third obtaining module 38 is configured to acquire a second object in the social platform with which a newly registered user has an attention relationship, and read the relationship information between the newly registered user and the second object.
  • Specifically, the third obtaining module 38 determines the second objects with which the newly registered user has an attention relationship by reading the newly registered user's relationship information on the social platform.
  • The attention relationship may be, for example, a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
  • The second determining module 40 is configured to determine the second attention set of the newly registered user according to the second objects with which the newly registered user has an attention relationship.
  • Specifically, the second determining module 40 collates the second objects with which the newly registered user has an attention relationship to determine the newly registered user's second attention set.
  • The second processing module 42 is configured to match the second attention set with the interest model, and determine the recommended interest tags of the newly registered user according to the interest model.
  • Specifically, the second processing module 42 matches the second attention set of the newly registered user against the multiple first attention sets in the interest model to find the first attention set that matches it, and then determines the interest tags of the newly registered user from that first attention set.
  • through the first obtaining module 30, the second obtaining module 32, the first determining module 34, the first processing module 36, the third obtaining module 38, the second determining module 40, and the second processing module 42, registered users who have the same first attention set on the social platform are grouped to obtain the registered user set corresponding to that first attention set, and a user set tag dictionary corresponding to each registered user set is constructed from the interest tag dictionaries acquired for the registered users on the information client.
  • after the second attention set of a newly registered user is obtained, matching it against the first attention sets in the interest model yields the recommended interest tags of the newly registered user.
  • relationships on the social platform can generally be considered to reflect the similarity of users' interests.
  • taking Weibo as an example of the social platform, the Weibo following list of each registered user on the information client is filtered, and the followed accounts whose follower count exceeds a certain value, or the top few followed accounts by follower count, are selected to form a first attention set.
  • the Weibo following lists of all registered users are filtered in the same way to obtain the first attention set corresponding to each registered user, and registered users having the same first attention set are grouped into several registered user sets, each with a different first attention set.
  • a user set tag dictionary corresponding to each registered user set is obtained by collecting the interest tag dictionaries of the registered users in that set.
  • the new user's following list is filtered in the same way, and the resulting second attention set is matched against the first attention sets of the registered user sets to determine the registered user set to which the new user belongs; the user set tag dictionary of that registered user set gives the recommended interest tags of the newly registered user.
  • the present invention solves the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history, and achieves the effect of providing targeted information to newly registered users through their following relationships on social platforms.
  • the apparatus further includes: a fourth obtaining module 281, an extracting module 283, a fifth obtaining module 285, a third determining module 287, and a fourth determining module 289.
  • the fourth obtaining module 281 is configured to obtain recommendation information.
  • the extracting module 283 is configured to extract an interest tag of the recommended information from the content of the recommended information.
  • the fifth obtaining module 285 is configured to obtain historical behavior data of the registered user, where the historical behavior data is used to record the operating behavior of the registered user on the recommended information.
  • the third determining module 287 is configured to determine a label weight value of the interest tag according to the historical behavior data.
  • the fourth determining module 289 is configured to determine, according to the tag weight value, an interest tag dictionary corresponding to the registered user.
  • the fourth obtaining module 281, the extracting module 283, the fifth obtaining module 285, the third determining module 287, and the fourth determining module 289 analyze the content of all recommended information in the information client and extract interest tags for each recommended item according to its content.
  • when a registered user operates on a recommended item, the operation is recorded, the interest tags of that item are weighted according to the operation, and the weight values of the interest tags for that user are calculated. When a tag weight value is greater than the threshold, the tag is added to the interest tag dictionary corresponding to the user.
  • the recommendation service in the information client attaches interest tags to the recommended information it recommends, for example content categories: technology, football, basketball; target audiences: tech geeks, outdoor enthusiasts, teenagers; and content keywords: iPhone, tank contest, Bayern Munich.
  • interest tags are sometimes edited manually and sometimes identified automatically by algorithms that analyze the recommended information.
  • the method for calculating the tag weight value of an interest tag may include:
  • $V = \sum_i T_i \cdot w_i$
  • $T_i$ represents the interest tag vector of the i-th user action
  • $w_i$ represents the weight of the i-th user action
  • the first processing module 36 includes: a first sub-processing module 361, a sub-matching module 363, and a first generating module 365.
  • the first sub-processing module 361 is configured to filter the first attention set to obtain a third attention set corresponding to the registered user, where the filtering methods include at least: data filtering, indicator filtering, condition filtering, and information filtering.
  • the sub-matching module 363 is configured to match the registered users by their third attention sets to generate registered user sets, where each registered user set contains the registered users having the same third attention set.
  • the first generating module 365 is configured to generate a user set tag dictionary corresponding to the registered user set according to the interest tag dictionary of the registered user included in the registered user set.
  • the first sub-processing module 361, the sub-matching module 363, and the first generating module 365 first filter the first attention set of each registered user, for example by conditions such as follow count, number of friends, and/or activity level, removing inactive users and users with few friends to produce the filtered third attention set.
  • the filtered registered users are then matched by their third attention sets, and registered users whose third attention sets match to a degree greater than a preset threshold, or are identical, are placed in the same registered user set.
  • there can be many registered user sets.
  • the third attention sets can also be defined manually, in which case registered users are grouped into different registered user sets according to the manually defined third attention sets.
  • a user set tag dictionary corresponding to the current registered user set is generated from the interest tag dictionaries of the registered users in that set.
  • FIG. 3 is a schematic flow diagram of matching registered users by their Weibo attention sets to generate registered user sets.
  • follower count is used as the filtering condition, and accounts with few followers are filtered out of the following list.
  • a third attention set is generated from the filtered following list.
  • registered users are classified so that those sharing a common third attention set are placed in one registered user set, thereby identifying groups of users with similar interests.
  • the first generating module 365 includes: a first sub-acquisition module 3651, a first sub-calculation module 3652, a second sub-calculation module 3653, a third sub-calculation module 3654, and a sub-judging module 3655.
  • the first sub-acquisition module 3651 is configured to obtain the first user count, i.e. the number of registered users on the information client, and the second user count, i.e. the number of registered users in the registered user set.
  • the first sub-calculation module 3652 is configured to calculate the weight distribution average of each interest tag according to the tag weight values and the first user count.
  • the second sub-calculation module 3653 is configured to calculate, according to the tag weight values of the registered users in the registered user set and the second user count, the set weight average of each interest tag in the user set interest tag dictionary.
  • the third sub-calculation module 3654 is configured to calculate, according to the weight distribution average and the set weight average, the registered user set weight value of each interest tag in the user set interest tag dictionary.
  • the sub-judging module 3655 is configured to compare, one by one, the registered user set weight value of each interest tag in the user set interest tag dictionary with a preset noise threshold.
  • when the registered user set weight value is greater than the preset noise threshold, the corresponding interest tag is retained in the user set tag dictionary.
  • when the registered user set weight value is less than or equal to the preset noise threshold, the corresponding interest tag is deleted from the user set tag dictionary.
  • in a practical application of the first generating module 365 and its sub-modules 3651 to 3655, taking Weibo as an example of the social platform: after groups of users with similar interests are found, the interest tag dictionaries of the individual users can be merged to obtain a group interest model. The simplest method is to add the user tag vectors directly. In practice, however, this turns out to be very noisy, because some big Weibo accounts in certain fields have a great many followers, many of whom follow such an account only because it is famous, so the act of following does not reflect their own interests.
  • $V_{base} = \frac{1}{N}\sum_{n=1}^{N} V_n$
  • $N$ represents the number of all registered users
  • $V_n$ represents one user's interest tag weight distribution
  • $V'[i] = V[i] / V_{base}[i]$
  • $V'[i]$ represents the registered user set weight value of interest tag $i$
  • $V[i]$ represents the set weight average of interest tag $i$
  • $V_{base}[i]$ represents the weight distribution average of all users on interest tag $i$.
  • the registered user set weight value V′ is compared with a preset noise threshold: when V′ is smaller than the noise threshold, the interest tag is deemed a noise tag and should be removed from the current user set tag dictionary; when V′ is greater than or equal to the noise threshold, the interest tag is deemed a non-noise tag and is retained in the current user set tag dictionary.
  • the second processing module 42 includes: a second sub-processing module 421, a first sub-determination module 423, and a second sub-determination module 425.
  • the second sub-processing module 421 is configured to filter the second attention set to obtain a fourth attention set corresponding to the newly registered user, where the filtering methods include at least: data filtering, indicator filtering, condition filtering, and information filtering.
  • the first sub-determination module 423 is configured to match the fourth attention set with the third attention set, and determine a registered user set corresponding to the newly registered user.
  • the second sub-determination module 425 is configured to determine a recommended interest tag of the newly registered user according to the user set tag dictionary of the registered user set corresponding to the newly registered user.
  • the second sub-processing module 421, the first sub-determination module 423, and the second sub-determination module 425 first filter the second attention set of the newly registered user, for example by conditions such as follow count, number of friends, and/or activity level, removing inactive users and users with few friends to produce the filtered fourth attention set.
  • the filtering method may be the same as the one used in step S171 or a different one; any filtering method that optimizes the second attention set may be used.
  • the tags to recommend to the new user are determined according to the user set tag dictionary of the registered user set to which the newly registered user belongs.
  • the device further includes: a pushing module 43.
  • the pushing module 43 is configured to push the recommended information for the newly registered user according to the recommended interest tag.
  • the push module 43 pushes recommended information matching the interest tags determined for the newly registered user in the above steps.
  • the present invention effectively combines public social network data with the recommendation service's private data to recommend content to users. Combining the two helps recommend personalized content more accurately than using only public social network data or only private recommendation service data.
  • the fusion method proposed by the present invention can also apply the fusion of both kinds of data to new users (the on-site user interest model mined from on-site data is transferred to newly registered off-site users through social relationships), which traditional methods cannot achieve.
  • one characteristic of the present invention is that the more users a recommendation service provider has, the better the method works. The user base of such a provider covers a larger share of the social network's users, so it rarely happens that, for a given social account, most of its friends or followers are not on-site users and the group's interests cannot be mined. This is a significant competitive advantage for a product such as Toutiao with hundreds of millions of users, and a technical barrier for smaller recommendation products.
  • the various functional units provided by the embodiments of the present application may be operated in a mobile terminal, a computer terminal, or the like, or may be stored as part of a storage medium.
  • embodiments of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals.
  • a computer terminal may also be replaced with a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device of the plurality of network devices of the computer network.
  • the computer terminal may execute program code for the following steps of the social platform-based data mining method: acquiring the interest tag dictionary of a registered user on the information client; acquiring first objects in the social platform that have a following relationship with the registered user on the information client, and reading the relationship information between the registered user and the first objects; determining, according to the first objects, the first attention set corresponding to the registered user; constructing an interest model according to the registered user's interest tag dictionary and the first attention set, wherein the interest model represents the correspondence between interest tags and registered users having the same first attention set; acquiring second objects in the social platform that have a following relationship with a newly registered user on the information client, and reading the relationship information between the newly registered user and the second objects; determining the second attention set of the newly registered user according to the second objects; and matching the second attention set against the interest model and determining the recommended interest tags of the newly registered user according to the interest model.
  • the computer terminal can include: one or more processors, memory, and transmission means.
  • the memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the social platform-based data mining method in the embodiments of the present invention; the processor runs the software programs and modules stored in the memory to execute various functional applications and data processing, that is, to implement the above social platform-based data mining method.
  • the memory may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • the memory can further include memory remotely located relative to the processor, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the above transmission device is for receiving or transmitting data via a network.
  • Specific examples of the above network may include a wired network and a wireless network.
  • the transmission device includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • the transmission device is a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • the memory is used to store preset action conditions, information about users with preset permissions, and application programs.
  • the processor can call the information and application programs stored in the memory through the transmission device to execute the program code of the method steps in each optional or preferred embodiment of the above method embodiments.
  • the computer terminal can also be a terminal device such as a smartphone (for example an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be used to save program code executed by the social platform-based data mining method provided by the foregoing method embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
  • the storage medium is configured to store program code for performing the following steps: acquiring the interest tag dictionary of a registered user on the information client; acquiring first objects in the social platform that have a following relationship with the registered user on the information client, and reading the relationship information between the registered user and the first objects; determining, according to the first objects, the first attention set corresponding to the registered user; constructing an interest model according to the registered user's interest tag dictionary and the first attention set, wherein the interest model represents the correspondence between interest tags and registered users having the same first attention set; acquiring second objects in the social platform that have a following relationship with a newly registered user on the information client, and reading the relationship information between the newly registered user and the second objects; determining the second attention set of the newly registered user according to the second objects; and matching the second attention set against the interest model and determining the recommended interest tags of the newly registered user according to the interest model.
  • the storage medium may also be arranged to store program code for performing various preferred or optional method steps provided by the social platform based data mining method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

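A social platform-based data mining method and apparatus. The method includes: acquiring the interest tag dictionaries of registered users on an information client (S11) and first objects in a social platform that have a following relationship with the registered users on the information client (S13); determining, according to the first objects with which a registered user has a following relationship, the first attention set corresponding to the registered user (S15); constructing an interest model according to the registered users' interest tag dictionaries and first attention sets (S17); acquiring second objects in the social platform that have a following relationship with a newly registered user of the information client, and reading the relationship information between the newly registered user and the second objects (S19); determining the second attention set of the newly registered user according to the second objects (S21); and matching the second attention set against the interest model to determine the recommended interest tags of the newly registered user (S23). The method and apparatus solve the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history.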

Description

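Social Platform-Based Data Mining Method and Apparatus
Technical Field
The present invention relates to the field of computers, and in particular to a social platform-based data mining method and apparatus.
Background
At present, with the development of computer technology and the growing popularity of the Internet, more and more people obtain all kinds of information through the Internet. Correspondingly, the amount of information available on the Internet has also become much richer.
In recent years, with the rapid development of the mobile Internet, people have become accustomed to obtaining news and information through information clients on mobile terminals, which makes the time users spend obtaining information online increasingly fragmented. Against this background, accurately providing users with valuable information that interests them has become more important, and doing so for new users in particular has become an urgent problem.
In the prior art, the cold-start problem of recommender systems is a major challenge for products such as information clients. The cold-start problem refers to the situation where, for a new user, the system lacks enough data to capture the user's interests and recommend content effectively. Among the many solutions to this problem, one widely used approach is to encourage users to log in to the recommender system with a social network service (SNS) account, such as a Weibo, Tencent QQ, or Renren account. The recommender system can then use information from the user's social network platform (for example following relationships, friend relationships, interest tags, and published content) to initialize the user's interest model and make effective recommendations.
On the one hand, using only the public data of a social network platform for content recommendation (public data such as videos, articles, pictures, music, games, software, and friends) still faces many difficulties in practice. For example, content published on social network platforms tends to be short and messy, and users' self-chosen tags tend to be quirky (for example "would die if I couldn't sleep in" or "late-stage trypophobia patient"), which machine learning algorithms find hard to interpret, so such data helps little in improving recommendation services. For users who are inactive on social networks or have weak social ties, their public data on the social network platform is of even less use for improving recommendations. On the other hand, mature content recommendation providers with large user bases have usually accumulated a large amount of user behavior data during long-term operation, such as the videos users have played and the articles they have read or commented on. If this data could be effectively combined with public social network data, recommendations for users might be greatly improved. However, existing techniques focus almost entirely on mining user interest models from the public data provided by social network platforms and recommending on that basis, an approach that is difficult to implement and has low accuracy.
No effective solution has yet been proposed for the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history.
Summary of the Invention
The main object of the present invention is to provide a social platform-based data mining method and apparatus, so as to solve the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history.
To achieve the above object, according to one aspect of the embodiments of the present invention, a social platform-based data mining method is provided. The method includes: acquiring the interest tag dictionary of a registered user on an information client; acquiring first objects in a social platform that have a following relationship with the registered user on the information client, and reading the relationship information between the registered user and the first objects; determining, according to the first objects with which the registered user has a following relationship, the first attention set corresponding to the registered user; constructing an interest model according to the registered user's interest tag dictionary and the first attention set, where the interest model represents the correspondence between interest tags and registered users having the same first attention set; acquiring second objects in the social platform that have a following relationship with a newly registered user of the information client, and reading the relationship information between the newly registered user and the second objects; determining the second attention set of the newly registered user according to the second objects with which the newly registered user has a following relationship; and matching the second attention set against the interest model and determining the recommended interest tags of the newly registered user according to the interest model.
To achieve the above object, according to another aspect of the embodiments of the present invention, a social platform-based data mining apparatus is provided. The apparatus includes: a first obtaining module, configured to acquire the interest tag dictionary of a registered user on an information client; a second obtaining module, configured to acquire first objects in a social platform that have a following relationship with the registered user on the information client, and read the relationship information between the registered user and the first objects; a first determining module, configured to determine, according to the first objects, the first attention set corresponding to the registered user; a first processing module, configured to construct an interest model according to the registered user's interest tag dictionary and the first attention set, where the interest model represents the correspondence between interest tags and registered users having the same first attention set; a third obtaining module, configured to acquire second objects in the social platform that have a following relationship with a newly registered user of the information client, and read the relationship information between the newly registered user and the second objects; a second determining module, configured to determine the second attention set of the newly registered user according to the second objects; and a second processing module, configured to match the second attention set against the interest model and determine the recommended interest tags of the newly registered user according to the interest model.
According to the embodiments of the invention, the interest tag dictionaries of registered users on the information client are combined with the first attention sets derived from those users' following relationships on the social platform to construct an interest model representing the correspondence between interest tags and registered users having the same first attention set; the second attention set derived from a newly registered user's following relationships is then matched against this interest model to determine the newly registered user's recommended interest tags. This solves the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history, and achieves the effect of providing targeted information to newly registered users through their following relationships on the social platform.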
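Brief Description of the Drawings
The accompanying drawings, which form part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of social platform-based data mining according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of preferred social platform-based data mining according to Embodiment 1 of the present invention;
FIG. 3 is a schematic flow diagram of matching registered users by their Weibo attention sets to generate registered user sets;
FIG. 4 is a schematic structural diagram of a social platform-based data mining apparatus according to Embodiment 2 of the present invention;
FIG. 5 is a schematic structural diagram of a preferred social platform-based data mining apparatus according to Embodiment 2 of the present invention; and
FIG. 6 is a schematic structural diagram of a preferred social platform-based data mining apparatus according to Embodiment 2 of the present invention.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another. The present invention is described in detail below with reference to the drawings and in combination with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and so on in the specification, claims, and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such a process, method, product, or device.
Embodiment 1
This embodiment of the present invention provides a social platform-based data mining method.
FIG. 1 is a flowchart of the social platform-based data mining method according to Embodiment 1 of the present invention. As shown in FIG. 1, the method includes the following steps: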
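Step S11: acquire the interest tag dictionary of a registered user on the information client.
In step S11 of this application, the registered users' historical browsing behavior is collected and analyzed to obtain an interest tag dictionary for each registered user.
Step S13: acquire first objects in the social platform that have a following relationship with the registered user on the information client, and read the relationship information between the registered user and the first objects.
In step S13 of this application, the registered user's following-relationship information on the social platform is read to determine the objects with which the registered user has a following relationship.
In practice, the following relationship may be a friend relationship in Tencent QQ, a following relationship on Weibo, or a friend relationship on Renren.
Step S15: determine, according to the first objects with which the registered user has a following relationship, the first attention set corresponding to the registered user.
In step S15 of this application, the first objects with which each registered user has a following relationship are organized separately, thereby determining each registered user's first attention set.
Step S17: construct an interest model according to the registered users' interest tag dictionaries and first attention sets, where the interest model represents the correspondence between interest tags and registered users having the same first attention set.
In step S17 of this application, each registered user's attention set is analyzed, and registered users with different first attention sets are classified into registered user sets, each corresponding to one of several first attention sets. A user set tag dictionary corresponding to each first attention set is then generated from the interest tag dictionaries of the registered users in that registered user set, thereby determining the correspondence between first attention sets and interest tags.
Step S19: acquire second objects in the social platform that have a following relationship with a newly registered user of the information client, and read the relationship information between the newly registered user and the second objects.
In step S19 of this application, the newly registered user's following-relationship information on the social platform is read to determine the second objects with which the newly registered user has a following relationship.
In practice, the following relationship may be a friend relationship in Tencent QQ, a following relationship on Weibo, or a friend relationship on Renren.
Step S21: determine the second attention set of the newly registered user according to the second objects with which the newly registered user has a following relationship.
In step S21 of this application, the second objects with which the newly registered user has a following relationship are organized, thereby determining the newly registered user's second attention set.
Step S23: match the second attention set against the interest model, and determine the recommended interest tags of the newly registered user according to the interest model.
In step S23 of this application, the newly registered user's second attention set is matched against the several first attention sets in the interest model to obtain the first attention set that matches it, and the newly registered user's interest tags are determined through that first attention set.
Specifically, through steps S11 to S23, registered users who have the same first attention set on the social platform are grouped to obtain the registered user set corresponding to that first attention set, and a user set tag dictionary corresponding to the registered user set is obtained from the interest tag dictionaries acquired for the registered users on the information client. This builds an interest model that maps first attention sets to user set tag dictionaries. Once a newly registered user's second attention set is acquired, matching it directly against the first attention sets in the interest model yields the newly registered user's recommended interest tags.
In practice, relationships on a social platform can generally be considered to reflect the similarity of users' interests. Based on different assumptions, different methods can be used to find other users on a social platform whose interests resemble a given user's, and different assumptions suit different types of social platforms. For example, for platforms such as Tencent QQ and WeChat that emphasize two-way communication, one can assume that friends have similar interests; for platforms such as Weibo that emphasize one-way following, one can assume that users who follow the same accounts have similar interests. For instance, if two users both follow Lei Jun and Huang Zhang, they are probably both interested in smartphones.
Taking Weibo as an example of the social platform, the Weibo following list of a registered user on the information client is filtered, and the followed accounts whose follower count exceeds a certain value, or the top few followed accounts by follower count, are selected to form a first attention set. All registered users' Weibo following lists are filtered in the same way to obtain the first attention set corresponding to each registered user, and registered users with the same first attention set are grouped into several registered user sets, each with a different first attention set. The interest tag dictionaries of the registered users in each registered user set are collected to obtain the user set tag dictionary for that set. When a new user registers with the information client and authorizes it to access the user's public Weibo data, the new user's following list is filtered in the same way, and the resulting second attention set is matched against the first attention sets of the registered user sets to determine which registered user set the new user belongs to. The user set tag dictionary of that registered user set gives the newly registered user's recommended interest tags.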
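The Weibo example above can be sketched end to end in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: it assumes follower counts are already known, groups registered users only when their filtered attention sets are identical, and merges tag dictionaries by plain summation (the simplest merge, which the description later notes is noisy). All function names and account identifiers are hypothetical.

```python
from collections import defaultdict


def filter_follows(follows, follower_counts, min_followers=0, top_k=None):
    """Return a filtered attention set as a frozenset.

    Keeps either the top_k followed accounts by follower count, or (when
    top_k is None) every followed account with more than min_followers followers.
    """
    if top_k is not None:
        ranked = sorted(follows, key=lambda f: follower_counts.get(f, 0), reverse=True)
        return frozenset(ranked[:top_k])
    return frozenset(f for f in follows if follower_counts.get(f, 0) > min_followers)


def build_interest_model(users, follower_counts, min_followers):
    """Group registered users by identical filtered attention set and sum their tag weights.

    users maps user_id -> (followed accounts, interest tag dictionary).
    Returns a dict: frozenset (first attention set) -> user set tag dictionary.
    """
    model = defaultdict(lambda: defaultdict(float))
    for follows, tags in users.values():
        key = filter_follows(follows, follower_counts, min_followers)
        if not key:
            continue  # users following no qualifying account are left ungrouped
        for tag, weight in tags.items():
            model[key][tag] += weight
    return {key: dict(tags) for key, tags in model.items()}


def recommend_tags(new_follows, model, follower_counts, min_followers):
    """Filter a new user's follow list the same way and look up the matching set."""
    key = filter_follows(new_follows, follower_counts, min_followers)
    return model.get(key, {})
```

Keying the model by a `frozenset` makes the "same first attention set" condition an exact dictionary lookup; a threshold-based match needs a similarity measure instead.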
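In summary, the present invention solves the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history, and achieves the effect of providing targeted information to newly registered users through their following relationships on the social platform.
Preferably, in a preferred embodiment provided by this application, as shown in FIG. 2, before the interest tag dictionary of registered users on the information client is acquired in step S11, the method includes:
Step S101: acquire recommended information.
Step S103: extract the interest tags of the recommended information from its content.
Step S105: acquire historical behavior data of the registered user, where the historical behavior data records the registered user's operations on the recommended information.
Step S107: determine the tag weight values of the interest tags according to the historical behavior data.
Step S109: determine the interest tag dictionary corresponding to the registered user according to the tag weight values.
Specifically, through steps S101 to S109, the content of all recommended information in the information client is analyzed, and interest tags are extracted for each recommended item according to its content. When a registered user operates on a recommended item, the operation is recorded, and the interest tags corresponding to that item are weighted according to the operation, yielding the weight values of the interest tags for that registered user. When a tag weight value exceeds a threshold, the tag is added to the interest tag dictionary corresponding to that user.
In practice, the recommendation service in the information client attaches interest tags to the content it recommends, for example content categories (technology, football, basketball), target audiences (tech geeks, outdoor enthusiasts, teenagers), and content keywords (iPhone, tank contest, Bayern Munich). These interest tags are sometimes edited manually and sometimes identified automatically by algorithms that analyze the recommended content.
Given that every item the recommendation service can recommend has interest tags, the registered users' behavior data when using the recommendation service (for example browsing content, or clicking, favoriting, and commenting on content) is recorded, and each user's interest tag dictionary is derived from the interest tags of that content. The interest tag dictionary describes which interest tags the user has and how much weight each tag carries, and it can serve as the interest model in subsequent steps.
Specifically, the tag weight value of an interest tag may be calculated as follows:
First, set a weight $w$ for each type of user action act; for example, a click scores 1, a view without a click scores -0.2, and a favorite scores 5.
Given a user action sequence $[act_1, act_2, \ldots, act_n]$, the user's interest tag weights are calculated as:
$$V = \sum_i T_i \cdot w_i$$
where $T_i$ is the interest tag vector of the i-th user action and $w_i$ is the weight of the i-th user action.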
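A minimal Python sketch of steps S101 to S109 and the weighting formula above, assuming each recommended item's interest tags are stored as a dictionary of tag values (the vector T_i) and using the example action weights (click 1, view without click -0.2, favorite 5). The names `ACTION_WEIGHTS`, `tag_weights`, and `interest_dictionary`, and the data layout, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical per-action weights, taken from the example in the text.
ACTION_WEIGHTS = {"click": 1.0, "view_no_click": -0.2, "favorite": 5.0}


def tag_weights(actions, item_tags):
    """Compute V = sum_i T_i * w_i for one user.

    actions:   the user's action sequence as (item_id, action_name) pairs.
    item_tags: item_id -> {tag: value}, i.e. the interest tag vector T_i of the item.
    """
    v = defaultdict(float)
    for item_id, action in actions:
        w = ACTION_WEIGHTS[action]          # w_i for this action
        for tag, t in item_tags[item_id].items():
            v[tag] += t * w                 # accumulate T_i * w_i tag by tag
    return dict(v)


def interest_dictionary(weights, threshold):
    """Step S109: keep only tags whose weight exceeds the threshold."""
    return {tag: w for tag, w in weights.items() if w > threshold}
```

Storing tag weights as sparse dictionaries rather than dense vectors is a choice of this sketch; it avoids keeping zero entries for the many tags a user never touches.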
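Preferably, in a preferred embodiment provided by this application, step S17 of constructing the interest model according to the registered users' interest tag dictionaries and first attention sets includes:
Step S171: filter the first attention set to obtain a third attention set corresponding to the registered user, where the filtering methods include at least: data filtering, indicator filtering, condition filtering, and information filtering.
Step S173: match the registered users by their third attention sets to generate registered user sets, where each registered user set contains the registered users having the same third attention set.
Step S175: generate the user set tag dictionary corresponding to each registered user set from the interest tag dictionaries of the registered users it contains.
Specifically, through steps S171 to S175, the registered user's first attention set is first filtered, for example by conditions such as follow count, number of friends, and/or activity level; inactive users and users with few friends are removed from the first attention set, producing the filtered third attention set.
The filtered registered users are then matched by their third attention sets: registered users whose third attention sets match to a degree greater than a preset threshold, or are identical, are placed in the same registered user set. Depending on how the third attention sets differ, there may be many registered user sets. The third attention sets can of course also be defined manually, in which case registered users are grouped into different registered user sets according to the manually defined third attention sets.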
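The description does not define how the "matching degree" between two third attention sets is computed. The sketch below assumes Jaccard similarity (the size of the intersection divided by the size of the union) and a simple greedy grouping in which each registered user joins the first existing group whose representative set is identical or exceeds the threshold, and otherwise starts a new group. Both the similarity measure and the greedy strategy are assumptions made for illustration, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|, assumed here as the matching degree."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_users(attention_sets, threshold):
    """Greedily partition registered users into registered user sets.

    attention_sets: user_id -> iterable of followed accounts (third attention set).
    Returns a list of (representative attention set, [member user ids]), where the
    representative is the attention set of the group's first member.
    """
    groups = []
    for user, followed in attention_sets.items():
        s = frozenset(followed)
        for rep, members in groups:
            # Identical sets always match; otherwise require similarity above threshold.
            if s == rep or jaccard(s, rep) > threshold:
                members.append(user)
                break
        else:
            groups.append((s, [user]))
    return groups
```

Greedy grouping depends on the order in which users are visited; a proper clustering method could be substituted without changing the rest of the pipeline.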
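A user set tag dictionary corresponding to the current registered user set is generated from the interest tag dictionaries of the registered users in that set.
Taking Weibo as an example of the social platform, see FIG. 3, which is a schematic flow diagram of matching registered users by their Weibo attention sets to generate registered user sets.
After the registered users' following lists are acquired, follower count is used as the filtering condition, and accounts with few followers are removed from the following lists. A third attention set is generated from each filtered following list. For Weibo, third attention sets can of course also be defined manually, for example by dividing specific Weibo users by category: users in the computing and Internet field such as Kai-Fu Lee, Lei Jun, Zhou Hongyi, and Li Yanhong can form one third attention set; users in entertainment and media such as He Jiong, Xie Na, and Dai Jun can form another; and users in sports such as Wei Kexing, Li Na, and Liu Xiang can form yet another.
Registered users are then classified according to the third attention sets: registered users who share a common third attention set are placed in one registered user set, thereby identifying groups of users with similar interests.
Preferably, in a preferred embodiment provided by this application, step S175 of generating the user set tag dictionary corresponding to a registered user set from the interest tag dictionaries of the registered users it contains includes:
Step S1751: acquire the first user count, i.e. the number of registered users on the information client, and the second user count, i.e. the number of registered users in the registered user set.
Step S1753: calculate the weight distribution average of each interest tag from the tag weight values and the first user count.
Step S1755: calculate the set weight average of each interest tag in the user set interest tag dictionary from the tag weight values of the registered users in the registered user set and the second user count.
Step S1757: calculate the registered user set weight value of each interest tag in the user set interest tag dictionary from the weight distribution average and the set weight average.
Step S1759: compare, one by one, the registered user set weight value of each interest tag in the user set interest tag dictionary with a preset noise threshold.
When the registered user set weight value of an interest tag is greater than the preset noise threshold, the interest tag is retained in the user set tag dictionary;
When the registered user set weight value of an interest tag is less than or equal to the preset noise threshold, the interest tag is deleted from the user set tag dictionary.
Specifically, in a practical application of steps S1751 to S1759, taking Weibo as the social platform, after groups of users with similar interests are found, the interest tag dictionaries of the individual users can be merged to obtain a group interest model. The simplest method is to add the user tag vectors directly. In practice, however, this turns out to be very noisy, because some big Weibo accounts in certain fields have a great many followers, many of whom follow such an account only because it is famous; the act of following does not reflect their own interests. If these users' interest tag vectors are simply summed, the meaningful signal is easily drowned out by common interests. As an example from an actual experiment, when we analyzed the Weibo users who follow Wang Xing (founder of Meituan), we found that the interest tags with the highest weights were not "Internet" or "O2O" but "entertainment" and "social news". This is because "entertainment" and "social news" are common interest tags: many users with these two tags follow Wang Xing because he founded Meituan, but are not that interested in "Internet" or "O2O". If all of these users are considered indiscriminately, "entertainment" and "social news" end up weighted higher than "Internet" and "O2O".
Removing this background noise is the core technique for effectively mining group interests. In practice, we first compute the weight distribution average over all registered users on the site:
$$V_{base} = \frac{1}{N}\sum_{n=1}^{N} V_n$$
where $N$ is the number of all registered users and $V_n$ is one user's interest tag weight distribution;
From this formula, the weight distribution average of all users on interest tag $i$, $V_{base}[i]$, is obtained;
Then, for a registered user set whose members share a certain condition in their following relationships (for example, on Weibo, the set of registered users who follow "Wang Xing"), given the group interest tag vector $V$ of this registered user set, the registered user set weight value $V'$ used to remove noise is calculated as:
$$V'[i] = \frac{V[i]}{V_{base}[i]}$$
where $V'[i]$ is the registered user set weight value of interest tag $i$, $V[i]$ is the set weight average of interest tag $i$, and $V_{base}[i]$ is the weight distribution average of all users on interest tag $i$.
The registered user set weight value $V'$ is compared with a preset noise threshold. When $V'$ is less than the noise threshold, the interest tag is deemed a noise tag and should be removed from the current user set tag dictionary; when $V'$ is greater than or equal to the noise threshold, the interest tag is deemed a non-noise tag and is kept in the current user set tag dictionary.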
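A minimal Python sketch of the noise-removal computation in steps S1751 to S1759, assuming each user's interest tag weight distribution is a dictionary in which missing tags count as zero. It computes V_base over all registered users, the set weight average V over the group, and the ratio V'[i] = V[i] / V_base[i], keeping tags whose ratio is at least the threshold as in the paragraph above (step S1759's wording instead keeps only ratios strictly greater than the threshold). Skipping tags whose base average is not positive, and storing V' as the retained weight, are assumptions of this sketch; the function names are hypothetical. The toy data mimics the Wang Xing example.

```python
def average_vector(vectors):
    """Element-wise mean of tag-weight dicts; tags missing from a dict count as 0."""
    total = {}
    for v in vectors:
        for tag, w in v.items():
            total[tag] = total.get(tag, 0.0) + w
    n = len(vectors)
    return {tag: w / n for tag, w in total.items()}


def denoise_group_tags(group_vectors, all_vectors, noise_threshold):
    """Return {tag: V'[tag]} for the non-noise tags of one registered user set."""
    v_base = average_vector(all_vectors)   # V_base, averaged over all N registered users
    v = average_vector(group_vectors)       # V, averaged over the users in the group
    kept = {}
    for tag, w in v.items():
        base = v_base.get(tag, 0.0)
        if base <= 0:
            continue                        # ratio not meaningful; skipped by assumption
        ratio = w / base                    # V'[i] = V[i] / V_base[i]
        if ratio >= noise_threshold:
            kept[tag] = ratio
    return kept


if __name__ == "__main__":
    all_users = [
        {"entertainment": 4.0, "social news": 4.0},
        {"entertainment": 4.0, "social news": 4.0},
        {"entertainment": 4.0, "Internet": 2.0, "O2O": 2.0},
        {"entertainment": 4.0, "social news": 4.0},
    ]
    followers_of_wang_xing = [all_users[2], all_users[3]]
    print(denoise_group_tags(followers_of_wang_xing, all_users, 1.5))
    # → {'Internet': 2.0, 'O2O': 2.0}
```

In this toy group "entertainment" has the largest raw set weight average (4.0), but its ratio to the site-wide average is only 1.0, so it is dropped as background noise while "Internet" and "O2O" survive.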
优选的,本申请提供的优选实施例中,在步骤S23将第二关注集合与兴趣模型进行匹配,根据兴趣模型确定新注册用户的推荐兴趣标签中,步骤包括:
步骤S231,对第二关注集合进行筛选,得到与新注册用户对应的第四关注集合,其中,筛选方法至少包括:数据筛选法、指标筛选法、条件筛选法和信息筛选法。
步骤S233,将第四关注集合与第三关注集进行匹配,确定与新注册用户对应的已注册用户集合。
步骤S235,根据与新注册用户对应的已注册用户集合的用户集合标签字典,确定新注册用户的推荐兴趣标签。
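Specifically, in steps S231 to S235, the second follow set of the newly registered user is first screened. The second follow set may be screened by conditions such as follow count, friend count and/or activity level: inactive accounts and accounts with few friends are removed from the second follow set, producing the screened fourth follow set. The screening method may be the same as the one used in step S171, or a different one may be used; any screening method that refines the second follow set is acceptable.
The fourth follow set is then matched against each third follow set. When the degree of match between the newly registered user's fourth follow set and a third follow set is greater than a preset threshold, or the two sets are identical, the newly registered user is determined to match that third follow set, which in turn determines the registered-user set to which the newly registered user belongs.
The recommended tags for the newly registered user are determined according to the user-set tag dictionary of the registered-user set to which the user belongs.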
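The text does not define the "degree of match" between a fourth follow set and a third follow set. The Python sketch below assumes Jaccard similarity (intersection size over union size) and picks the best-scoring registered-user set, accepting it only if the score exceeds a threshold or the sets are identical. Jaccard similarity, the threshold value, the account IDs and all function names are illustrative assumptions.

```python
def jaccard(a, b):
    """|a & b| / |a | b| for two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_registered_user_set(fourth_set, set_tag_dicts, min_similarity):
    """Return the user-set tag dictionary of the best-matching third follow set, or None.

    set_tag_dicts maps each third follow set (a frozenset) to its user-set tag dictionary.
    """
    best_key, best_score = None, -1.0
    for third_set in set_tag_dicts:
        score = jaccard(fourth_set, third_set)
        if score > best_score:
            best_key, best_score = third_set, score
    if best_key is not None and (best_key == fourth_set or best_score > min_similarity):
        return set_tag_dicts[best_key]
    return None


set_tag_dicts = {
    frozenset({"acct_phone_a", "acct_phone_b"}): {"smartphones": 2.0},
    frozenset({"acct_tennis", "acct_hurdles"}): {"sports": 3.0},
}
new_user = frozenset({"acct_phone_a", "acct_phone_b", "acct_tennis"})
print(match_registered_user_set(new_user, set_tag_dicts, min_similarity=0.5))
# {'smartphones': 2.0}
```

Here the new user's set overlaps the first third follow set with similarity 2/3 and the second with 1/4, so the first registered-user set's tag dictionary supplies the recommended tags.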
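In practical applications, once the group interest model of a user group whose interests resemble those of the newly registered user has been mined, the group interest model and the user's individual interest model can be fused with certain weights, and content can then be recommended according to the fused interest model. Specifically, given a fused interest model (an interest tag vector), some of the highest-quality content under each interest tag can be recommended in proportion to that tag's weight.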
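The fusion and proportional allocation described above might look like the Python sketch below. The linear blend with a mixing weight `alpha`, the fixed number of recommendation slots `k` and the largest-remainder rounding are assumptions for illustration; the text only says the models are fused "with certain weights" and that content is recommended in proportion to tag weights.

```python
def fuse_models(group_model, individual_model, alpha):
    """Linear blend of two tag->weight dicts: alpha * group + (1 - alpha) * individual."""
    tags = set(group_model) | set(individual_model)
    return {
        t: alpha * group_model.get(t, 0.0) + (1 - alpha) * individual_model.get(t, 0.0)
        for t in tags
    }


def allocate_slots(model, k):
    """Split k recommendation slots across tags in proportion to their positive weights.

    Floors each tag's exact quota, then gives leftover slots to the tags with the
    largest fractional remainders (largest-remainder method).
    """
    positive = {t: w for t, w in model.items() if w > 0}
    total = sum(positive.values())
    if total == 0:
        return {}
    quotas = {t: k * w / total for t, w in positive.items()}
    slots = {t: int(q) for t, q in quotas.items()}
    leftover = k - sum(slots.values())
    by_remainder = sorted(quotas, key=lambda t: quotas[t] - slots[t], reverse=True)
    for t in by_remainder[:leftover]:
        slots[t] += 1
    return slots


# A brand-new user has no individual model, so only the group model contributes.
fused = fuse_models({"internet": 3.0, "o2o": 1.0}, {}, alpha=0.5)
print(sorted(allocate_slots(fused, 5).items()))  # [('internet', 4), ('o2o', 1)]
```

For a new user the individual model is empty, so `alpha` only rescales the group model and leaves the proportions unchanged; it starts to matter once the user has accumulated on-site behavior.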
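It should be noted that for a new user there is no on-site action data at all, so no individual interest model can be obtained. However, if the new user logs in to the news client with a social platform account, the new user's social relationships on that platform can be obtained; by mining the on-site group of users with similar interests and recommending content with that group's interest model, targeted recommendation of news items becomes possible. In practice, this approach performs better than random recommendation or recommending the most popular content.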
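Preferably, in a preferred embodiment provided by the present application, after step S23 of matching the second follow set against the interest model and determining the recommended interest tags of the newly registered user according to the interest model, the method further includes:
Step S24: push recommended news items to the newly registered user according to the recommended interest tags.
Specifically, in step S24, recommended news items matching the interest tags determined for the newly registered user in the above steps are pushed to that user.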
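As can be seen from the technical solution, the present invention effectively combines public social network data with the private data of the recommendation service to recommend content to users. Compared with using only public social network data or only private recommendation-service data, fusing the two kinds of data helps recommend personalized content more accurately. Moreover, the fusion method proposed by the present invention can exploit both kinds of data even for new users (the on-site user interest model mined from on-site data is transferred through social relationships to newly registered users coming from off-site), an effect that conventional methods cannot achieve.
A characteristic of the present invention is that the more users a recommendation service provider has, the better this method works. Such a provider's user base covers a larger share of the social network's user base, so it rarely happens that, for a given social account, most of its friends or followers are not on-site users and group interests therefore cannot be mined. This is a significant competitive advantage for a product with hundreds of millions of users such as Toutiao, and a technical barrier for smaller recommendation products.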
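Embodiment 2
An embodiment of the present invention further provides a social platform-based data mining device. As shown in FIG. 4, the device includes: a first acquisition module 30, a second acquisition module 32, a first determination module 34, a first processing module 36, a third acquisition module 38, a second determination module 40 and a second processing module 42.
The first acquisition module 30 is configured to obtain the interest tag dictionaries of registered users on the news client.
The first acquisition module 30 of the present application collects the historical browsing behavior of registered users and analyzes it to obtain an interest tag dictionary for each registered user.
The second acquisition module 32 is configured to obtain first objects on the social platform that have a follow relationship with registered users on the news client, and to read the relationship information between the registered users and the first objects.
The second acquisition module 32 of the present application reads the follow relationship information of registered users on the social platform to determine the objects that have a follow relationship with each registered user.
In practical applications, a follow relationship may be a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.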
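The first determination module 34 is configured to determine the first follow set corresponding to each registered user according to the first objects with which that registered user has a follow relationship.
The first determination module 34 of the present application organizes, for each registered user separately, the first objects with which that user has a follow relationship, thereby determining each registered user's first follow set.
The first processing module 36 is configured to build an interest model according to the interest tag dictionaries and the first follow sets of the registered users, where the interest model represents the correspondence between registered users having the same first follow set and interest tags.
The first processing module 36 of the present application analyzes each registered user's follow set, classifies registered users with different first follow sets into registered-user sets corresponding to the respective first follow sets, and generates, from the interest tag dictionaries of the registered users in each registered-user set, a user-set tag dictionary corresponding to that first follow set, thereby determining the correspondence between first follow sets and interest tags.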
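The third acquisition module 38 is configured to obtain second objects on the social platform that have a follow relationship with a newly registered user on the news client, and to read the relationship information between the newly registered user and the second objects.
The third acquisition module 38 of the present application reads the newly registered user's follow relationship information on the social platform to determine the second objects that have a follow relationship with the newly registered user.
In practical applications, a follow relationship may be a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
The second determination module 40 is configured to determine the second follow set corresponding to the newly registered user according to the second objects with which the newly registered user has a follow relationship.
The second determination module 40 of the present application organizes the second objects with which the newly registered user has a follow relationship, thereby determining the newly registered user's second follow set.
The second processing module 42 is configured to match the second follow set against the interest model and to determine the recommended interest tags of the newly registered user according to the interest model.
The second processing module 42 of the present application matches the newly registered user's second follow set against the first follow sets in the interest model to find the first follow set that matches it, and determines the newly registered user's interest tags through that first follow set.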
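Specifically, through the first acquisition module 30, the second acquisition module 32, the first determination module 34, the first processing module 36, the third acquisition module 38, the second determination module 40 and the second processing module 42, registered users who have the same first follow set on the social platform are grouped to obtain the registered-user set corresponding to that first follow set, and the user-set tag dictionary corresponding to each registered-user set is obtained from the interest tag dictionaries of the registered users on the news client. This builds an interest model that holds the correspondence between first follow sets and user-set tag dictionaries. After the second follow set of a newly registered user is obtained, the recommended interest tags of the newly registered user can be obtained directly by matching the second follow set against the first follow sets in the interest model.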
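In practical applications, relationships on a social platform can generally be regarded as reflecting the similarity of users' interests. Under different assumptions, different methods can be used to find other users whose interests resemble a given user's, and different assumptions suit different types of social platforms. For platforms that emphasize two-way communication, such as Tencent QQ and WeChat, it can be assumed that friends have similar interests; for platforms that emphasize one-way following, such as Weibo, it can be assumed that users who share common followed accounts have similar interests. For example, if two users both follow Lei Jun and Huang Zhang, they are both likely to be interested in smartphones.
Taking Weibo as the social platform, the Weibo follow list of each registered user on the news client is screened: the followed accounts whose follower count exceeds a certain value, or the top few followed accounts by follower count, are selected to form a first follow set. The Weibo follow lists of all registered users are screened in the same way to obtain the first follow set corresponding to each registered user, and registered users with the same first follow set are grouped into registered-user sets, each registered-user set having a different first follow set. By collecting the interest tag dictionaries of the registered users in each registered-user set, a user-set tag dictionary is obtained for each registered-user set. After a new user registers with the news client and authorizes it to access public Weibo data, the new user's follow list is screened in the same way, and the resulting second follow set is matched against the first follow sets of the registered-user sets to determine which registered-user set the new user belongs to. The user-set tag dictionary corresponding to that registered-user set then provides the newly registered user's recommended interest tags.
In summary, the present invention solves the problem in the prior art that targeted news cannot be provided to newly registered users because they have no browsing history, and achieves the effect of providing users with targeted news based on their follow relationships on a social platform.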
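Preferably, in a preferred embodiment provided by the present application, as shown in FIG. 5, the device further includes: a fourth acquisition module 281, an extraction module 283, a fifth acquisition module 285, a third determination module 287 and a fourth determination module 289.
The fourth acquisition module 281 is configured to obtain recommended news items.
The extraction module 283 is configured to extract the interest tags of a recommended news item from its content.
The fifth acquisition module 285 is configured to obtain historical behavior data of registered users, where the historical behavior data records the registered users' operations on recommended news items.
The third determination module 287 is configured to determine the tag weight values of interest tags according to the historical behavior data.
The fourth determination module 289 is configured to determine the interest tag dictionary corresponding to each registered user according to the tag weight values.
Specifically, through the fourth acquisition module 281, the extraction module 283, the fifth acquisition module 285, the third determination module 287 and the fourth determination module 289, the content of all recommended news items in the news client is analyzed and interest tags are extracted for each item according to its content. When a registered user operates on a recommended news item, the operation is recorded, and the interest tags of that item are weighted according to the operation to compute the weight values of the interest tags for that registered user. When a tag's weight value exceeds a threshold, the tag is added to the interest tag dictionary corresponding to that user.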
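In practical applications, the recommendation service in the news client attaches interest tags to the news items it recommends, for example: content categories such as technology, football and basketball; audience categories such as tech geeks, outdoor enthusiasts and teenagers; and content keywords such as iPhone, tank contest and Bayern Munich. These interest tags are sometimes edited manually and sometimes identified by algorithms that automatically analyze the news items.
Given that every news item the recommendation service can recommend carries interest tags, the behavior data of registered users of the recommendation service is recorded (for example, browsed content and clicked, favorited or commented content), and each user's interest tag dictionary is derived from the interest tags of the corresponding news items. This interest tag dictionary describes which interest tags the user has and the weight of each tag, and it can serve as the interest model in subsequent steps.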
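Specifically, the tag weight values of the interest tags may be calculated as follows:
First, a weight w is set for each type of user action act; for example, a click scores 1 point, a view without a click scores -0.2 points, and a favorite scores 5 points.
Given a user action sequence $[act_1, act_2, \ldots, act_n]$, the user's interest tag weight values are calculated as:
$V = \sum_i T_i \cdot w_i$;
where $T_i$ is the interest tag vector of the i-th user action and $w_i$ is the weight of the i-th user action.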
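The weighted sum $V = \sum_i T_i \cdot w_i$, together with the threshold test mentioned for building the interest tag dictionary, can be sketched in Python as below. The action weights are the example values from the text (click 1, view without click -0.2, favorite 5); representing each $T_i$ as a tag->weight dict for the acted-on news item, the action key names, the item IDs and the dictionary threshold are assumptions.

```python
# Example action weights taken from the text; the key names are hypothetical.
ACTION_WEIGHTS = {"click": 1.0, "view_no_click": -0.2, "favorite": 5.0}


def tag_weight_vector(actions, item_tags):
    """Compute V = sum_i w_i * T_i over a user's action sequence.

    actions: list of (action_name, item_id) pairs.
    item_tags: item_id -> {tag: weight}, i.e. the tag vector T_i of the acted-on item.
    """
    v = {}
    for action, item_id in actions:
        w = ACTION_WEIGHTS[action]
        for tag, t in item_tags[item_id].items():
            v[tag] = v.get(tag, 0.0) + w * t
    return v


def interest_tag_dictionary(v, threshold):
    """Keep only tags whose accumulated weight exceeds the threshold."""
    return {tag: w for tag, w in v.items() if w > threshold}


item_tags = {"news_1": {"technology": 1.0}, "news_2": {"football": 1.0}}
actions = [("click", "news_1"), ("favorite", "news_1"), ("view_no_click", "news_2")]
v = tag_weight_vector(actions, item_tags)
print(v)                                # {'technology': 6.0, 'football': -0.2}
print(interest_tag_dictionary(v, 0.0))  # {'technology': 6.0}
```

Negative action weights let repeated "seen but skipped" impressions pull a tag below the threshold, which is why the dictionary filter is applied after the sum rather than per action.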
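Preferably, in a preferred embodiment provided by the present application, the first processing module 36 includes: a first sub-processing module 361, a sub-matching module 363 and a first generation module 365.
The first sub-processing module 361 is configured to screen the first follow set to obtain a third follow set corresponding to the registered user, where the screening methods include at least: data screening, indicator screening, condition screening and information screening.
The sub-matching module 363 is configured to match registered users by their third follow sets to generate registered-user sets, where a registered-user set contains registered users having the same third follow set.
The first generation module 365 is configured to generate, according to the interest tag dictionaries of the registered users contained in a registered-user set, a user-set tag dictionary corresponding to that registered-user set.
Specifically, through the first sub-processing module 361, the sub-matching module 363 and the first generation module 365, the first follow sets of registered users are first screened. A first follow set may be screened by conditions such as follow count, friend count and/or activity level: inactive accounts and accounts with few friends are removed from the first follow set, producing the screened third follow set.
The screened registered users are then matched by their third follow sets: registered users whose third follow sets match to a degree greater than a preset threshold, or are identical, are placed in the same registered-user set. Depending on how the third follow sets differ, there can be many registered-user sets. The third follow sets may of course also be defined manually, in which case registered users are grouped into different registered-user sets according to the manually defined third follow sets.
A user-set tag dictionary corresponding to the current registered-user set is generated from the interest tag dictionaries of the registered users in that set.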
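Taking Weibo as the social platform, FIG. 3 is a schematic flowchart of matching registered users by their follow sets on Weibo to generate registered-user sets.
Starting from the follow list obtained for each registered user, follower count is used as the screening condition, and accounts in the follow list with relatively few followers are filtered out. A third follow set is then generated from the filtered follow list. For Weibo, third follow sets may of course also be defined manually. For example, specific Weibo accounts can be divided by account category: accounts in the computer and Internet field such as Kai-Fu Lee, Lei Jun, Zhou Hongyi and Robin Li can form one third follow set; accounts in the entertainment and media field such as He Jiong, Xie Na and Dai Jun can form another; and accounts in the sports field such as Wei Kexing, Li Na and Liu Xiang can likewise be grouped into one third follow set.
Registered users are then classified according to their third follow sets: registered users who share a common third follow set are placed in the same registered-user set, which yields groups of users with similar interests.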
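Preferably, in a preferred embodiment provided by the present application, the first generation module 365 includes: a first sub-acquisition module 3651, a first sub-calculation module 3652, a second sub-calculation module 3653, a third sub-calculation module 3654 and a sub-judgment module 3655.
The first sub-acquisition module 3651 is configured to obtain a first user count, namely the number of registered users on the news client, and a second user count, namely the number of registered users in the registered-user set.
The first sub-calculation module 3652 is configured to calculate the weight distribution average of each interest tag according to the tag weight values and the first user count.
The second sub-calculation module 3653 is configured to calculate the set weight average of each interest tag in the user-set interest tag dictionary according to the tag weight values of the registered users in the registered-user set and the second user count.
The third sub-calculation module 3654 is configured to calculate the registered-user-set weight value of each interest tag in the user-set interest tag dictionary according to the weight distribution average and the set weight average.
The sub-judgment module 3655 is configured to compare, in turn, the registered-user-set weight value of each interest tag in the user-set interest tag dictionary with a preset noise threshold.
When the registered-user-set weight value of an interest tag is greater than the preset noise threshold, the interest tag corresponding to that weight value is retained in the user-set tag dictionary.
When the registered-user-set weight value of an interest tag is less than or equal to the preset noise threshold, the interest tag corresponding to that weight value is deleted from the user-set tag dictionary.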
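Specifically, in a practical application of the first generation module 365 (comprising the first sub-acquisition module 3651, the first sub-calculation module 3652, the second sub-calculation module 3653, the third sub-calculation module 3654 and the sub-judgment module 3655), taking Weibo as the social platform, once groups of users with similar interests have been found, the individual interest tag dictionaries of these users can be merged to obtain a group interest model. The simplest method is to add the users' tag vectors directly. In practice, however, this produces very noisy results: high-profile Weibo accounts in some fields have a very large number of followers, many of whom follow such an account merely because it is famous, so the follow action itself does not reflect their own interests. If the interest tag vectors of these users are simply summed, the meaningful signal is easily drowned out by widespread interests. As an example from an actual experiment, when analyzing the Weibo users who follow Wang Xing (founder of Meituan), we found that the interest tags with the largest weights were not "Internet" and "O2O" but "Entertainment" and "Social News". This is because "Entertainment" and "Social News" are widespread interest tags: many users with these two tags followed Wang Xing because he is the founder of Meituan, without actually caring much about "Internet" or "O2O". If all of these users are considered indiscriminately, "Entertainment" and "Social News" end up with higher weights than "Internet" and "O2O".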
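Removing this background noise is the core technique for effectively mining group interests. In practice, the weight distribution average over all registered users of the site is computed first:
$V_{base} = \frac{1}{N} \sum_{n=1}^{N} V_n$
where N is the number of all registered users and $V_n$ is the interest tag weight distribution of the n-th user;
from this formula, the weight distribution average $V_{base}[i]$ of all users on interest tag i is obtained;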
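Then, for a registered-user set whose members share a certain condition in their follow relationships (for example, on Weibo, the set of registered users who follow "Wang Xing"), and given the group interest tag vector V of this registered-user set, the registered-user-set weight value V' used for noise removal is obtained tag by tag:
$V'[i] = V[i] / V_{base}[i]$;
where $V'[i]$ is the registered-user-set weight value of interest tag i, $V[i]$ is the set weight average of interest tag i, and $V_{base}[i]$ is the weight distribution average of all users on interest tag i.
The registered-user-set weight value V' is compared with the preset noise threshold. When V' is less than or equal to the noise threshold, the interest tag is judged to be a noise tag and is removed from the current user-set tag dictionary; when V' is greater than the noise threshold, the interest tag is judged to be a non-noise tag and is kept in the current user-set tag dictionary.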
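Preferably, in a preferred embodiment provided by the present application, the second processing module 42 includes: a second sub-processing module 421, a first sub-determination module 423 and a second sub-determination module 425.
The second sub-processing module 421 is configured to screen the second follow set to obtain a fourth follow set corresponding to the newly registered user, where the screening methods include at least: data screening, indicator screening, condition screening and information screening.
The first sub-determination module 423 is configured to match the fourth follow set against the third follow sets to determine the registered-user set corresponding to the newly registered user.
The second sub-determination module 425 is configured to determine the recommended interest tags of the newly registered user according to the user-set tag dictionary of the registered-user set corresponding to the newly registered user.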
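Specifically, through the second sub-processing module 421, the first sub-determination module 423 and the second sub-determination module 425, the second follow set of the newly registered user is first screened. The second follow set may be screened by conditions such as follow count, friend count and/or activity level: inactive accounts and accounts with few friends are removed from the second follow set, producing the screened fourth follow set. The screening method may be the same as the one used in step S171, or a different one may be used; any screening method that refines the second follow set is acceptable.
The fourth follow set is then matched against each third follow set. When the degree of match between the newly registered user's fourth follow set and a third follow set is greater than a preset threshold, or the two sets are identical, the newly registered user is determined to match that third follow set, which in turn determines the registered-user set to which the newly registered user belongs.
The recommended tags for the newly registered user are determined according to the user-set tag dictionary of the registered-user set to which the user belongs.
In practical applications, once the group interest model of a user group whose interests resemble those of the newly registered user has been mined, the group interest model and the user's individual interest model can be fused with certain weights, and content can then be recommended according to the fused interest model. Specifically, given a fused interest model (an interest tag vector), some of the highest-quality content under each interest tag can be recommended in proportion to that tag's weight.
It should be noted that for a new user there is no on-site action data at all, so no individual interest model can be obtained. However, if the new user logs in to the news client with a social platform account, the new user's social relationships on that platform can be obtained; by mining the on-site group of users with similar interests and recommending content with that group's interest model, targeted recommendation of news items becomes possible. In practice, this approach performs better than random recommendation or recommending the most popular content.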
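Preferably, in a preferred embodiment provided by the present application, as shown in FIG. 6, the device further includes: a push module 43.
The push module 43 is configured to push recommended news items to the newly registered user according to the recommended interest tags.
Specifically, the push module 43 pushes to the newly registered user the recommended news items that match the interest tags determined for that user in the above steps.
As can be seen from the technical solution, the present invention effectively combines public social network data with the private data of the recommendation service to recommend content to users. Compared with using only public social network data or only private recommendation-service data, fusing the two kinds of data helps recommend personalized content more accurately. Moreover, the fusion method proposed by the present invention can exploit both kinds of data even for new users (the on-site user interest model mined from on-site data is transferred through social relationships to newly registered users coming from off-site), an effect that conventional methods cannot achieve.
A characteristic of the present invention is that the more users a recommendation service provider has, the better this method works. Such a provider's user base covers a larger share of the social network's user base, so it rarely happens that, for a given social account, most of its friends or followers are not on-site users and group interests therefore cannot be mined. This is a significant competitive advantage for a product with hundreds of millions of users such as Toutiao, and a technical barrier for smaller recommendation products.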
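The functional units provided by the embodiments of the present application may run on a mobile terminal, a computer terminal or a similar computing device, and may also be stored as part of a storage medium.
Accordingly, an embodiment of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced by a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located on at least one of a plurality of network devices in a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the social platform-based data mining method: obtaining the interest tag dictionaries of registered users on a news client; obtaining first objects on a social platform that have a follow relationship with the registered users on the news client, and reading the relationship information between the registered users and the first objects; determining, according to the first objects with which a registered user has a follow relationship, the first follow set corresponding to that registered user; building an interest model according to the interest tag dictionaries and the first follow sets of the registered users, where the interest model represents the correspondence between registered users having the same first follow set and interest tags; obtaining second objects on the social platform that have a follow relationship with a newly registered user on the news client, and reading the relationship information between the newly registered user and the second objects; determining, according to the second objects with which the newly registered user has a follow relationship, the second follow set corresponding to the newly registered user; and matching the second follow set against the interest model and determining the recommended interest tags of the newly registered user according to the interest model.
Optionally, the computer terminal may include one or more processors, a memory and a transmission device.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the social platform-based data mining method in the embodiments of the present invention. The processor runs the software programs and modules stored in the memory to perform various functional applications and data processing, that is, to implement the above social platform-based data mining method. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The transmission device is used to receive or send data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device is a radio frequency (Radio Frequency, RF) module, which communicates with the Internet wirelessly.
Specifically, the memory is used to store information on preset action conditions and preset authorized users, as well as application programs.
The processor may call, via the transmission device, the information and application programs stored in the memory to execute the program code of the method steps of the optional or preferred embodiments of the above method embodiments.
Those of ordinary skill in the art will understand that the computer terminal may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (Mobile Internet Devices, MID) or a PAD.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc or the like.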
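An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the social platform-based data mining method provided by the above method embodiment.
Optionally, in this embodiment, the storage medium may be located in any computer terminal of a group of computer terminals in a computer network, or in any mobile terminal of a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining the interest tag dictionaries of registered users on a news client; obtaining first objects on a social platform that have a follow relationship with the registered users on the news client, and reading the relationship information between the registered users and the first objects; determining, according to the first objects with which a registered user has a follow relationship, the first follow set corresponding to that registered user; building an interest model according to the interest tag dictionaries and the first follow sets of the registered users, where the interest model represents the correspondence between registered users having the same first follow set and interest tags; obtaining second objects on the social platform that have a follow relationship with a newly registered user on the news client, and reading the relationship information between the newly registered user and the second objects; determining, according to the second objects with which the newly registered user has a follow relationship, the second follow set corresponding to the newly registered user; and matching the second follow set against the interest model and determining the recommended interest tags of the newly registered user according to the interest model.
Optionally, in this embodiment, the storage medium may also be configured to store program code for performing the various preferred or optional method steps provided by the social platform-based data mining method.
The social platform-based data mining method according to the present invention has been described above by way of example with reference to the accompanying drawings. However, those skilled in the art should understand that various improvements can be made to the social platform-based data mining method and device proposed above without departing from the content of the present invention. Therefore, the scope of protection of the present invention should be determined by the content of the appended claims.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.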

Claims (16)

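  1. A social platform-based data mining method, characterized by comprising:
    obtaining interest tag dictionaries of registered users on a news client;
    obtaining first objects on a social platform that have a follow relationship with the registered users on the news client, and reading relationship information between the registered users and the first objects;
    determining, according to the first objects with which the registered users have a follow relationship, first follow sets corresponding to the registered users;
    building an interest model according to the interest tag dictionaries and the first follow sets of the registered users, wherein the interest model is used to represent the correspondence between registered users having the same first follow set and interest tags;
    obtaining second objects on the social platform that have a follow relationship with a newly registered user on the news client, and reading relationship information between the newly registered user and the second objects;
    determining, according to the second objects with which the newly registered user has a follow relationship, a second follow set corresponding to the newly registered user;
    matching the second follow set against the interest model, and determining recommended interest tags of the newly registered user according to the interest model.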
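  2. The method according to claim 1, characterized in that, before obtaining the interest tag dictionaries of registered users on the news client, the method comprises:
    obtaining recommended news items;
    extracting the interest tags of the recommended news items from the content of the recommended news items;
    obtaining historical behavior data of the registered users, wherein the historical behavior data is used to record the registered users' operations on the recommended news items;
    determining tag weight values of the interest tags according to the historical behavior data;
    determining, according to the tag weight values, the interest tag dictionaries corresponding to the registered users.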
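  3. The method according to claim 2, characterized in that the step of building an interest model according to the interest tag dictionaries and the first follow sets of the registered users comprises:
    screening the first follow sets to obtain third follow sets corresponding to the registered users, wherein the screening methods include at least: data screening, indicator screening, condition screening and information screening;
    matching the registered users by the third follow sets to generate registered-user sets, wherein a registered-user set comprises the registered users having the same third follow set;
    generating, according to the interest tag dictionaries of the registered users contained in a registered-user set, a user-set tag dictionary corresponding to the registered-user set.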
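  4. The method according to claim 3, characterized in that the step of generating, according to the interest tag dictionaries of the registered users contained in the registered-user set, a user-set tag dictionary corresponding to the registered-user set comprises:
    obtaining a first user count of registered users on the news client and a second user count of the registered-user set;
    calculating a weight distribution average of each interest tag according to the tag weight values and the first user count;
    calculating a set weight average of each interest tag in a user-set interest tag dictionary according to the tag weight values of the registered users in the registered-user set and the second user count;
    calculating, according to the weight distribution average and the set weight average, a registered-user-set weight value of the interest tag in the user-set interest tag dictionary;
    comparing, in turn, the registered-user-set weight value of each interest tag in the user-set interest tag dictionary with a preset noise threshold;
    when the registered-user-set weight value of the interest tag in the user-set interest tag dictionary is greater than the preset noise threshold, retaining the interest tag corresponding to the registered-user-set weight value in the user-set tag dictionary;
    when the registered-user-set weight value of the interest tag in the user-set interest tag dictionary is less than or equal to the preset noise threshold, deleting the interest tag corresponding to the registered-user-set weight value from the user-set tag dictionary.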
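  5. The method according to claim 4, characterized in that the step of matching the second follow set against the interest model and determining recommended interest tags of the newly registered user according to the interest model comprises:
    screening the second follow set to obtain a fourth follow set corresponding to the newly registered user, wherein the screening methods include at least: data screening, indicator screening, condition screening and information screening;
    matching the fourth follow set against the third follow sets to determine the registered-user set corresponding to the newly registered user;
    determining the recommended interest tags of the newly registered user according to the user-set tag dictionary of the registered-user set corresponding to the newly registered user.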
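  6. The method according to any one of claims 1 to 5, characterized in that, after matching the second follow set against the interest model and determining recommended interest tags of the newly registered user according to the interest model, the method further comprises:
    pushing the recommended news items to the newly registered user according to the recommended interest tags.
  7. The method according to claim 4, characterized in that the registered-user-set weight value $V'[i]$ of an interest tag is determined by the following formula:
    $V'[i] = V[i] / V_{base}[i]$;
    wherein $V'[i]$ denotes the registered-user-set weight value of interest tag i, $V[i]$ denotes the set weight average of interest tag i, and $V_{base}[i]$ denotes the weight distribution average of interest tag i.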
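  8. A social platform-based data mining device, characterized by comprising:
    a first acquisition module, configured to obtain interest tag dictionaries of registered users on a news client;
    a second acquisition module, configured to obtain first objects on a social platform that have a follow relationship with the registered users on the news client, and to read relationship information between the registered users and the first objects;
    a first determination module, configured to determine, according to the first objects with which the registered users have a follow relationship, first follow sets corresponding to the registered users;
    a first processing module, configured to build an interest model according to the interest tag dictionaries and the first follow sets of the registered users, wherein the interest model is used to represent the correspondence between registered users having the same first follow set and interest tags;
    a third acquisition module, configured to obtain second objects on the social platform that have a follow relationship with a newly registered user on the news client, and to read relationship information between the newly registered user and the second objects;
    a second determination module, configured to determine, according to the second objects with which the newly registered user has a follow relationship, a second follow set corresponding to the newly registered user;
    a second processing module, configured to match the second follow set against the interest model and to determine recommended interest tags of the newly registered user according to the interest model.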
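  9. The device according to claim 8, characterized in that the device further comprises:
    a fourth acquisition module, configured to obtain recommended news items;
    an extraction module, configured to extract the interest tags of the recommended news items from the content of the recommended news items;
    a fifth acquisition module, configured to obtain historical behavior data of the registered users, wherein the historical behavior data is used to record the registered users' operations on the recommended news items;
    a third determination module, configured to determine tag weight values of the interest tags according to the historical behavior data;
    a fourth determination module, configured to determine, according to the tag weight values, the interest tag dictionaries corresponding to the registered users.
  10. The device according to claim 9, characterized in that the first processing module comprises:
    a first sub-processing module, configured to screen the first follow sets to obtain third follow sets corresponding to the registered users, wherein the screening methods include at least: data screening, indicator screening, condition screening and information screening;
    a sub-matching module, configured to match the registered users by the third follow sets to generate registered-user sets, wherein a registered-user set comprises the registered users having the same third follow set;
    a first generation module, configured to generate, according to the interest tag dictionaries of the registered users contained in a registered-user set, a user-set tag dictionary corresponding to the registered-user set.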
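  11. The device according to claim 10, characterized in that the first generation module comprises:
    a first sub-acquisition module, configured to obtain a first user count of registered users on the news client and a second user count of the registered-user set;
    a first sub-calculation module, configured to calculate a weight distribution average of each interest tag according to the tag weight values and the first user count;
    a second sub-calculation module, configured to calculate a set weight average of each interest tag in a user-set interest tag dictionary according to the tag weight values of the registered users in the registered-user set and the second user count;
    a third sub-calculation module, configured to calculate, according to the weight distribution average and the set weight average, a registered-user-set weight value of the interest tag in the user-set interest tag dictionary;
    a sub-judgment module, configured to compare, in turn, the registered-user-set weight value of each interest tag in the user-set interest tag dictionary with a preset noise threshold;
    when the registered-user-set weight value of the interest tag in the user-set interest tag dictionary is greater than the preset noise threshold, the interest tag corresponding to the registered-user-set weight value is retained in the user-set tag dictionary;
    when the registered-user-set weight value of the interest tag in the user-set interest tag dictionary is less than or equal to the preset noise threshold, the interest tag corresponding to the registered-user-set weight value is deleted from the user-set tag dictionary.
  12. The device according to claim 11, characterized in that the second processing module comprises:
    a second sub-processing module, configured to screen the second follow set to obtain a fourth follow set corresponding to the newly registered user, wherein the screening methods include at least: data screening, indicator screening, condition screening and information screening;
    a first sub-determination module, configured to match the fourth follow set against the third follow sets to determine the registered-user set corresponding to the newly registered user;
    a second sub-determination module, configured to determine the recommended interest tags of the newly registered user according to the user-set tag dictionary of the registered-user set corresponding to the newly registered user.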
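  13. The device according to any one of claims 8 to 12, characterized in that the device further comprises:
    a push module, configured to push the recommended news items to the newly registered user according to the recommended interest tags.
  14. The device according to claim 11, characterized in that the registered-user-set weight value $V'[i]$ of an interest tag is determined by the following formula:
    $V'[i] = V[i] / V_{base}[i]$;
    wherein $V'[i]$ denotes the registered-user-set weight value of interest tag i, $V[i]$ denotes the set weight average of interest tag i, and $V_{base}[i]$ denotes the weight distribution average of interest tag i.
  15. A computer terminal, configured to execute program code of the steps provided by the social platform-based data mining method according to claim 1.
  16. A storage medium, configured to store program code executed by the social platform-based data mining method according to claim 1.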
PCT/CN2015/083804 2014-11-10 2015-07-10 Social platform-based data mining method and device WO2016074492A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US15/525,870 US10360230B2 (en) 2014-11-10 2015-07-10 Method and device for social platform-based data mining
EP15859244.4A EP3220289A4 (en) 2014-11-10 2015-07-10 Social platform-based data mining method and device
BR112017009666A BR112017009666A2 (pt) 2014-11-10 2015-07-10 método e dispositivo para mineração de dados com base em plataforma social
JP2017525373A JP6438135B2 (ja) 2014-11-10 2015-07-10 ソーシャルプラットフォームに基づくデータマイニング方法及び装置
CA2966757A CA2966757C (en) 2014-11-10 2015-07-10 Method and device for social platform-based data mining
MX2017006054A MX2017006054A (es) 2014-11-10 2015-07-10 Procedimiento y dispositivo para mineria de datos basada en plataforma social.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410645497.2 2014-11-10
CN201410645497.2A CN104317959B (zh) 2014-11-10 2014-11-10 Social platform-based data mining method and device

Publications (1)

Publication Number Publication Date
WO2016074492A1 (zh) 2016-05-19

Family

ID=52373191

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/083804 WO2016074492A1 (zh) 2014-11-10 2015-07-10 Social platform-based data mining method and device

Country Status (8)

Country Link
US (1) US10360230B2 (zh)
EP (1) EP3220289A4 (zh)
JP (1) JP6438135B2 (zh)
CN (2) CN108197330B (zh)
BR (1) BR112017009666A2 (zh)
CA (1) CA2966757C (zh)
MX (1) MX2017006054A (zh)
WO (1) WO2016074492A1 (zh)


Also Published As

Publication number Publication date
US10360230B2 (en) 2019-07-23
EP3220289A1 (en) 2017-09-20
CA2966757A1 (en) 2016-05-19
US20170322981A1 (en) 2017-11-09
JP2018503158A (ja) 2018-02-01
MX2017006054A (es) 2017-10-24
CN104317959B (zh) 2018-07-17
CN108197330B (zh) 2019-10-29
BR112017009666A2 (pt) 2017-12-26
CA2966757C (en) 2021-08-10
CN104317959A (zh) 2015-01-28
CN108197330A (zh) 2018-06-22
JP6438135B2 (ja) 2018-12-12
EP3220289A4 (en) 2018-05-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15859244

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015859244

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2966757

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: MX/A/2017/006054

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2017525373

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 15525870

Country of ref document: US

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112017009666

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 112017009666

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20170508