WO2017028791A1 - Public number recommendation method and system (一种公众号推荐方法及系统) - Google Patents

Public number recommendation method and system

Info

Publication number
WO2017028791A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
calculated
public number
public
group
Prior art date
Application number
PCT/CN2016/095730
Other languages
English (en)
French (fr)
Inventor
许毓超
苗军
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017028791A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the present invention relates to the field of Internet information recommendation, and in particular, to a public number recommendation method and system.
  • in prior-art solutions, features of persons or items are extracted and their feature values are analyzed, or interaction data between persons, or between persons and items, is used, such as rating values, interaction messages, call records, whether they appear in the same photo, interaction records in circles or forums, location information, and so on.
  • the strength of the relationship between the two is then calculated according to the six degrees of separation theory; when a person-to-person or person-to-item relationship is judged to be strong, the person or item found to have a strong relationship with someone is added to that person's candidate recommendation list.
  • Prior art solutions include recommendations for people and people as well as recommendations for people and things.
  • the recommendation of people and people is the main method for the social system to build connections.
  • user A is taken as an example.
  • user A has a first-degree relationship with user B, a second-degree relationship with user C, a third-degree relationship with user D, a fourth-degree relationship with user E, a fifth-degree relationship with user F, and a sixth-degree relationship with user G.
  • through its relationships with the above users, user A is in turn associated with persons or items a, b, c, d, e, f, g, h, i, j, k, m, n.
  • the core of the recommendation system is a recommendation algorithm, and the recommendation algorithm determines how the system works and the specific work strategy.
  • prior-art solutions mainly rely on two kinds of algorithms: content-based and collaboration-based.
  • content-based algorithms have the following problems: features are not easy to extract (for example from video, audio, or documents); if a feature is missing, the calculation result becomes invalid; if there are too many feature values, the data volume is large and a great deal of calculation time is consumed.
  • collaboration-based algorithms mainly judge the strength of the relationship between a person and an item from ratings of the item and from the interaction records between the person and the item.
  • collaboration-based algorithms have the following problems: when users' ratings of items are very sparse, the similarity between users derived from those ratings may be inaccurate (the sparsity problem); as the number of users and items increases, system performance becomes lower and lower (the scalability problem); and if no user has ever rated a certain item, that item cannot be recommended.
  • the present invention provides a public number recommendation method and system that can preferentially recommend highly active public numbers.
  • the present invention provides a public number recommendation method, comprising: grouping user data read from a database; for a user to be calculated, determining, within each group, the nearest neighbor users of the user to be calculated, and determining the group's recommended public numbers for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers; and determining the final recommended public numbers of the user to be calculated according to the recommended public numbers of all groups for the user to be calculated.
  • the grouping of the user data read from the database includes: determining the number of groups according to the ratio of the total amount of user data to a decomposition granularity coefficient; and dividing the user data into the corresponding groups according to the determined number of groups.
  • the determining, within each group, of the nearest neighbor users of the user to be calculated includes: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and determining, in ascending order of Euclidean distance, the first predetermined number of users as the nearest neighbor users of the user to be calculated in the group.
  • the feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, marital status.
  • the determining of the group's recommended public numbers for the user to be calculated, according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers, includes: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, the score of a public number being equal to the product of the Euclidean distance between the feature values of the nearest neighbor user following that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and determining, in descending order of score, the second predetermined number of public numbers as the group's recommended public numbers for the user to be calculated.
  • the information attenuation coefficient per unit time of a public number is calculated according to one or more of the following parameters: the number of follows, the number of views, and the number of clicks of the public number per unit time.
  • the determining of the final recommended public numbers of the user to be calculated, according to the recommended public numbers of all groups for the user to be calculated, includes: determining, in descending order of score among the recommended public numbers of all groups, a third predetermined number of public numbers as the final recommended public numbers, wherein the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
  • the method further includes: pushing the final recommended public number of the user to be calculated to the user to be calculated.
  • the present invention also provides a public number recommendation system, comprising: a first analysis unit configured to group user data read from a database; a second analysis unit configured to, for a user to be calculated, determine within each group the nearest neighbor users of the user to be calculated, and determine the group's recommended public numbers for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers; and a third analysis unit configured to determine the final recommended public numbers of the user to be calculated according to the recommended public numbers of all groups for the user to be calculated.
  • the first analysis unit is specifically configured to: determine the number of groups according to the ratio of the total amount of user data to a decomposition granularity coefficient; and divide the user data into the corresponding groups according to the determined number of groups.
  • the second analysis unit is configured to determine, within each group, the nearest neighbor users of the user to be calculated by: extracting the feature values of the user to be calculated and of all users in the group; calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group; and determining, in ascending order of Euclidean distance, the first predetermined number of users as the nearest neighbor users of the user to be calculated in the group.
  • the feature value of the user includes a feature value corresponding to at least one of the following characteristics: gender, age, city, industry, occupation, income level, education level, and marital status.
  • the second analysis unit is configured to determine the group's recommended public numbers for the user to be calculated, according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers, by: calculating, within each group, a score for every public number followed by the nearest neighbor users of the user to be calculated, the score of a public number being equal to the product of the Euclidean distance between the feature values of the nearest neighbor user following that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and determining, in descending order of score, the second predetermined number of public numbers as the group's recommended public numbers for the user to be calculated.
  • the information attenuation coefficient per unit time of a public number is calculated according to one or more of the following parameters: the number of follows, the number of views, and the number of clicks of the public number per unit time.
  • the third analysis unit is specifically configured to: determine, in descending order of score among the recommended public numbers of all groups for the user to be calculated, a third predetermined number of public numbers as the final recommended public numbers of the user to be calculated, wherein the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
  • the system further includes a pushing module configured to push the final recommended public numbers of the user to be calculated to the user to be calculated.
  • Another embodiment of the present invention provides a computer storage medium storing execution instructions for performing the method in the above embodiments.
  • the user data read from the database is grouped; for the user to be calculated, within each group, the nearest neighbor users of the user to be calculated are determined, and the group's recommended public numbers for the user to be calculated are determined according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers; and the final recommended public numbers of the user to be calculated are determined according to the recommended public numbers of all groups for the user to be calculated.
  • parallel computation over a large amount of user data improves computing performance on large-scale data; meanwhile, the public numbers recommended to the user to be calculated are determined from the nearest neighbor users and the latest information attenuation coefficient per unit time of the public numbers, so that highly active public numbers are recommended preferentially.
  • the feature value of the user includes feature values corresponding to at least one of the following characteristics: gender, age, city, industry, occupation, income level, education level, marital status.
  • the feature values are easy to extract, and the absence of certain feature values does not invalidate the calculation result.
  • determining the nearest neighbor users of the user to be calculated from user feature values avoids problems such as the sparsity problem of the collaboration-based algorithms in the prior art.
  • FIG. 2 is a flowchart of a method for recommending a public number according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of user data grouping according to an embodiment of the present invention.
  • Figure 4 is a schematic diagram of the fitting of flow data and exponential function modeled by Matlab;
  • Figure 5 is a diagram showing the information attenuation model of the public number
  • FIG. 6 is a data model diagram of an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of parallel processing according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a public number recommendation system according to Embodiment 1 of the present invention.
  • FIG. 9 is a flowchart of a public number recommendation method according to Embodiment 1 of the present invention.
  • FIG. 2 is a flowchart of a public number recommendation method according to an embodiment of the present invention. As shown in FIG. 2, the public number recommendation method provided in this embodiment includes the following steps:
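  • Step 11: Group the user data read from the database.
  • step 11 includes: determining the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient; and dividing the user data into the corresponding groups according to the determined number of groups.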
  • the number of groups T of the user data is determined from the total amount of user data N recorded in the database statistics and the decomposition granularity coefficient P:
  • T = N/P.
  • for example, when the total amount of user data N is 1 million and the decomposition granularity coefficient P is 100,000, the number of groups of user data is 10. In other words, for 1 million user records, every 100,000 records are decomposed into one parallel processing task.
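The grouping rule T = N/P can be sketched as below. This is a minimal illustration rather than the patent's implementation: the function name `split_into_groups` is hypothetical, and rounding T up when N is not an exact multiple of P is an assumption, since the text does not say how a remainder is handled.

```python
import math


def split_into_groups(user_ids, granularity):
    """Split user records into consecutive groups of at most `granularity` items.

    The number of groups is T = N / P, rounded up (the rounding is an assumption).
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    group_count = math.ceil(len(user_ids) / granularity)
    return [
        user_ids[i * granularity:(i + 1) * granularity]
        for i in range(group_count)
    ]


# The example from the text: N = 1,000,000 records, P = 100,000.
groups = split_into_groups(list(range(1_000_000)), 100_000)
print(len(groups), len(groups[0]))
```

Each returned group can then be handed to one parallel processing task.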
  • Step 12: For the user to be calculated, within each group, determine the nearest neighbor users of the user to be calculated, and determine the group's recommended public numbers for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers.
  • determining the nearest neighbor users of the user to be calculated includes: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and determining, in ascending order of Euclidean distance, the first predetermined number of users as the nearest neighbor users of the user to be calculated in the group.
  • the feature values of a user include feature values corresponding to at least one of the following characteristics: gender, age, city, industry, occupation, income level, education level, and marital status. Specifically, before the feature values of users are compared, each possible value of each feature is assigned a numeric value, and the data is normalized for subsequent calculation; for example, different cities are assigned values and the data is then normalized.
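The value assignment and normalization step might look like the following sketch. The code table `CITY_CODES`, the sample data, and the choice of min-max scaling are all assumptions for illustration; the text only says that values are assigned and the data is normalized.

```python
def min_max_normalize(values):
    """Scale numbers to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical code table: the source does not give the actual assignments.
CITY_CODES = {"Beijing": 1, "Shanghai": 2, "Shenzhen": 3}

cities = ["Shenzhen", "Beijing", "Shanghai", "Beijing"]
encoded_cities = [CITY_CODES[city] for city in cities]
normalized_cities = min_max_normalize(encoded_cities)
normalized_ages = min_max_normalize([25, 40, 31, 58])
print(normalized_cities)
```

Normalizing every feature to the same range keeps one feature (for example age in years) from dominating the distance calculation. Integer codes do impose an artificial ordering on nominal features such as city, which a real system may want to handle differently.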
  • the user to be calculated is, for example, user x, and another user in the group is user y.
  • the Euclidean distance between the feature values of the user x to be calculated and those of user y is the standard Euclidean distance $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $x_i$ and $y_i$ denote the i-th normalized feature values of users x and y and n is the number of features.
  • the Euclidean distance between the user to be calculated and each user in the group is calculated, and the first predetermined number (e.g., three) of users with the smallest Euclidean distance are determined as the nearest neighbor users of the user to be calculated in the group. That is, the smaller the Euclidean distance between user feature values, the higher the similarity between the users.
  • the first predetermined number is an integer greater than 0, and can be set according to actual needs, which is not limited by the present invention.
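Selecting the nearest neighbor users by Euclidean distance can be sketched as follows. The function names and the sample feature vectors are hypothetical, and the group is assumed not to contain the user to be calculated.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(target, group, k):
    """Return the k (user_id, distance) pairs closest to `target`, nearest first."""
    distances = [
        (user_id, euclidean_distance(target, features))
        for user_id, features in group.items()
    ]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


# Hypothetical normalized feature vectors for one group.
group = {"B": [0.0, 0.0], "C": [1.0, 1.0], "D": [0.3, 0.4], "E": [3.0, 4.0]}
neighbors = nearest_neighbors([0.0, 0.0], group, k=3)
print([user_id for user_id, _ in neighbors])
```

Here `k` plays the role of the first predetermined number.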
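  • the group's recommended public numbers for the user to be calculated are determined from all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers as follows: within each group, a score is calculated for every public number followed by the nearest neighbor users of the user to be calculated, the score of a public number being equal to the product of the Euclidean distance between the feature values of the nearest neighbor user following that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; then, in descending order of score, the second predetermined number of public numbers are determined as the group's recommended public numbers for the user to be calculated.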
  • the information attenuation coefficient per unit time of the public number is calculated according to one or more of the following parameters: the amount of attention, the amount of viewing, and the amount of clicks of the public number per unit time.
  • $N = N_0 e^{-rt}$.
  • the fitting function and the fitting parameters can be obtained as follows:
  • $N = N_0 e^{-rt} + B$
  • $N_0 = 139.4$ hits/min
  • $r = 0.168\ \mathrm{s}^{-1}$
  • $B = 20.5$ hits/min.
  • $N_1 = cwF + B$.
  • N 2 c 2 w(FN 1 )+B
  • $N_1$ represents the number of clicks of the link in the first minute
  • the information attenuation coefficient per unit time can be calculated, for example, 0.52.
  • the available click ratio C is as follows:
  • the N value can be obtained as follows:
  • the information attenuation factor of the public number is as follows:
  • the information attenuation coefficient per unit time of the public number is between 0 and 1.
  • the amount of information is exponentially decayed, from fast to slow.
  • the information attenuation coefficient per unit time is changed from small to large, and the information attenuation coefficient of the public number in the first minute is used as the weighting coefficient.
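The fitted exponential model N = N_0 e^{-rt} + B can be evaluated with the parameters quoted in the text, as in the sketch below. It only illustrates the shape of the decay and does not reproduce the derivation of the attenuation coefficient itself. The function name is hypothetical, and t is assumed to be measured in the time unit in which r is expressed.

```python
import math

N0 = 139.4  # fitted initial amplitude, hits/min (from the text)
R = 0.168   # fitted decay rate (from the text)
B = 20.5    # fitted baseline, hits/min (from the text)


def fitted_clicks(t):
    """Evaluate N(t) = N0 * exp(-R * t) + B for elapsed time t >= 0."""
    return N0 * math.exp(-R * t) + B


samples = [fitted_clicks(t) for t in range(0, 31, 5)]
print([round(value, 1) for value in samples])
```

The values fall quickly at first and then flatten out toward the baseline B, which matches the description of the information amount decaying exponentially, from fast to slow.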
  • Step 13: Determine the final recommended public numbers of the user to be calculated according to the recommended public numbers of all groups for the user to be calculated.
  • step 13 includes: determining, in descending order of score among the recommended public numbers of all groups for the user to be calculated, a third predetermined number of public numbers as the final recommended public numbers for the user to be calculated.
  • the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
  • the method further includes: pushing the final recommended public number of the user to be calculated to the user to be calculated.
  • FIG. 6 is a data model diagram of an embodiment of the present invention.
  • user A itself follows (Follow) public numbers 1 and 3; through the relationship (Relation) between user A and user B, public number 2, followed by user B, a first-degree contact of user A, can be found; through the relationships among users A, B and E, public number 4, followed by user E, a second-degree contact of user A, can be found.
  • likewise, through the relationship between user A and user D, the public numbers followed by user D, a first-degree contact of user A, can be found.
  • FIG. 7 is a schematic diagram of parallel processing according to an embodiment of the present invention.
  • within each group, the K nearest neighbors (KNN, K-Nearest Neighbor) of the user to be calculated are determined by the Euclidean distance between user feature values, where K is an integer greater than zero.
  • the latest information attenuation coefficient per unit time of each public number followed by a nearest neighbor user is used to weight the corresponding Euclidean distance, and the weighted results are used within the group to determine the group's recommended public numbers for the user to be calculated.
  • for example, the Euclidean distance between user A and user B is 5, the Euclidean distance between user A and user C is 3, and the Euclidean distance between user A and user D is 2.
  • user B follows the public numbers b1 and b2; the information attenuation coefficient of the public number b1 is, for example, 0.8, and the information attenuation coefficient of the public number b2 is, for example, 0.6.
  • user C follows the public number c1; the information attenuation coefficient of the public number c1 is, for example, 0.7.
  • user D follows the public numbers d1 and d2.
  • the information attenuation coefficient of the public number d1 is, for example, 0.86, and the information attenuation coefficient of the
  • the above processing is performed for each group, and three recommended public numbers can be obtained in each group; after that, the results of all the groups are combined, and the final recommended public number is determined based on the scores of all the obtained recommended public numbers. For example, when the third predetermined number (J in FIG. 7) is five, five final recommended public numbers are determined for the user A in descending order of the ratings of the recommended public numbers obtained for all the groups.
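The weighting and merging described above can be worked through with the example numbers. This is a sketch under stated assumptions: the function names are hypothetical, the score follows the text literally (Euclidean distance multiplied by attenuation coefficient), d2 is left out because its coefficient is not given in the text, and keeping the highest score when a public number appears more than once is an assumption. The second group passed to `merge_groups` is invented purely to show the merge.

```python
import heapq


def score_group(neighbor_distance, follows, attenuation):
    """Score every public number followed by a nearest neighbor in one group."""
    scores = {}
    for user, distance in neighbor_distance.items():
        for account in follows.get(user, []):
            score = distance * attenuation[account]
            # Assumption: if several neighbors follow it, keep the best score.
            scores[account] = max(score, scores.get(account, score))
    return scores


def top_n(scores, n):
    """Return the n highest-scoring (public_number, score) pairs, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


def merge_groups(per_group_results, j):
    """Merge the per-group recommendations and keep the J best overall."""
    merged = {}
    for result in per_group_results:
        for account, score in result:
            merged[account] = max(score, merged.get(account, score))
    return top_n(merged, j)


# Values from the text (distances from user A, attenuation coefficients).
neighbor_distance = {"B": 5, "C": 3, "D": 2}
follows = {"B": ["b1", "b2"], "C": ["c1"], "D": ["d1"]}  # d2 omitted, see above
attenuation = {"b1": 0.8, "b2": 0.6, "c1": 0.7, "d1": 0.86}

group_top = top_n(score_group(neighbor_distance, follows, attenuation), 3)
other_group = [("x1", 3.5), ("x2", 0.4)]  # hypothetical result of a second group
final = merge_groups([group_top, other_group], j=5)
print([account for account, _ in group_top])
print([account for account, _ in final])
```

With these numbers the group keeps b1, b2 and c1 as its three recommendations, and d1 is dropped before the merge.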
  • an embodiment of the present invention further provides a public number recommendation system, including: a first analysis unit configured to group user data read from a database; a second analysis unit configured to, for a user to be calculated, determine within each group the nearest neighbor users of the user to be calculated, and determine the group's recommended public numbers for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers; and a third analysis unit configured to determine the final recommended public numbers of the user to be calculated according to the recommended public numbers of all groups for the user to be calculated.
  • the first analyzing unit is specifically configured to: determine the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient; and divide the corresponding number of user data into corresponding groups according to the determined number of groups.
  • the second analysis unit is configured to determine, within each group, the nearest neighbor users of the user to be calculated by: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and determining, in ascending order of Euclidean distance, the first predetermined number of users as the nearest neighbor users of the user to be calculated in the group.
  • the feature values of a user include feature values corresponding to at least one of the following characteristics: gender, age, city, industry, occupation, income level, education level, and marital status.
  • the second analysis unit is configured to determine the group's recommended public numbers for the user to be calculated, according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each of those public numbers, by: calculating, within each group, a score for every public number followed by the nearest neighbor users of the user to be calculated, the score of a public number being equal to the product of the Euclidean distance between the feature values of the nearest neighbor user following that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and determining, in descending order of score, the second predetermined number of public numbers as the group's recommended public numbers for the user to be calculated.
  • the information attenuation coefficient per unit time of the public number is calculated according to one or more of the following parameters: the amount of attention, the amount of viewing, and the amount of clicks of the public number per unit time.
  • the third analysis unit is specifically configured to: determine, in descending order of score among the recommended public numbers of all groups for the user to be calculated, a third predetermined number of public numbers as the final recommended public numbers for the user to be calculated, wherein the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
  • the above system further includes a pushing module configured to push the final recommended public number of the user to be calculated to the user to be calculated.
  • FIG. 8 is a schematic diagram of a public number recommendation system according to Embodiment 1 of the present invention.
  • the public number recommendation system includes a first analysis unit, a second analysis unit, a third analysis unit, a push module, a storage module, and a user terminal.
  • the first analysis unit, the second analysis unit and the third analysis unit are integrated, for example, in the analysis module.
  • the storage module is, for example, a component with a data storage function, such as a memory; the functions of the first analysis unit, the second analysis unit, the third analysis unit, and the push module are implemented, for example, by a computer processor reading programs/instructions stored in the memory, or by firmware/logic circuits/integrated circuits.
  • the analysis module is the main calculation module of the public number recommendation system; it reads the user and public number data, groups it, adds the groups to distributed computing tasks, calculates within each group the K nearest neighbor users of the user to be calculated, determines each group's recommended public numbers by weighting with the latest information attenuation coefficient per unit time of the public numbers followed by the K nearest neighbor users, and finally merges the results of all groups to obtain the final recommended public numbers;
  • the push module is, for example, a task queue that is loaded at program startup; it then polls for a public number candidate set list to be pushed, reads the candidate set data to be pushed, and pushes it to the user terminal;
  • the user terminal is the user client, which holds the user's friend relationships and followed public numbers, polls for newly recommended public numbers, and displays them on the interface;
  • the storage module is configured to store the user and public number data, as well as the follow counts, view counts, click records, and …
  • FIG. 9 is a flowchart of a public number recommendation method according to Embodiment 1 of the present invention. As shown in FIG. 9, this embodiment is specifically described as follows:
  • Step 101A: Start the analysis process in the analysis module;
  • Step 101B: Start the push process in the push module;
  • Step 102: The analysis module initiates an offline computing task for the user to be calculated (for example, user A), where the user to be calculated is, for example, any user stored in the database;
  • Step 103: The analysis module reads the full set of user data (i.e., the sample data) in the database from the storage module;
  • Step 104: The storage module returns the data records to the analysis module;
  • Step 105: The analysis module calculates the grouping of the full set of user data; the grouping policy is the same as described above and is not repeated here;
  • Step 106: The analysis module decomposes the full set of user data into multiple parallel processing tasks according to the grouping result (e.g., task1...taskn);
  • Step 107: The analysis module calculates the K nearest neighbors (KNN) of user A in each group, where K is, for example, an integer greater than 0 and not greater than 5; the K nearest neighbors of user A are determined in the same way as described above;
  • Step 108: The analysis module obtains the public numbers followed by the K nearest neighbor users and the information attenuation coefficients of those public numbers, determines the score of every public number followed by the K nearest neighbor users as the product of the Euclidean distance of the nearest neighbor user and the latest information attenuation coefficient per unit time of the followed public number, and determines the public number recommendation results of each group in descending order of score;
  • Step 109: The analysis module merges the grouping results, that is, merges the recommended public numbers of all groups;
  • Step 110: The analysis module determines J final recommended public numbers according to the scores of the merged recommended public numbers of all groups, where J is, for example, an integer greater than 0 and not greater than 5;
  • Step 111: The analysis module obtains the recommended public number candidate set of user A;
  • Step 112: The analysis module adds the obtained recommended public number candidate set of user A to the push list of the push module;
  • Step 113: The push module polls the push list;
  • Step 114: The push module reads the push list;
  • Step 115: The push module pushes the recommended public number candidate set of user A to the terminal corresponding to user A;
  • Step 116: The user terminal follows, views, or clicks on the public numbers.
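Steps 106 to 115 can be sketched end to end as below. This is an illustration only: a thread pool stands in for the distributed computing tasks, `queue.Queue` stands in for the push list, and all function names and sample scores are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_analysis(groups, per_group_task, j, max_workers=4):
    """Run one task per group in parallel, merge the results, keep the J best."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_group = list(pool.map(per_group_task, groups))  # steps 106 to 108
    merged = [item for result in per_group for item in result]  # step 109
    merged.sort(key=lambda item: item[1], reverse=True)
    return [account for account, _ in merged[:j]]  # steps 110 and 111


def poll_push_list(push_list, deliver):
    """Drain the push list once and hand each entry to `deliver` (steps 113 to 115)."""
    delivered = 0
    while True:
        try:
            user, candidates = push_list.get_nowait()
        except queue.Empty:
            return delivered
        deliver(user, candidates)
        delivered += 1


def top_two(scored_accounts):
    """Hypothetical per-group task: keep the two best (public_number, score) pairs."""
    return sorted(scored_accounts, key=lambda item: item[1], reverse=True)[:2]


# Hypothetical pre-scored public numbers for two groups.
groups = [[("b1", 4.0), ("b2", 3.0), ("c1", 2.1)], [("x1", 3.5)]]
final = run_analysis(groups, top_two, j=2)

push_list = queue.Queue()
push_list.put(("A", final))  # step 112
sent = []
poll_push_list(push_list, lambda user, candidates: sent.append((user, candidates)))
print(sent)
```

A production push module would poll repeatedly on a timer instead of draining the list once, and the per-group task would perform the nearest neighbor search and scoring rather than sort pre-scored data.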
  • Embodiments of the present invention also provide a storage medium.
  • an execution instruction is stored in the storage medium, and the execution instruction is used to execute the foregoing method.
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), and a Random Access Memory (RAM).
  • the user data is grouped and the public number is scored by weighting the information attenuation coefficient per unit time, thereby improving the calculation performance of the large-scale data, and the information attenuation coefficient per unit time.
  • Dynamically changing the dynamic adjustment of the public number candidate set is achieved by dynamically adjusting the information attenuation coefficient.
  • the embodiment of the invention achieves priority A public number with a high degree of activity is recommended, and the system provided by the embodiment of the present invention automatically learns according to the increase and change of the amount of data.
  • As described above, the public number recommendation method and system provided by the embodiments of the present invention have the following beneficial effects: processing a large amount of user data in parallel improves the computing performance on large-scale data; meanwhile, the public numbers recommended to the user to be calculated are determined according to the nearest neighbor users and the latest information attenuation coefficient per unit time of each public number, so that highly active public numbers are recommended with priority.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

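A public number recommendation method and system, wherein the method comprises: grouping user data read from a database (11); for a user to be calculated, determining, within each group, the nearest neighbor users of the user to be calculated, and determining the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number (12); and determining the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated (13). The public number recommendation method and system can give priority to recommending highly active public numbers.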

Description

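A public number recommendation method and system
Technical Field
The present invention relates to the field of Internet information recommendation, and in particular to a public number recommendation method and system.
Background
The theory of six degrees of separation from mathematics states that at most six people are needed to connect any two people in the world; in other words, any person can get to know any stranger through at most five intermediaries. The theory is also known as the small-world theory.
In the era of big data, information overload is a common problem on the Internet. To address it, existing technical solutions extract the features of people or items and analyze their feature values, or use interaction data between people or between people and items, such as rating values, interaction information, call records, whether they appear in the same photo, interaction records in circles or forums, and location information. The strength of the relationship between the two is then calculated according to the theory of six degrees of separation; when a person and another person, or a person and an item, are judged to have a strong relationship, that person or item is added to the candidate recommendation list of the person concerned.
Existing technical solutions include person-to-person recommendation and person-to-item recommendation. Person-to-person recommendation is the main way a social system builds connections; to increase the connections that real social relationships have in virtual social networking, recommendations are usually made according to the theoretical model shown in FIG. 1. As shown in FIG. 1, taking user A as an example, user A has a first-degree relationship with user B, a second-degree relationship with user C, a third-degree relationship with user D, a fourth-degree relationship with user E, a fifth-degree relationship with user F, and a sixth-degree relationship with user G. Through its relationships with these users, user A is in turn associated with the people or items a, b, c, d, e, f, g, h, i, j, k, m, and n.
Specifically, the core of a recommendation system is its recommendation algorithm, which determines how the system works and the strategy it follows. Existing technical solutions mainly use two kinds of algorithms: content-based and collaboration-based. Content-based algorithms have the following problems: features are not easy to extract (for example, from video, audio, and documents); missing features invalidate the calculation result; and when there are too many feature values, the large data volume consumes a great deal of computing time. Collaboration-based algorithms determine the strength of the relationship between a person and an item mainly from ratings of the item and from interaction records between people and items. They have the following problems: when user ratings of items are very sparse, the similarity between users derived from those ratings may be inaccurate (the sparsity problem); as the numbers of users and items grow, system performance degrades (the scalability problem); and an item that no user has ever rated cannot be recommended.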
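Summary of the Invention
To solve the above technical problem, the present invention provides a public number recommendation method and system that can give priority to recommending highly active public numbers.
To achieve the above technical objective, the present invention provides a public number recommendation method, comprising: grouping user data read from a database; for a user to be calculated, determining, within each group, the nearest neighbor users of the user to be calculated, and determining the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number; and determining the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated.
Optionally, grouping the user data read from the database comprises:
determining the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient;
dividing a corresponding amount of user data into each group according to the determined number of groups.
Optionally, determining, within each group, the nearest neighbor users of the user to be calculated comprises: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and, in ascending order of Euclidean distance, determining a first predetermined number of users as the nearest neighbor users of the user to be calculated in that group.
Optionally, the feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status.
Optionally, determining the public numbers recommended by the group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number comprises: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, where the score of a public number equals the product of the Euclidean distance between the feature values of the nearest neighbor user who follows that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and, in descending order of score, determining a second predetermined number of public numbers as the public numbers recommended by that group for the user to be calculated.
Optionally, the information attenuation coefficient per unit time of a public number is calculated from one or more of the following parameters: the number of followers, the number of views, and the number of clicks of the public number per unit time.
Optionally, determining the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated comprises:
in descending order of the scores of the public numbers recommended by each group for the user to be calculated, determining a third predetermined number of public numbers as the final recommended public numbers of all groups for the user to be calculated, where the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
Optionally, after the final recommended public numbers of the user to be calculated are determined, the method further comprises: pushing the final recommended public numbers of the user to be calculated to the user to be calculated.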
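The present invention also provides a public number recommendation system, comprising: a first analysis unit, configured to group user data read from a database; a second analysis unit, configured to, for a user to be calculated, determine within each group the nearest neighbor users of the user to be calculated, and determine the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number; and a third analysis unit, configured to determine the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated.
Optionally, the first analysis unit is specifically configured to: determine the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient; and divide a corresponding amount of user data into each group according to the determined number of groups.
Optionally, the second analysis unit being configured to determine, within each group, the nearest neighbor users of the user to be calculated comprises: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and, in ascending order of Euclidean distance, determining a first predetermined number of users as the nearest neighbor users of the user to be calculated in that group.
Optionally, the feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status.
Optionally, the second analysis unit being configured to determine the public numbers recommended by the group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number comprises: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, where the score of a public number equals the product of the Euclidean distance between the feature values of the nearest neighbor user who follows that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and, in descending order of score, determining a second predetermined number of public numbers as the public numbers recommended by that group for the user to be calculated.
Optionally, the information attenuation coefficient per unit time of a public number is calculated from one or more of the following parameters: the number of followers, the number of views, and the number of clicks of the public number per unit time.
Optionally, the third analysis unit is specifically configured to: in descending order of the scores of the public numbers recommended by each group for the user to be calculated, determine a third predetermined number of public numbers as the final recommended public numbers of all groups for the user to be calculated, where the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
Optionally, the system further comprises a push module, configured to push the final recommended public numbers of the user to be calculated to the user to be calculated.
Another embodiment of the present invention provides a computer storage medium that stores execution instructions, the execution instructions being used to execute the method of the above embodiments.
In the present invention, user data read from a database is grouped; for a user to be calculated, the nearest neighbor users of the user to be calculated are determined within each group, and the public numbers recommended by that group for the user to be calculated are determined according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number; the final recommended public numbers of the user to be calculated are then determined according to the public numbers recommended by all groups. Processing a large amount of user data in parallel improves the computing performance on large-scale data; meanwhile, the public numbers recommended to the user to be calculated are determined according to the nearest neighbor users and the latest information attenuation coefficient per unit time of each public number, so that highly active public numbers are recommended with priority.
Preferably, in the present invention, the feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status. Compared with the prior art, these feature values are easy to extract, and the absence of some feature values does not invalidate the calculation result. Moreover, determining the nearest neighbor users of the user to be calculated from user feature values resolves problems of collaboration-based algorithms in the prior art, such as sparsity.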
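Brief Description of the Drawings
FIG. 1 is a theoretical model diagram of the prior-art solution;
FIG. 2 is a flowchart of the public number recommendation method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of user data grouping in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the fit between traffic data modeled in Matlab and an exponential function;
FIG. 5 is an information attenuation model diagram of a public number;
FIG. 6 is a data model diagram of an embodiment of the present invention;
FIG. 7 is a schematic diagram of parallel processing in an embodiment of the present invention;
FIG. 8 is a schematic diagram of the public number recommendation system provided by Embodiment 1 of the present invention;
FIG. 9 is a flowchart of the public number recommendation method provided by Embodiment 1 of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the embodiments described below are only intended to illustrate and explain the present invention, not to limit it.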
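FIG. 2 is a flowchart of the public number recommendation method provided by an embodiment of the present invention. As shown in FIG. 2, the public number recommendation method provided by this embodiment includes the following steps:
Step 11: Group the user data read from the database.
Step 11 includes:
determining the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient;
dividing a corresponding amount of user data into each group according to the determined number of groups.
Specifically, as shown in FIG. 3, the number of groups T is determined from the total amount of user data N recorded in the database statistics and the decomposition granularity coefficient P, where T = N/P. For example, if the total amount of user data N is 1,000,000 and the decomposition granularity coefficient P is 100,000, the user data is divided into 10 groups. In other words, for 1,000,000 user records, every 100,000 records are assigned to one parallel processing task, and the tasks are processed in parallel.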
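The grouping rule T = N/P can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_groups` is hypothetical, and rounding T up when N is not a multiple of P is an assumption, since the text only covers the evenly divisible case.

```python
import math

def split_into_groups(user_ids, granularity):
    """Split user records into T = ceil(N / P) groups of at most P records each."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Assumption: round up so that a partial last group is still processed.
    group_count = math.ceil(len(user_ids) / granularity)
    return [
        user_ids[i * granularity:(i + 1) * granularity]
        for i in range(group_count)
    ]

# The example from the text: N = 1,000,000 and P = 100,000.
groups = split_into_groups(list(range(1_000_000)), 100_000)
print(len(groups), len(groups[0]))
```

Each returned group can then be handed to one parallel processing task.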
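Step 12: For the user to be calculated, determine, within each group, the nearest neighbor users of the user to be calculated, and determine the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number.
Specifically, determining the nearest neighbor users of the user to be calculated within each group includes: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and, in ascending order of Euclidean distance, determining a first predetermined number of users as the nearest neighbor users of the user to be calculated in that group.
The feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status. Specifically, before the feature values of users are used in calculation, each possible case of each feature is assigned a value and the data is normalized to simplify the subsequent calculation. For example, different cities are assigned values and the data is normalized.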
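A minimal sketch of the value assignment and normalization step is shown below. The text does not say which normalization is used, so min-max scaling to [0, 1] is an assumption here, and the city codes and sample rows are purely hypothetical.

```python
def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1] (assumed normalization)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column carries no information, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

# Hypothetical value assignment: gender (0/1), age in years, city code.
CITY_CODES = {"Shenzhen": 0, "Nanjing": 1, "Xi'an": 2}
raw = [
    [0, 20, CITY_CODES["Shenzhen"]],
    [1, 40, CITY_CODES["Nanjing"]],
    [1, 30, CITY_CODES["Xi'an"]],
]
normalized = min_max_normalize(raw)
print(normalized)
```

Normalizing keeps a wide-ranging feature such as age from dominating the Euclidean distance over a binary feature such as gender.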
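For example, let the user to be calculated be user x, whose feature values are expressed as $x = (x_1, \dots, x_n)$, and let the feature values of a user y in the first group be expressed as $y = (y_1, \dots, y_n)$. The Euclidean distance between the feature values of user x and those of user y is then:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Specifically, within each group, the Euclidean distance between the user to be calculated and each user in the group is calculated with the above formula, and the first predetermined number (for example, three) of users with the smallest Euclidean distance are determined as the nearest neighbor users of the user to be calculated in that group. That is, the smaller the Euclidean distance between user feature values, the higher the similarity between the users. The first predetermined number is an integer greater than 0 and can be set according to actual needs; the present invention does not limit it.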
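The nearest neighbor selection within one group might be sketched like this. The function names and the two-dimensional sample feature vectors are hypothetical; the vectors were chosen so that the distances come out as 5, 3, 2, and 10.

```python
import math

def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) for equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def nearest_neighbors(target, group, k):
    """Return the k (user_id, distance) pairs closest to target within one group."""
    distances = [
        (user_id, euclidean_distance(target, features))
        for user_id, features in group.items()
    ]
    distances.sort(key=lambda pair: pair[1])  # ascending distance
    return distances[:k]

# Hypothetical group of four users with already normalized feature vectors.
group = {"B": [3.0, 4.0], "C": [0.0, 3.0], "D": [2.0, 0.0], "E": [6.0, 8.0]}
neighbors = nearest_neighbors([0.0, 0.0], group, k=3)
print(neighbors)
```

A full sort is the simplest choice for a sketch; for large groups, `heapq.nsmallest` would avoid sorting every distance.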
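Here, for the user to be calculated, determining within each group the public numbers recommended by that group according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number includes: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, where the score of a public number equals the product of the Euclidean distance between the feature values of the nearest neighbor user who follows that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and, in descending order of score, determining a second predetermined number of public numbers as the public numbers recommended by that group for the user to be calculated.
The information attenuation coefficient per unit time of a public number is calculated from one or more of the following parameters: the number of followers, the number of views, and the number of clicks of the public number per unit time.
To explain how the information attenuation coefficient of a public number is determined, the following assumptions are made:
Assume that the quantity of something (such as a virus in the human body or beer foam) is N, and that the rate at which it decreases is proportional to its quantity, with proportionality constant r. Then, for a given time interval Δt, the following expression holds:
$$\Delta N = -r N \, \Delta t$$
As Δt tends to zero, this becomes a derivative, and the quantity as a function of time is:
$$N = N_0 e^{-rt}$$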
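The limiting step can be written out as follows. The intermediate steps are a standard separation-of-variables derivation filled in here for clarity, assuming that r > 0 is the proportionality constant and that $N_0$ denotes the quantity at t = 0.

```latex
\frac{\Delta N}{\Delta t} = -rN
\;\xrightarrow{\;\Delta t \to 0\;}\;
\frac{dN}{dt} = -rN
\;\Longrightarrow\;
\int_{N_0}^{N} \frac{dN'}{N'} = -r \int_{0}^{t} dt'
\;\Longrightarrow\;
\ln\frac{N}{N_0} = -rt
\;\Longrightarrow\;
N = N_0 e^{-rt}
```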
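To verify the above function, it was modeled in the mathematical software Matlab; the fit between the traffic data and the exponential function is shown in FIG. 4. The resulting fitting function and fitting parameters are:
$$N = N_0 e^{-rt} + B,$$
where $N_0 = 139.4$ hits/min, $r = 0.168\ \mathrm{s}^{-1}$, and $B = 20.5$ hits/min.
This shows that, under the exponential decay model, a message published by a public number can receive about 20 clicks per minute; here N is the number of clicks per minute, not the total number of clicks.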
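As a small illustration, the fitted function can be evaluated directly with the parameters quoted in the text. The function name `click_rate` is hypothetical, and t is taken to be in whatever time unit the fitted r refers to.

```python
import math

N0 = 139.4  # hits/min, fitted amplitude from the text
R = 0.168   # fitted decay rate from the text
B = 20.5    # hits/min, fitted constant term from the text

def click_rate(t):
    """Clicks per minute at time t under N(t) = N0 * exp(-R * t) + B."""
    return N0 * math.exp(-R * t) + B

for t in (0, 5, 10, 30, 60):
    print(f"t={t:>2}: {click_rate(t):6.1f} hits/min")
```

For large t the exponential term vanishes and the rate approaches the constant term B, which corresponds to the roughly 20 clicks per minute mentioned in the text.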
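Now assume that 850,000 followers may see a link published by the public number; non-followers who see the link are ignored here. Let the number of followers of the public number be F. Some of these followers view their public number messages; let this portion be W (that is, the number of views of the public number). Some of the followers who see the link also open it; let this portion be C (that is, the number of clicks of the public number). In addition, some people click the link through other channels; let this portion be B. As shown in FIG. 5, the large circle is the total number of followers F, the middle circle is the number of views W, and the small circle is the number of clicks C.
One minute after the public number sends a message containing a link, the number of clicks obtained is:
$$N_1 = c\,w\,F + B.$$
Assume that user a is a person who sees this link, that user a sees a new link in this public number every minute, and that the probability of user a clicking a given link is proportional to the total number of links. The click volumes for the first two minutes are then as follows: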
Figure PCTCN2016095730-appb-000003
Figure PCTCN2016095730-appb-000004
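Here l is a constant representing the volume of newly published information, and 0.25 is the assumed ratio for the case of no click.
In addition, B is also a constant. There is a further assumption: some of those who click will forward the link, producing a second-order effect. The relationship for the second minute is therefore:
$$N_2 = c_2\,w\,(F - N_1) + B,$$
where $N_1$ is the number of clicks on the link in the first minute. Verifying the data model with a tool yields the following parameters: w = 0.02 and B = 15. Assume $l_0 = 25$, that is, an ordinary user sees 25 new messages per minute. From the above derivation, the information attenuation coefficient per unit time can be calculated, for example 0.52.
Specifically, with F = 850000 and an assumed viewing ratio of 0.02, 850000 * 0.02 = 17000 people will view the message.
From the above formulas, the click ratios C are:
C1 = 0.25 * 1/25 = 0.01;
C2 = 0.25 * 1/50 = 0.005;
C3 = 0.25 * 1/75 = 0.0033;
C4 = 0.25 * 1/100 = 0.0025;
From the above formulas, the values of N (truncated to integers) are:
N1 = 0.01 * 0.02 * 850000 + 20 = 190;
N2 = 0.005 * 0.02 * (850000 - 190) + 15 = 99;
N3 = 0.0033 * 0.02 * (850000 - 99) + 10 = 66;
N4 = 0.0025 * 0.02 * (850000 - 66) + 8 = 50;
Accordingly, the information attenuation coefficients of the public number are:
T1 = 99/190 = 0.52;
T2 = 66/99 = 0.66;
T3 = 50/66 = 0.75;
The information attenuation coefficient per unit time of a public number takes a value between 0 and 1.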
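The worked figures above can be reproduced with a short script. This is only a sketch of the arithmetic as printed: the per-minute B terms (20, 15, 10, 8) and the truncation to whole clicks are taken from the worked example, and `Decimal` is used purely to avoid floating-point rounding at the truncation step.

```python
from decimal import Decimal

F = 850_000          # number of followers, from the text
W = Decimal("0.02")  # viewing ratio w, from the text
C = [Decimal("0.01"), Decimal("0.005"), Decimal("0.0033"), Decimal("0.0025")]  # C1..C4
B = [20, 15, 10, 8]  # per-minute B terms as used in the worked example

clicks = []
previous = 0
for k, (c, b) in enumerate(zip(C, B)):
    # First minute: N1 = c*w*F + B; later minutes: Nk = c*w*(F - N(k-1)) + B.
    audience = F if k == 0 else F - previous
    previous = int(c * W * audience + b)  # truncate to whole clicks, as the text does
    clicks.append(previous)

# Attenuation coefficient T_k = N_(k+1) / N_k.
coefficients = [later / earlier for earlier, later in zip(clicks, clicks[1:])]
print(clicks)
print([f"{t:.3f}" for t in coefficients])
```

The printed coefficients carry three decimals; the values 0.52, 0.66, and 0.75 in the text are the same ratios cut to two decimals.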
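It can thus be seen from the data fitting curve that the amount of information decays exponentially, first quickly and then slowly, and that the information attenuation coefficient per unit time goes from small to large. The information attenuation coefficient of the public number for the first minute is used as the weighting coefficient: the larger its value, the slower the decay, and the smaller its value, the faster the decay. In addition, the smaller the Euclidean distance, the closer the users. Thus, the larger the overall score, the more active the public number.
Step 13: Determine the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated.
Here, step 13 includes: in descending order of the scores of the public numbers recommended by each group for the user to be calculated, determining a third predetermined number of public numbers as the final recommended public numbers of all groups for the user to be calculated, where the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
In addition, after step 13, the method further includes: pushing the final recommended public numbers of the user to be calculated to the user to be calculated.
FIG. 6 is a data model diagram of an embodiment of the present invention. As shown in FIG. 6, user A itself follows (Follow) 1 and 3. From the relationship (Relation) between user A and user B, public number 2, which is followed by user B, a first-degree relation of user A, can be found; from the relationships among user A, user B, and user E, public number 4, which is followed by user E, a second-degree relation of user A, can be found; and from the relationship between user A and user D, public number 5, which is followed by user D, a first-degree relation of user A, can be found.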
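The Follow/Relation data model of FIG. 6 can be illustrated with two adjacency maps and a breadth-first walk. This is a hypothetical sketch: the exact edges are inferred from the description (B and D are first-degree relations of A, E is reached through B), and the function name `candidates_by_degree` is not from the source.

```python
from collections import deque

# Follow edges: user -> public numbers followed.
follows = {"A": {1, 3}, "B": {2}, "D": {5}, "E": {4}}
# Relation edges: user -> directly related users (assumed symmetric).
relations = {"A": {"B", "D"}, "B": {"A", "E"}, "D": {"A"}, "E": {"B"}}

def candidates_by_degree(user, max_degree):
    """Map each relation degree to the public numbers found at that degree,
    leaving out the ones the user already follows."""
    seen = {user}
    pending = deque([(user, 0)])
    found = {}
    already_followed = follows.get(user, set())
    while pending:
        current, degree = pending.popleft()
        if degree > 0:
            new = follows.get(current, set()) - already_followed
            if new:
                found.setdefault(degree, set()).update(new)
        if degree < max_degree:
            for related in relations.get(current, ()):
                if related not in seen:
                    seen.add(related)
                    pending.append((related, degree + 1))
    return found

result = candidates_by_degree("A", max_degree=2)
print(result)
```

With the edges above, public numbers 2 and 5 are found at the first degree and public number 4 at the second degree, matching the description of FIG. 6.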
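FIG. 7 is a schematic diagram of parallel processing in an embodiment of the present invention. As shown in FIG. 7, in this embodiment, for the user to be calculated, the K nearest neighbors (k-Nearest Neighbor, KNN), that is, the K nearest neighbor users, are calculated within each group. The K nearest neighbors are determined by the Euclidean distance between user feature values, and K is an integer greater than 0. Within each group, after the K nearest neighbors of the user to be calculated are determined, the Euclidean distance of each nearest neighbor user is weighted by the latest information attenuation coefficient per unit time of the public numbers that user follows, and the weighted results are compared within the group to determine the public numbers recommended by that group for the user to be calculated.
For example, suppose the nearest neighbor users of the user to be calculated (user A) in the first group are user B, user C, and user D (that is, K = 3). The Euclidean distance between user A and user B is, for example, 5; between user A and user C, 3; and between user A and user D, 2. User B follows public numbers b1 and b2, whose information attenuation coefficients are, for example, 0.8 and 0.6; user C follows public number c1, whose coefficient is, for example, 0.7; user D follows public numbers d1 and d2, whose coefficients are, for example, 0.86 and 0.95. The scores are then: b1, 5 * 0.8 = 4; b2, 5 * 0.6 = 3; c1, 3 * 0.7 = 2.1; d1, 2 * 0.86 = 1.72; d2, 2 * 0.95 = 1.9. Accordingly, when the second predetermined number (K in FIG. 7) is 3, the public numbers recommended by the first group for the user to be calculated are b1, b2, and c1.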
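The worked example can be checked with a short sketch that follows the text literally (score = Euclidean distance * attenuation coefficient, sorted in descending order). The function name `score_group` is hypothetical, and keeping the highest score when several neighbors follow the same public number is an assumption, because the text does not cover that case.

```python
def score_group(neighbors, follows, decay, top_n):
    """Score each public number as distance * attenuation coefficient; keep top_n.

    neighbors: {user: Euclidean distance to the user to be calculated}
    follows:   {user: list of followed public numbers}
    decay:     {public number: latest information attenuation coefficient}
    """
    scores = {}
    for user, distance in neighbors.items():
        for account in follows.get(user, []):
            score = distance * decay[account]
            # Assumption: if several neighbors follow one account, keep the highest score.
            scores[account] = max(score, scores.get(account, float("-inf")))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

neighbors = {"B": 5, "C": 3, "D": 2}
follows = {"B": ["b1", "b2"], "C": ["c1"], "D": ["d1", "d2"]}
decay = {"b1": 0.8, "b2": 0.6, "c1": 0.7, "d1": 0.86, "d2": 0.95}
top = score_group(neighbors, follows, decay, top_n=3)
print([name for name, _ in top])
```

With the numbers from the example, the three highest-scoring public numbers are b1, b2, and c1.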
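Similarly, the above processing is performed for every group, so that each group yields three recommended public numbers. The results of all groups are then merged, and the final recommended public numbers are determined according to the scores of all the recommended public numbers obtained. For example, when the third predetermined number (J in FIG. 7) is 5, five final recommended public numbers are determined for user A in descending order of the scores of the recommended public numbers obtained from all groups.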
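The merge step might look like the sketch below. The second and third group results are invented for illustration, the function name `merge_group_results` is hypothetical, and keeping the highest score for a public number recommended by more than one group is an assumption not stated in the text.

```python
import heapq

def merge_group_results(group_results, j):
    """Merge per-group (account, score) lists and keep the j highest-scoring accounts."""
    best = {}
    for results in group_results:
        for account, score in results:
            # Assumption: an account recommended by several groups keeps its highest score.
            if account not in best or score > best[account]:
                best[account] = score
    return heapq.nlargest(j, best.items(), key=lambda item: item[1])

group_results = [
    [("b1", 4.0), ("b2", 3.0), ("c1", 2.1)],  # first group, as in the example above
    [("x1", 3.5), ("b1", 2.8), ("x2", 1.2)],  # hypothetical second group
    [("y1", 4.4), ("y2", 2.9), ("y3", 0.7)],  # hypothetical third group
]
final = merge_group_results(group_results, j=5)
print(final)
```

Here J = 5 satisfies the stated constraint that the third predetermined number is at most the second predetermined number times the number of groups (3 * 3 = 9).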
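In addition, an embodiment of the present invention also provides a public number recommendation system, comprising: a first analysis unit, configured to group user data read from a database; a second analysis unit, configured to, for a user to be calculated, determine within each group the nearest neighbor users of the user to be calculated, and determine the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number; and a third analysis unit, configured to determine the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated.
The first analysis unit is specifically configured to: determine the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient; and divide a corresponding amount of user data into each group according to the determined number of groups.
The second analysis unit being configured to determine, within each group, the nearest neighbor users of the user to be calculated includes: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and, in ascending order of Euclidean distance, determining a first predetermined number of users as the nearest neighbor users of the user to be calculated in that group. The feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status.
The second analysis unit being configured to determine the public numbers recommended by the group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number includes: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, where the score of a public number equals the product of the Euclidean distance between the feature values of the nearest neighbor user who follows that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and, in descending order of score, determining a second predetermined number of public numbers as the public numbers recommended by that group for the user to be calculated. The information attenuation coefficient per unit time of a public number is calculated from one or more of the following parameters: the number of followers, the number of views, and the number of clicks of the public number per unit time.
The third analysis unit is specifically configured to: in descending order of the scores of the public numbers recommended by each group for the user to be calculated, determine a third predetermined number of public numbers as the final recommended public numbers of all groups for the user to be calculated, where the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
In addition, the above system further includes a push module, configured to push the final recommended public numbers of the user to be calculated to the user to be calculated.
The specific processing flow of the above system is the same as described for the above method and is not repeated here.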
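FIG. 8 is a schematic diagram of the public number recommendation system provided by Embodiment 1 of the present invention. As shown in FIG. 8, in this embodiment the public number recommendation system includes a first analysis unit, a second analysis unit, a third analysis unit, a push module, a storage module, and a user terminal. The first, second, and third analysis units are, for example, integrated in an analysis module. In practical applications, the storage module is, for example, a memory or another element with a data storage function; the functions of the first analysis unit, the second analysis unit, the third analysis unit, and the push module are implemented, for example, by a computer processor reading programs/instructions stored in the memory, or the functions of the above modules may be implemented by firmware, logic circuits, or integrated circuits.
In this embodiment, the analysis module is the main computing module of the public number recommendation system. It is configured to read data according to users and public numbers, group the data, add distributed computing tasks, calculate the K nearest neighbor users of the user to be calculated within each group, determine the recommended public numbers within each group from the K nearest neighbor users weighted by the latest information attenuation coefficient per unit time of the public numbers, and finally merge the results of all groups to obtain the final recommended public numbers. The push module is, for example, a task queue that is loaded when the program starts; it polls for a pending list of public number candidate sets, reads the candidate set data to be pushed, and pushes it to the user terminal. The user terminal is the user's client, which holds the user's friend relationships and followed public numbers, polls for newly recommended public numbers, and displays them on the interface. The storage module is configured to store user and public number data, as well as follower counts, view counts, click records, and so on.
FIG. 9 is a flowchart of the public number recommendation method provided by Embodiment 1 of the present invention. As shown in FIG. 9, this embodiment is described as follows: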
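Step 101A: The analysis module starts the analysis process;
Step 101B: The push module starts the push process;
Step 102: The analysis module initiates an offline calculation task for the user to be calculated (for example, user A), where the user to be calculated is, for example, any user stored in the database;
Step 103: The analysis module reads the full user data of the database (that is, the sample data) from the storage module;
Step 104: The storage module returns the data records to the analysis module;
Step 105: The analysis module calculates the grouping of the full user data; the grouping strategy is the same as described in the above method and is not repeated here;
Step 106: The analysis module decomposes the full user data into multiple parallel processing tasks (for example, task1 ... taskn) according to the grouping result;
Step 107: The analysis module calculates the K nearest neighbors (KNN) of user A within each group, where K is, for example, an integer greater than 0 and not greater than 5; the K nearest neighbors of user A are determined as described in the above method, which is not repeated here;
Step 108: The analysis module obtains the public numbers followed by the K nearest neighbor users and the information attenuation coefficients of those public numbers, and determines the scores of all public numbers followed by the K nearest neighbor users as the product of the nearest neighbor user's Euclidean distance and the latest information attenuation coefficient per unit time of the followed public number; in descending order of score, it determines the public number recommendation result of each group;
Step 109: The analysis module merges the group results, that is, merges the recommended public numbers of all groups;
Step 110: The analysis module sorts the merged recommended public numbers of all groups by score in descending order and takes the top J results as the J final recommended public numbers, where J is, for example, an integer greater than 0 and not greater than 5;
Step 111: The analysis module obtains the recommended public number candidate set of user A;
Step 112: The analysis module adds the obtained recommended public number candidate set of user A to the push list of the push module;
Step 113: The push module polls the push list;
Step 114: The push module reads the push list;
Step 115: The push module pushes the recommended public number candidate set of user A to the terminal corresponding to user A;
Step 116: The user terminal follows, views, or clicks on the public number.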
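Steps 105 to 115 can be tied together in a compact end-to-end sketch. Everything here is illustrative rather than the patented implementation: a thread pool stands in for the distributed computing tasks, an in-memory `queue.Queue` stands in for the push list, and the function names and sample data are hypothetical.

```python
import heapq
import math
import queue
from concurrent.futures import ThreadPoolExecutor

def group_task(target, group, decay, k, top_n):
    """Steps 107-108: KNN inside one group, then score the neighbors' public numbers."""
    nearest = sorted(
        ((math.dist(target, user["features"]), user) for user in group),
        key=lambda pair: pair[0],
    )[:k]
    scores = {}
    for distance, user in nearest:
        for account in user["follows"]:
            score = distance * decay[account]
            scores[account] = max(score, scores.get(account, 0.0))
    return heapq.nlargest(top_n, scores.items(), key=lambda item: item[1])

def analyze(target, users, decay, granularity, k=3, top_n=3, j=5):
    """Steps 105-111: group, run the group tasks in parallel, merge, take the top j."""
    groups = [users[i:i + granularity] for i in range(0, len(users), granularity)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda g: group_task(target, g, decay, k, top_n), groups)
    merged = {}
    for group_result in results:
        for account, score in group_result:
            merged[account] = max(score, merged.get(account, 0.0))
    return [a for a, _ in heapq.nlargest(j, merged.items(), key=lambda item: item[1])]

push_list = queue.Queue()  # stands in for the push module's push list

def push_pending(deliver):
    """Steps 113-115: drain the push list and hand each candidate set to a terminal."""
    while True:
        try:
            user_id, candidates = push_list.get_nowait()
        except queue.Empty:
            break
        deliver(user_id, candidates)

users = [
    {"features": [3.0, 4.0], "follows": ["b1", "b2"]},
    {"features": [0.0, 3.0], "follows": ["c1"]},
    {"features": [2.0, 0.0], "follows": ["d1", "d2"]},
    {"features": [6.0, 8.0], "follows": ["e1"]},
]
decay = {"b1": 0.8, "b2": 0.6, "c1": 0.7, "d1": 0.86, "d2": 0.95, "e1": 0.5}

push_list.put(("A", analyze([0.0, 0.0], users, decay, granularity=2)))  # step 112
delivered = []
push_pending(lambda user_id, candidates: delivered.append((user_id, candidates)))
print(delivered)
```

In a real deployment the group tasks would more likely run as separate processes or on a distributed framework, and the push module would poll on a timer instead of draining the queue once.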
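Embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the storage medium stores execution instructions that are used to execute the above method.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In summary, in the embodiments of the present invention, grouping the user data and scoring the public numbers with the information attenuation coefficient per unit time as a weight improves the computing performance on large-scale data. Moreover, the information attenuation coefficient per unit time changes dynamically, so the public number candidate set is adjusted dynamically as the coefficient is adjusted. The embodiments of the present invention give priority to recommending highly active public numbers, and the system provided by the embodiments learns automatically as the amount of data grows and changes.
The basic principles, main features, and advantages of the present invention have been shown and described above. The present invention is not limited by the above embodiments; the embodiments and the description only illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention.
Industrial Applicability
As described above, the public number recommendation method and system provided by the embodiments of the present invention have the following beneficial effects: processing a large amount of user data in parallel improves the computing performance on large-scale data; meanwhile, the public numbers recommended to the user to be calculated are determined according to the nearest neighbor users and the latest information attenuation coefficient per unit time of each public number, so that highly active public numbers are recommended with priority.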

Claims (16)

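  1. A public number recommendation method, comprising:
    grouping user data read from a database;
    for a user to be calculated, determining, within each group, the nearest neighbor users of the user to be calculated, and determining the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number;
    determining the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated.
  2. The method according to claim 1, wherein grouping the user data read from the database comprises:
    determining the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient;
    dividing a corresponding amount of user data into each group according to the determined number of groups.
  3. The method according to claim 1, wherein determining, within each group, the nearest neighbor users of the user to be calculated comprises: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and, in ascending order of Euclidean distance, determining a first predetermined number of users as the nearest neighbor users of the user to be calculated in that group.
  4. The method according to claim 3, wherein the feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status.
  5. The method according to claim 3, wherein determining the public numbers recommended by the group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number comprises: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, where the score of a public number equals the product of the Euclidean distance between the feature values of the nearest neighbor user who follows that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and, in descending order of score, determining a second predetermined number of public numbers as the public numbers recommended by that group for the user to be calculated.
  6. The method according to claim 5, wherein the information attenuation coefficient per unit time of a public number is calculated from one or more of the following parameters: the number of followers, the number of views, and the number of clicks of the public number per unit time.
  7. The method according to claim 5, wherein determining the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated comprises:
    in descending order of the scores of the public numbers recommended by each group for the user to be calculated, determining a third predetermined number of public numbers as the final recommended public numbers of all groups for the user to be calculated, where the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
  8. The method according to claim 1, further comprising, after determining the final recommended public numbers of the user to be calculated: pushing the final recommended public numbers of the user to be calculated to the user to be calculated.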
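  9. A public number recommendation system, comprising:
    a first analysis unit, configured to group user data read from a database;
    a second analysis unit, configured to, for a user to be calculated, determine within each group the nearest neighbor users of the user to be calculated, and determine the public numbers recommended by that group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number;
    a third analysis unit, configured to determine the final recommended public numbers of the user to be calculated according to the public numbers recommended by all groups for the user to be calculated.
  10. The system according to claim 9, wherein the first analysis unit is specifically configured to: determine the number of groups according to the ratio of the total amount of user data to the decomposition granularity coefficient; and divide a corresponding amount of user data into each group according to the determined number of groups.
  11. The system according to claim 9, wherein the second analysis unit being configured to determine, within each group, the nearest neighbor users of the user to be calculated comprises: extracting the feature values of the user to be calculated and of all users in the group, calculating the Euclidean distance between the feature values of the user to be calculated and those of each user in the group, and, in ascending order of Euclidean distance, determining a first predetermined number of users as the nearest neighbor users of the user to be calculated in that group.
  12. The system according to claim 11, wherein the feature values of a user include feature values corresponding to at least one of the following features: gender, age, city, industry, occupation, income level, education level, and marital status.
  13. The system according to claim 11, wherein the second analysis unit being configured to determine the public numbers recommended by the group for the user to be calculated according to all public numbers followed by the nearest neighbor users and the latest information attenuation coefficient per unit time of each such public number comprises: within each group, calculating a score for every public number followed by the nearest neighbor users of the user to be calculated, where the score of a public number equals the product of the Euclidean distance between the feature values of the nearest neighbor user who follows that public number and those of the user to be calculated and the latest information attenuation coefficient per unit time of that public number; and, in descending order of score, determining a second predetermined number of public numbers as the public numbers recommended by that group for the user to be calculated.
  14. The system according to claim 13, wherein the information attenuation coefficient per unit time of a public number is calculated from one or more of the following parameters: the number of followers, the number of views, and the number of clicks of the public number per unit time.
  15. The system according to claim 13, wherein the third analysis unit is specifically configured to: in descending order of the scores of the public numbers recommended by each group for the user to be calculated, determine a third predetermined number of public numbers as the final recommended public numbers of all groups for the user to be calculated, where the third predetermined number is less than or equal to the product of the second predetermined number and the number of groups.
  16. The system according to claim 9, further comprising: a push module, configured to push the final recommended public numbers of the user to be calculated to the user to be calculated.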
PCT/CN2016/095730 2015-08-18 2016-08-17 一种公众号推荐方法及系统 WO2017028791A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510508226.7A CN106469163A (zh) 2015-08-18 2015-08-18 一种公众号推荐方法及系统
CN201510508226.7 2015-08-18

Publications (1)

Publication Number Publication Date
WO2017028791A1 true WO2017028791A1 (zh) 2017-02-23

Family

ID=58050828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/095730 WO2017028791A1 (zh) 2015-08-18 2016-08-17 一种公众号推荐方法及系统

Country Status (2)

Country Link
CN (1) CN106469163A (zh)
WO (1) WO2017028791A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710857A (zh) * 2018-12-27 2019-05-03 杭州启迪万华科技产业发展有限公司 一种公众号推荐方法和装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062963B (zh) * 2018-06-27 2021-06-04 阿里巴巴(中国)有限公司 自媒体推荐方法、装置及电子设备
CN109614542B (zh) * 2018-12-11 2024-05-14 平安科技(深圳)有限公司 公众号推荐方法、装置、计算机设备及存储介质
CN114996561B (zh) * 2021-03-02 2024-03-29 腾讯科技(深圳)有限公司 一种基于人工智能的信息推荐方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130036121A1 (en) * 2011-08-01 2013-02-07 Nhn Corporation System and method for recommending blog
CN103116589A (zh) * 2011-11-17 2013-05-22 腾讯科技(深圳)有限公司 一种发送推荐信息的方法及装置
CN103166930A (zh) * 2011-12-15 2013-06-19 腾讯科技(深圳)有限公司 推送网络信息的方法和系统
CN103488714A (zh) * 2013-09-11 2014-01-01 杭州东信北邮信息技术有限公司 一种基于社交网络的图书推荐方法和系统
KR20140093795A (ko) * 2013-01-16 2014-07-29 에스케이플래닛 주식회사 컨텐츠 추천 서비스 시스템 및 컨텐츠 추천 서비스 방법
CN104573109A (zh) * 2015-01-30 2015-04-29 深圳市中兴移动通信有限公司 一种基于群组关系的自动推荐方法、终端及系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7739314B2 (en) * 2005-08-15 2010-06-15 Google Inc. Scalable user clustering based on set similarity
CN102780920A (zh) * 2011-07-05 2012-11-14 上海奂讯通信安装工程有限公司 电视节目推荐方法及系统
CN104598583B (zh) * 2015-01-14 2018-01-09 百度在线网络技术(北京)有限公司 查询语句推荐列表的生成方法和装置

Also Published As

Publication number Publication date
CN106469163A (zh) 2017-03-01

Similar Documents

Publication Publication Date Title
US11710054B2 (en) Information recommendation method, apparatus, and server based on user data in an online forum
US10552488B2 (en) Social media user recommendation system and method
Nguyen et al. Real-time event detection for online behavioral analysis of big social data
US9934512B2 (en) Identifying influential users of a social networking service
US9455891B2 (en) Methods, apparatus, and articles of manufacture to determine a network efficacy
WO2022141861A1 (zh) 情感分类方法、装置、电子设备及存储介质
JP6167493B2 (ja) 情報を管理するための方法、コンピュータプログラム、記憶媒体及びシステム
US8682830B2 (en) Information processing apparatus, information processing method, and program
US20180322188A1 (en) Automatic conversation creator for news
CN109033408B (zh) 信息推送方法及装置、计算机可读存储介质、电子设备
WO2017028791A1 (zh) 一种公众号推荐方法及系统
US20170140397A1 (en) Measuring influence propagation within networks
US9407589B2 (en) System and method for following topics in an electronic textual conversation
US20140147048A1 (en) Document quality measurement
CN111279332A (zh) 使用机器学习模型来生成电子通信的请求不可知的交互分值以及利用请求不可知的交互分值
CN111259220B (zh) 一种基于大数据的数据采集方法和系统
CN103117891A (zh) 微博平台上的僵尸用户探测方法
CN110991742A (zh) 一种社交网络信息转发概率预测方法及系统
CN110413842B (zh) 基于舆情态势感知的内容审核方法系统电子设备及介质
WO2011159863A1 (en) A system and method for query temporality analysis
Hu et al. Predicting key events in the popularity evolution of online information
US20180121824A1 (en) Artificial Intelligence for Decision Making Based on Machine Learning of Human Decision Making Process
US11144599B2 (en) Method of and system for clustering documents
Yoon et al. DiTeX: Disease-related topic extraction system through internet-based sources
US20220292127A1 (en) Information management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16836655

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16836655

Country of ref document: EP

Kind code of ref document: A1