WO2021103508A1 - Seed user selection method, apparatus and device, and storage medium - Google Patents

Publication number
WO2021103508A1
WO2021103508A1 PCT/CN2020/097517 CN2020097517W WO2021103508A1 WO 2021103508 A1 WO2021103508 A1 WO 2021103508A1 CN 2020097517 W CN2020097517 W CN 2020097517W WO 2021103508 A1 WO2021103508 A1 WO 2021103508A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
influence
users
seed
parameter
Prior art date
Application number
PCT/CN2020/097517
Other languages
French (fr)
Chinese (zh)
Inventor
陈啟柱
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司 filed Critical 北京三快在线科技有限公司
Publication of WO2021103508A1 publication Critical patent/WO2021103508A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • the embodiments of the present application relate to the field of Internet technology, and in particular to a method, device, equipment, and storage medium for selecting seed users.
  • Operators of online social network platforms usually select some seed users from the users registered on their platform based on those users' behaviors, and provide certain feedback to the selected seed users, enabling the seed users to promote the operator's products to the people around them.
  • the embodiments of the present application provide a method, device, equipment, and storage medium for selecting seed users.
  • the technical solution is as follows:
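  • an embodiment of the present application provides a method for selecting seed users, and the method includes:
  • acquiring user collection data, where the user collection data includes n users, and n is an integer greater than 1;
  • creating m user groups according to the user collection data, where each of the user groups includes two users having an association relationship, and m is a positive integer;
  • for the i-th user group among the m user groups, acquiring characteristic data of the i-th user group, where the i-th user group includes a first user and a second user having an association relationship;
  • determining, according to the characteristic data of the i-th user group, the influence parameter of the first user relative to the second user, where the influence parameter is used to characterize the probability that the first user successfully recommends a product to the second user;
  • constructing an influence matrix, the influence matrix being a matrix of n rows and n columns, where the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;
  • selecting at least one seed user from the n users according to the influence matrix;
  • storing the user information of the seed user.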
  • an embodiment of the present application provides a device for selecting a seed user, and the device includes:
  • the collection data acquisition module is configured to acquire user collection data, where the user collection data includes n users, and the n is an integer greater than 1;
  • a user group creation module configured to create m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;
  • the characteristic data acquisition module is configured to acquire, for the i-th user group among the m user groups, characteristic data of the i-th user group, where the i-th user group includes a first user and a second user having an association relationship;
  • the influence parameter determination module is configured to determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the influence parameter is used to characterize the probability that the first user successfully recommends a product to the second user;
  • the matrix construction module is configured to construct an influence matrix, which is a matrix with n rows and n columns, where the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;
  • a seed user selection module configured to select at least one seed user from the n users according to the influence matrix;
  • the information storage module is used to store the user information of the seed user.
  • an embodiment of the present application provides a computer device, the computer device includes a processor and a memory, and a computer program is stored in the memory. The computer program is loaded and executed by the processor to implement the above-mentioned method for selecting seed users.
  • an embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above method for selecting a seed user is implemented.
  • a computer program product is provided. When the computer program product is executed by a processor, it is used to implement the above-mentioned method for selecting seed users.
  • The influence parameter is determined based on the characteristic data of user groups having an association relationship, so the association relationship between users is taken into account and the social network relationship is mined in depth. This makes the prediction of the influence parameter more comprehensive and accurate, and solves the technical problem in the related art that only the characteristic data of a single user is considered, which is too narrow a basis to determine seed users accurately.
  • FIG. 1 is a flowchart of a method for selecting seed users according to an embodiment of the present application.
  • FIG. 2 is a diagram of a social network relationship provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of selecting seed users according to an influence matrix provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of selecting seed users according to an influence matrix according to another embodiment of the present application.
  • FIG. 5 is a flowchart of a method for selecting seed users according to another embodiment of the present application.
  • FIG. 6 is a block diagram of a device for selecting seed users according to an embodiment of the present application.
  • FIG. 7 is a block diagram of a device for selecting seed users according to another embodiment of the present application.
  • FIG. 8 is a block diagram of a computer device provided by an embodiment of the present application.
  • the execution subject of each step may be a computer device, such as a server with computing and storage capabilities, or terminals such as mobile phones, tablets, multimedia playback devices, wearable devices, or other computer devices.
  • When the computer device is a server, it may be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
  • For ease of introduction and description, the steps below are described as being executed by the server, but this does not constitute a limitation.
  • FIG. 1 shows a flowchart of a method for selecting a seed user according to an embodiment of the present application.
  • the method can include the following steps (110-170):
  • Step 110 Obtain user set data, where the user set data includes n users, where n is an integer greater than 1.
  • the server can obtain user collection data from its own stored data, or it can obtain user collection data from other computer devices with storage functions, such as from other servers, terminals, etc., which are not limited in the embodiments of the present application.
  • the user set data includes n users, and data information corresponding to the n users, such as user IDs, association relationships, and so on.
  • Step 120 Create m user groups according to the user set data, each user group includes two users having an association relationship, and m is a positive integer.
  • the association relationship refers to the relationship between two users who have the ability to send and receive information.
  • the association relationship is expressed as a social relationship.
  • The association relationship includes, but is not limited to, any of the following: a friend relationship, a following/followed relationship, an ordering relationship, etc.
  • The association relationship takes different forms on different online social network platforms. For example, on an online social network platform such as an instant messaging application, the association relationship is expressed as a friend relationship; on an entertainment online social network platform, it is expressed as a following/followed relationship.
  • On a non-online social network platform, an association relationship between users can be constructed based on user information, where user information refers to information generated when a user uses that platform. For example, on a non-online social network platform such as a shopping platform, the association relationship between users can be constructed based on user information such as each user's address, purchase records, order data, link sharing, network interaction, and device sharing.
  • The user group is expressed as an ordered pair; for example, the user group may be expressed as (first user, second user).
  • For two users having an association relationship, two user groups can be created, and the relationship characteristics of the two user groups are different. That is, when a user group is expressed as an ordered pair, a different position of the two elements in the pair represents different relationship characteristics. For example, the user group (first user, second user) represents the relationship characteristics of the first user relative to the second user, and the user group (second user, first user) represents the relationship characteristics of the second user relative to the first user.
  • After the server determines the user set containing n users and the association relationships between the users, it can create m user groups, each of which includes two users with an association relationship.
  • The value of m is determined by the association relationships between users in the user set.
  • The above step 120 includes: constructing a relationship graph corresponding to the user set data, where the relationship graph includes n nodes, the n nodes are in one-to-one correspondence with the n users, and there is an edge between the nodes corresponding to two users having an association relationship; and extracting m user groups from the relationship graph. In this way, the server can quickly construct m user groups from the user set.
  • the relationship diagram also known as the social network relationship diagram, is used to characterize the relationship between users. After the server determines a user set containing n users, it can construct a relationship graph based on the user set. The number of nodes included in the relationship graph is the same as the number of users, and there is a one-to-one correspondence between the nodes of the relationship graph and the users.
  • For example, if the number of users in the social network is 6, the number of nodes 21 in the relationship graph is also 6, and the 6 nodes in the relationship graph are in one-to-one correspondence with the 6 users. That is, the number in a node 21 corresponds to the user ID: node 1 represents user 1, node 2 represents user 2, node 3 represents user 3...
  • the user ID is user
  • After obtaining the user set, the server can randomly sort the users in the user set so as to obtain the user identification of each user.
  • Alternatively, after obtaining the user set, the server may sort the users in the user set according to certain parameters, such as the number of strokes in the user name or the alphabetical order of initials, which is not limited in the embodiments of the present application.
  • the nodes corresponding to two users with an association relationship have an edge 22 between them. For example, there is an edge 22 between the node 1 and the node 2, which means that there is an association relationship between the user 1 and the user 2.
  • The server can extract m user groups from the relationship graph. As shown in FIG. 2, there are five edges 22 in the relationship graph. Since two user groups can be created for each pair of users having an association relationship in this embodiment of the application, the server can extract 10 user groups from the relationship graph shown in FIG. 2, namely (User 1, User 2), (User 2, User 1), (User 1, User 4), (User 4, User 1), (User 1, User 6), (User 6, User 1), (User 6, User 5), (User 5, User 6), (User 4, User 3), and (User 3, User 4).
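  • The extraction of user groups from the relationship graph can be sketched as follows. This is only an illustrative sketch, not the applicant's implementation: the function name `extract_user_groups` and the edge-list representation are assumptions, and the edges are those described for FIG. 2.

```python
from typing import List, Tuple


def extract_user_groups(edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """For every edge between two associated users, emit both ordered pairs."""
    groups: List[Tuple[int, int]] = []
    for a, b in edges:
        groups.append((a, b))  # relationship of a relative to b
        groups.append((b, a))  # relationship of b relative to a
    return groups


# The five edges of the relationship graph described for FIG. 2.
edges = [(1, 2), (1, 4), (1, 6), (6, 5), (4, 3)]
user_groups = extract_user_groups(edges)
print(len(user_groups))  # 10
```

  • With five edges, m is 10, which matches the ten user groups listed above.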
  • Step 130 For the i-th user group among the m user groups, obtain characteristic data of the i-th user group, where the i-th user group includes the first user and the second user having an association relationship.
  • the characteristic data represents the characteristics of the users in the user group and the characteristics of the relationship between the users.
  • The characteristic data of the i-th user group includes the characteristic data of the first user, the characteristic data of the second user, and the relationship characteristic data between the first user and the second user.
  • The characteristic data of the two user groups formed by two associated users are not the same. For example, if there is an association relationship between user 1 and user 2, the characteristic data of (User 1, User 2) includes the characteristic data of User 1, the characteristic data of User 2, and the relationship characteristic data of User 1 relative to User 2, while the characteristic data of (User 2, User 1) includes the characteristic data of User 1, the characteristic data of User 2, and the relationship characteristic data of User 2 relative to User 1.
  • The relationship feature data of user 1 relative to user 2 is different from the relationship feature data of user 2 relative to user 1. For example, if user 1 follows user 2 but user 2 does not follow user 1, then the relationship feature data of user 1 relative to user 2 is different from the relationship feature data of user 2 relative to user 1.
  • the relationship feature data refers to the feature data of the relative relationship between users.
  • The relationship feature data may include recommendation status, attention status, message status, etc., which is not limited in the embodiments of the present application. For example, Table 1 shows the relationship feature data among the feature data of the 10 user groups constructed according to the relationship graph in FIG. 2.
  • The recommendation status indicates whether a product has been successfully recommended in the past. If the recommendation status of (user 1, user 2) is 1, it means that user 1 has successfully recommended a product to user 2 in the past; if the recommendation status of (user 2, user 1) is 0, it means that user 2 has not successfully recommended a product to user 1 in the past.
  • The attention status indicates whether the first user follows the second user. As shown in Table 1, if the attention status of (user 1, user 4) is 0, it means that user 1 does not follow user 4; if the attention status of (user 4, user 1) is 1, it means that user 4 follows user 1.
  • The message status indicates the number of messages sent in the past. As shown in Table 1, if the message status of (user 1, user 2) is 10, it means that user 1 has sent 10 messages to user 2 in the past; if the message status of (user 2, user 1) is 20, it means that user 2 has sent 20 messages to user 1 in the past.
  • the relationship feature data of other user groups refer to the above explanation, which will not be repeated here.
  • the user’s respective characteristic data refers to the data generated by the user when using the application.
  • The user's respective characteristic data may include user identification, user age, user gender, consumption level, and activity records, which is not limited in the embodiments of the present application. For example, Table 2 shows the respective characteristic data of the users included in the 10 user groups constructed according to the relationship graph of FIG. 2.
  • The consumption level is used to indicate the user's spending capacity, and can be expressed by the user's average consumption amount. The average consumption amount can be a daily average, a monthly average, or the daily average during the period of participating in the activity, which is not limited in the embodiments of this application.
  • The activity record indicates whether the user has participated in a product recommendation activity. As shown in Table 2, the activity record of user 1 is 1, which means that user 1 has participated in a product recommendation activity; the activity record of user 2 is 0, which means that user 2 has not participated in a product recommendation activity. For the feature data of other users, refer to the above explanation, which will not be repeated here.
  • The embodiments of this application only take as examples relationship feature data that includes recommendation status, attention status, and message status, and per-user feature data that includes user identification, user age, user gender, consumption level, and activity records. After understanding the technical solutions of the embodiments of the present application, those skilled in the art will readily conceive that the relationship feature data and the per-user feature data may include other aspects, all of which should fall within the protection scope of the present application.
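  • One possible way to assemble the characteristic data of a user group into a single feature vector is sketched below. The class and function names are hypothetical. Only the recommendation status, message counts, and activity records of users 1 and 2 follow the values quoted above; the ages, genders, consumption levels, and the attention status between users 1 and 2 are made-up placeholders, because Tables 1 and 2 are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserFeatures:
    user_id: int
    age: int                  # placeholder value in the demo below
    gender: int               # placeholder encoding (0 or 1)
    consumption_level: float  # placeholder average consumption amount
    activity_record: int      # 1 if the user joined a product recommendation activity


@dataclass
class RelationFeatures:
    recommendation_status: int  # 1 if first user successfully recommended to second
    attention_status: int       # 1 if first user follows second
    message_count: int          # messages sent by first user to second user


def group_features(first: UserFeatures, second: UserFeatures,
                   relation: RelationFeatures) -> List[float]:
    """Concatenate first-user, second-user, and relationship features."""
    def user_part(u: UserFeatures) -> List[float]:
        return [float(u.age), float(u.gender), u.consumption_level,
                float(u.activity_record)]
    rel_part = [float(relation.recommendation_status),
                float(relation.attention_status),
                float(relation.message_count)]
    return user_part(first) + user_part(second) + rel_part


user1 = UserFeatures(1, 30, 0, 50.0, 1)  # activity record 1 as in the text
user2 = UserFeatures(2, 25, 1, 20.0, 0)  # activity record 0 as in the text
rel_12 = RelationFeatures(1, 1, 10)      # attention status is a placeholder
rel_21 = RelationFeatures(0, 0, 20)      # attention status is a placeholder

vec_12 = group_features(user1, user2, rel_12)
vec_21 = group_features(user2, user1, rel_21)
```

  • The two vectors differ even though they describe the same two users, which reflects the point that (user 1, user 2) and (user 2, user 1) carry different characteristic data.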
  • Step 140 Determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the influence parameter is used to characterize the probability of the first user successfully recommending the product to the second user.
  • the influence parameter is used to indicate the probability of successfully recommending the product. For example, when the influence parameter of the first user relative to the second user is 0.8, it means that the probability of the first user successfully recommending the product to the second user is 0.8.
  • the influence parameter can either be expressed in the form of numerical value or in the form of percentage, which is not limited in the embodiment of the present application.
  • the value range of the influence parameter is [0,1].
  • The above step 140 includes: invoking the influence calculation model, and calculating the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the characteristic data of the i-th user group includes: characteristic data of the first user, characteristic data of the second user, and relationship characteristic data between the first user and the second user.
  • The influence calculation model is a model trained on historical data.
  • The influence calculation model can be a binary classification model, such as an LR (Logistic Regression) model, a neural network model, or a GBDT (Gradient Boosting Decision Tree) model; the influence calculation model may also be a regression model, which is not limited in the embodiments of the present application.
  • The training process of the influence calculation model is as follows: construct at least one training sample, where each training sample includes a sample user group; obtain the feature data and labels of the training samples, where the label is used to represent whether the first sample user in the training sample has successfully recommended a product to the second sample user; and use the training samples to train the influence calculation model to obtain the trained influence calculation model.
  • Training samples refer to the samples used to train the influence calculation model. Each training sample includes a sample user group.
  • The embodiments of this application do not limit the specific number of training samples. The specific number can be determined by jointly considering two factors: the processing cost of the server and the accuracy of the influence calculation model.
  • the server may determine a sample user set based on the historical data, then construct a sample relationship tree based on the sample user set, and then extract training samples from the sample relationship tree.
  • The label refers to the recommendation result of the first sample user to the second sample user in the sample user group corresponding to the training sample.
  • The value of the label is 0 or 1. A value of 1 indicates that a product has been successfully recommended, and a value of 0 indicates that a product has not been successfully recommended.
  • For example, if the label corresponding to the sample user group (first sample user, second sample user) is 1, it means that the first sample user has successfully recommended a product to the second sample user; if the label corresponding to the sample user group (first sample user, second sample user) is 0, it means that the first sample user has not successfully recommended a product to the second sample user.
  • The feature data of a training sample includes the feature data of each sample user and the relationship feature data between the sample users. These correspond to the per-user characteristic data and the relationship characteristic data explained in step 130 above, so the explanation is not repeated here.
  • The relationship feature data between sample users may also include a historical influence parameter, which indicates the past influence of one sample user on the other in the sample user group. For example, if the historical influence parameter corresponding to the sample user group (first sample user, second sample user) is 0.2, it means that the historical influence parameter of the first sample user relative to the second sample user is 0.2.
  • After the server obtains the training samples and their corresponding feature data and labels, it selects a suitable model as the influence calculation model, such as a binary classification model or a regression model, and then uses the training samples to train the influence calculation model, obtaining the trained influence calculation model.
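  • As an illustration of the binary classification option, the sketch below trains a small logistic regression model with plain stochastic gradient descent and uses its output probability as the influence parameter. It is a minimal sketch under stated assumptions, not the model used by the applicant: the function names, the learning rate, the number of epochs, the three-feature layout (recommendation status, attention status, scaled message count), and the four toy samples are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              lr: float = 0.1,
                              epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_influence(weights: Sequence[float], bias: float,
                      x: Sequence[float]) -> float:
    """Predicted probability that the first user successfully recommends."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy samples: [recommendation status, attention status, messages / 100]
samples = [[1, 1, 0.20], [1, 0, 0.10], [0, 1, 0.05], [0, 0, 0.00]]
labels = [1, 1, 0, 0]  # 1 = product successfully recommended

weights, bias = train_logistic_regression(samples, labels)
p_high = predict_influence(weights, bias, [1, 1, 0.20])
p_low = predict_influence(weights, bias, [0, 0, 0.00])
```

  • Because the sigmoid output always lies between 0 and 1, it fits the stated value range [0, 1] of the influence parameter.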
  • Step 150 Construct an influence matrix.
  • the influence matrix is a matrix with n rows and n columns.
  • the element in the u-th row and v-th column in the influence matrix represents the influence parameter of the user u relative to the user v.
  • If there is an association relationship between user u and user v, the influence parameter of user u relative to user v and the influence parameter of user v relative to user u can be calculated through the influence calculation model. If there is no association relationship between user u and user v, the influence parameter of user u relative to user v and the influence parameter of user v relative to user u do not need to be calculated and can be directly recorded as 0. The influence parameter of each user relative to itself is also recorded as 0.
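  • The construction of the influence matrix can be sketched as follows. The names `build_influence_matrix` and `placeholder_model` are hypothetical, users are assumed to be numbered from 1 to n, and the constant 0.5 merely stands in for the output of a trained influence calculation model.

```python
from typing import Callable, List, Tuple


def build_influence_matrix(n: int,
                           user_groups: List[Tuple[int, int]],
                           influence_of: Callable[[int, int], float]
                           ) -> List[List[float]]:
    """Fill an n x n matrix; unassociated pairs and the diagonal stay 0."""
    matrix = [[0.0] * n for _ in range(n)]
    for u, v in user_groups:
        # Row u, column v holds the influence of user u relative to user v.
        matrix[u - 1][v - 1] = influence_of(u, v)
    return matrix


def placeholder_model(u: int, v: int) -> float:
    """Stand-in for the trained influence calculation model (assumption)."""
    return 0.5


# The ten user groups described for FIG. 2.
user_groups = [(1, 2), (2, 1), (1, 4), (4, 1), (1, 6),
               (6, 1), (6, 5), (5, 6), (4, 3), (3, 4)]
influence_matrix = build_influence_matrix(6, user_groups, placeholder_model)
```

  • Only the m entries that correspond to user groups require a model call; every other entry keeps its initial value of 0.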
  • Step 160 According to the influence matrix, at least one seed user is selected from n users.
  • From the influence matrix, the influence parameter of each of the n users relative to each other user can be obtained.
  • The influence parameter of each of the n users relative to each other user can refer to the influence parameter of each user relative to the other users that have an association relationship with that user, or it can refer to the influence parameter of each user relative to each of the n users, which is not limited in the embodiments of the present application.
  • When it refers to the influence parameter of each user relative to each of the n users, the influence parameters of a user relative to the other users that have an association relationship with that user are obtained through the influence calculation model, the influence parameters of the user relative to other users that are not associated with the user are 0, and the influence parameter of the user relative to itself is also 0.
  • After the server determines the influence matrix, it can select at least one seed user from the n users according to a certain selection method.
  • Step 170 Store the user information of the seed user.
  • After the server determines the seed user, it can store the user information of the seed user in its own memory, or in the memory of other computer equipment, such as other servers or terminals, which is not limited in the embodiments of the present application.
  • The technical solutions provided by the embodiments of the present application determine the influence parameters between users according to the characteristic data of each user group, construct the influence matrix according to the influence parameters between users, and select seed users according to the influence matrix, which provides an additional way of selecting seed users.
  • The influence parameter is determined based on the characteristic data of user groups having an association relationship, so the association relationship between users is taken into account and the social network relationship is mined in depth. This makes the prediction of the influence parameter more comprehensive and accurate, and solves the technical problem in the related art that only the characteristic data of a single user is considered, which is too narrow a basis to determine seed users accurately.
  • The influence parameter between users is determined from the characteristic data by the trained influence calculation model, so that the server can calculate the influence parameter more simply.
  • The influence calculation model is obtained by training on historical feature data, so that the server can predict the influence parameters more realistically and accurately through the model, which improves the accuracy of seed user selection.
  • the above selection of at least one seed user from n users according to the influence matrix includes the following steps (1041-1047):
  • Step 1041 Calculate the comprehensive influence parameter of each of the n users according to the influence matrix, where the comprehensive influence parameter of the user u is used to represent the comprehensive probability of the user u successfully recommending the product to each other user.
  • the comprehensive influence parameter refers to the comprehensive probability of the user successfully recommending the product to each user.
  • The comprehensive influence parameter is obtained by accumulating the user's influence parameters relative to the other users.
  • The calculation formula of the comprehensive influence parameter is as follows:
  • $$W_u = \sum_{j=1}^{n} W_{uj}$$
  • where $W_u$ represents the comprehensive influence parameter of user u, $W_{uj}$ represents the influence parameter of user u relative to user j, u is a positive integer less than or equal to n, and j is a positive integer less than or equal to n.
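  • Since the comprehensive influence parameter is an accumulation over one row of the influence matrix, it can be sketched as a row sum. The function name and the 3 x 3 example matrix are hypothetical and chosen only for illustration.

```python
from typing import List


def comprehensive_influence(matrix: List[List[float]]) -> List[float]:
    """Sum each row: entry u is the accumulated influence of user u."""
    return [sum(row) for row in matrix]


# Hypothetical 3-user influence matrix with a zero diagonal.
example_matrix = [
    [0.0, 0.5, 0.25],
    [0.25, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
totals = comprehensive_influence(example_matrix)
print(totals)  # [0.75, 0.25, 0.5]
```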
  • Step 1042 build a seed user set.
  • the seed user set is initially empty.
  • Step 1043 From the non-seed users, select the user s whose comprehensive influence parameter meets the condition and add it to the seed user set, where the non-seed users refer to the users among the n users who have not been added to the seed user set.
  • The comprehensive influence parameter meeting the condition may mean that the comprehensive influence parameter is the largest, or that the comprehensive influence parameter reaches a preset threshold, which is not limited in the embodiments of the present application. It should be noted that the following embodiments are described only for the case where the condition is that the comprehensive influence parameter is the largest. After understanding the technical solution of this application, those skilled in the art will readily conceive of other technical solutions, such as taking the comprehensive influence parameter reaching a preset threshold as the condition, all of which should fall within the protection scope of this application.
  • Initially, the number of non-seed users is the number of users in the user set, that is, the number of non-seed users is n.
  • After the server determines the non-seed users and their corresponding comprehensive influence parameters, it selects the user with the largest comprehensive influence parameter from the non-seed users and adds it to the seed user set.
  • The formula for selecting the user with the largest comprehensive influence parameter from the non-seed users is as follows:
  • $$s = \arg\max_{u \in U \setminus S} W_u$$
  • where the set U represents the user set, the set S represents the seed user set, the user s represents the user with the largest comprehensive influence parameter selected from the non-seed users, and s is a positive integer less than or equal to n.
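  • The selection of the user with the largest comprehensive influence parameter among the non-seed users can be sketched as follows. The function name is hypothetical, and users are indexed from 0 here purely for convenience, whereas the text numbers them from 1.

```python
from typing import List, Set


def select_next_seed(comprehensive: List[float], seed_set: Set[int]) -> int:
    """Return the non-seed user index with the largest comprehensive influence."""
    candidates = [u for u in range(len(comprehensive)) if u not in seed_set]
    return max(candidates, key=lambda u: comprehensive[u])


comprehensive = [0.75, 0.25, 0.5]                 # hypothetical values
first = select_next_seed(comprehensive, set())    # index 0 has the largest value
second = select_next_seed(comprehensive, {first}) # index 0 is now excluded
print(first, second)  # 0 2
```

  • If several candidates share the largest value, `max` returns the first of them; the text does not specify a tie-breaking rule, so this is an arbitrary choice.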
  • Step 1044 Subtract the value of the s-th row element from the value of each row element in the influence matrix to obtain an updated influence matrix, where the s-th row element includes the influence parameter of the user s relative to each other user.
  • the influence parameter of the user s relative to each user selected according to step 1043 corresponds to the value of the s-th row element in the influence matrix. After subtracting the s-th row element from the value of each row element in the influence matrix, the updated influence matrix can be obtained.
  • The update formula of the influence matrix is as follows:
  • $$W_{nj} = W_{nj} - W_{sj}$$
  • where $W_{nj}$ represents the element in the n-th row and j-th column of the influence matrix, and $W_{sj}$ represents the element in the s-th row and j-th column of the influence matrix.
  • The server takes the difference between $W_{nj}$ and $W_{sj}$ as the updated $W_{nj}$.
  • the value range of the aforementioned influence parameter is [0, 1]; subtracting the values of the s-th row elements from the values of the elements in each row of the influence matrix to obtain the updated influence matrix includes: subtracting the values of the s-th row elements from the values of the n rows of elements of the influence matrix to obtain the calculated values of the n rows of elements; and, for each target element among the calculated n rows of elements whose value is less than zero, setting the value of the target element to 0 to obtain the updated influence matrix.
  • After the server subtracts the values of the s-th row elements from the values of the n rows of elements of the influence matrix, it compares each calculated value with 0. If a value is less than 0, the value is set to 0, which ensures that all values of the n rows of elements in the updated influence matrix are greater than or equal to 0.
  • in this case, the update formula of the influence matrix is as follows: $W_{nj} \leftarrow \max(W_{nj} - W_{sj},\ 0)$
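  • The clipped update described above (subtract the s-th row from every row, then set negative results to 0) can be sketched as follows. This is a minimal illustration that assumes the influence matrix is held as a list of rows and that users are indexed from 0; the function name `update_influence_matrix` and the sample values are hypothetical and do not come from this application.

```python
from typing import List

Matrix = List[List[float]]


def update_influence_matrix(W: Matrix, s: int) -> Matrix:
    """Return a new matrix in which row s has been subtracted from every row
    and any negative result has been replaced by 0 (s is a 0-based index)."""
    row_s = list(W[s])  # copy, so the subtraction always uses the original row s
    return [[max(w - ws, 0.0) for w, ws in zip(row, row_s)] for row in W]


# Hypothetical 3-user influence matrix; W[u][v] is the influence of u on v.
W = [
    [0.0, 0.8, 0.1],
    [0.5, 0.0, 0.6],
    [0.2, 0.9, 0.0],
]
updated = update_influence_matrix(W, s=0)
for row in updated:
    print(row)
```

After the update, the row of the selected user is all zeros, and every other row keeps only the part of its influence that the selected user does not already cover.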
  • Step 1045: Determine whether the stop condition for seed user selection is satisfied.
  • Step 1046: If the stop condition for seed user selection is not satisfied, execution starts again, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix.
  • the server repeats steps 1042 to 1044 based on the updated influence matrix.
  • the stop condition may be a condition preset by the server.
  • the stop condition may be that the number of elements in the seed user set reaches a preset threshold, such as 10, or that the number of loop executions reaches a preset number, such as 5, which is not limited in the embodiments of the present application.
  • Step 1047: If the stop condition for seed user selection is satisfied, the users in the seed user set are determined as the seed users.
  • the technical solution provided by the embodiments of the present application constructs an influence matrix and a seed user set, and sets a stop condition for seed user selection. While the stop condition is not satisfied, the user with the largest comprehensive influence parameter is repeatedly selected from the user set, based on the updated influence matrix, and added to the seed user set. This avoids recommending excessively to a single user and wasting seed user resources, so that seed users are selected reasonably.
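  • The loop formed by the selection, update, and stop-condition steps can be sketched as follows. The sketch assumes that the comprehensive influence parameter is the row sum of the influence matrix, that the stop condition is a preset seed count, and that users are indexed from 0; the function name `select_seed_users_greedy`, the parameter `max_seeds`, and the sample matrix are hypothetical.

```python
from typing import List


def select_seed_users_greedy(W: List[List[float]], max_seeds: int) -> List[int]:
    """Greedy selection: pick the non-seed user with the largest row sum,
    subtract that user's row from every row (clipping at 0), and repeat."""
    n = len(W)
    W = [list(row) for row in W]  # work on a copy; the caller's matrix is untouched
    seeds: List[int] = []
    # Stop condition (assumed): the seed set reaches max_seeds, or no users remain.
    while len(seeds) < max_seeds and len(seeds) < n:
        non_seeds = [u for u in range(n) if u not in seeds]
        # Comprehensive influence parameter, assumed here to be the row sum.
        s = max(non_seeds, key=lambda u: sum(W[u]))
        seeds.append(s)
        row_s = list(W[s])
        W = [[max(w - ws, 0.0) for w, ws in zip(row, row_s)] for row in W]
    return seeds


# Hypothetical 4-user matrix: users 0 and 1 both mainly influence user 2.
W = [
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.0, 0.9, 0.3],
    [0.0, 0.0, 0.0, 0.6],
    [0.3, 0.0, 0.0, 0.0],
]
print(select_seed_users_greedy(W, max_seeds=2))  # -> [0, 2]
```

In this toy matrix, ranking by the initial row sums alone would pick users 0 and 1, whose influence overlaps on user 2. Once user 0 is selected and the matrix is updated, user 1 has little remaining influence, so user 2 is chosen instead.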
  • the above-mentioned selecting at least one seed user from n users according to the influence matrix includes the following steps (104A-104B):
  • Step 104A: According to the influence matrix, calculate the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users, where the comprehensive influence parameter of user u characterizes the comprehensive probability that user u successfully recommends the product to each other user, and the comprehensive influenced parameter of user u characterizes the comprehensive probability that each other user successfully recommends the product to user u.
  • the above step 104A includes: for the user u among the n users, obtaining the influence parameters of the user u relative to each other user; and summing the influence parameters of the user u relative to each other user to obtain the comprehensive influence parameter of the user u. For example, the calculation formula of the comprehensive influence parameter is as follows:
  • Wun represents the comprehensive influence parameter of user u
  • Wuj represents the influence parameter of user u relative to user j
  • u is a positive integer less than or equal to n
  • j is a positive integer less than or equal to n.
  • the above step 104A includes: for the user u among the n users, obtaining the influence parameters of each other user relative to the user u; and determining the maximum or average value of the influence parameters of each other user relative to the user u as the comprehensive influenced parameter of the user u.
  • the calculation formula for the comprehensive influenced parameter is as follows:
  • Wnu represents the comprehensive influenced parameter of user u
  • Wiu represents the influence parameter of user i relative to user u
  • u is a positive integer less than or equal to n
  • i is a positive integer less than or equal to n.
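  • The two aggregate quantities defined above can be sketched as follows: the comprehensive influence parameter as the sum over a user's row, and the comprehensive influenced parameter as the maximum or average over that user's column, in both cases over the other users only. The function names, the `mode` argument, and the sample matrix are hypothetical, and users are indexed from 0.

```python
from typing import List

Matrix = List[List[float]]


def comprehensive_influence(W: Matrix, u: int) -> float:
    """Sum of the influence parameters of user u relative to each other user."""
    return sum(W[u][j] for j in range(len(W)) if j != u)


def comprehensive_influenced(W: Matrix, u: int, mode: str = "max") -> float:
    """Maximum (mode="max") or average (mode="mean") of the influence
    parameters of each other user relative to user u."""
    incoming = [W[i][u] for i in range(len(W)) if i != u]
    if mode == "max":
        return max(incoming)
    if mode == "mean":
        return sum(incoming) / len(incoming)
    raise ValueError("mode must be 'max' or 'mean'")


# Hypothetical 3-user influence matrix; W[p][q] is the influence of p on q.
W = [
    [0.0, 0.8, 0.1],
    [0.5, 0.0, 0.6],
    [0.2, 0.9, 0.0],
]
for u in range(len(W)):
    print(u, comprehensive_influence(W, u),
          comprehensive_influenced(W, u, "max"),
          comprehensive_influenced(W, u, "mean"))
```

Because n is greater than 1, every user has at least one other user, so the list of incoming influence parameters is never empty.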
  • Step 104B: Select at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.
  • Wpq is the element in the p-th row and q-th column of the influence matrix W and represents the influence parameter of user p relative to user q, where p and q are positive integers less than or equal to n.
  • Step 104B includes the following steps:
  • xR 1
  • N 1
  • the element in the column vector r represents a reasonable influence parameter of the user.
  • the technical solution provided by the embodiments of the present application constructs an influence matrix, calculates the comprehensive influence parameter and the comprehensive influenced parameter of each user according to the influence matrix, and then selects seed users from the user set according to these two parameters. The influence of the seed users is thereby considered comprehensively, seed users are selected reasonably, and the selected seed users are kept from over-marketing or under-marketing to any single user.
  • FIG. 5 shows a flowchart of a method for selecting a seed user according to another embodiment of the present application.
  • the method can include the following steps (501-509):
  • Step 501: Construct training samples;
  • Step 502: Obtain the training samples and their corresponding feature data and labels;
  • Step 503: Train a suitable model to obtain an influence calculation model;
  • Step 504: Construct user groups;
  • Step 505: Obtain the feature data corresponding to the user groups;
  • Step 506: Input the feature data into the influence calculation model;
  • Step 507: Calculate the influence parameters of each user;
  • Step 508: Select seed users according to the greedy algorithm, which is the selection method described in steps 1041 to 1046;
  • Step 509: Select seed users according to the optimization algorithm, which is the selection method described in step 104A to step 104B.
  • FIG. 6 shows a block diagram of an apparatus for selecting a seed user according to an embodiment of the present application.
  • the device has the function of realizing the above method example, and the function can be realized by hardware, or by hardware executing corresponding software.
  • the device can be the computer equipment introduced above, or can be set in the computer equipment.
  • the device 700 may include: a collection data acquisition module 710, a user group creation module 720, a characteristic data acquisition module 730, an influence parameter determination module 740, a matrix construction module 750, a seed user selection module 760, and an information storage module 770.
  • the collection data obtaining module 710 is configured to obtain user collection data, where the user collection data includes n users, and the n is an integer greater than 1;
  • the user group creation module 720 is configured to create m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;
  • the characteristic data obtaining module 730 is configured to obtain characteristic data of the i-th user group for the i-th user group in the m user groups, wherein the i-th user group includes a first user and a second user having an association relationship;
  • the influence parameter determination module 740 is configured to determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, and the influence parameter is used to characterize the probability that the first user successfully recommends a product to the second user;
  • the matrix construction module 750 is configured to construct an influence matrix, which is a matrix of n rows and n columns, wherein the element in the u-th row and v-th column in the influence matrix represents the influence parameter of user u relative to user v;
  • the seed user selection module 760 is configured to select at least one seed user from the n users according to the influence matrix
  • the information storage module 770 is used to store the user information of the seed user.
  • the seed user selection module 760 includes: a comprehensive influence parameter calculation sub-module 761, configured to calculate the comprehensive influence parameter of each of the n users according to the influence matrix, wherein the comprehensive influence parameter of the user u is used to characterize the comprehensive probability that the user u successfully recommends the product to each other user; a seed user set construction sub-module 762, configured to construct the seed user set, the seed user set being initially empty; a user selection sub-module 763, configured to select, from the non-seed users, a user who meets the condition on the comprehensive influence parameter and add that user to the seed user set, where the non-seed users are the users among the n users who have not been added to the seed user set; a matrix update sub-module 764, configured to subtract the values of the s-th row elements from the values of the elements in each row of the influence matrix to obtain an updated influence matrix, wherein the s-th row elements include the influence parameters of the user s relative to each other user; the loop sub-module 765,
  • the value range of the influence parameter is [0, 1]; the matrix update sub-module 764 is configured to: subtract the values of the s-th row elements from the values of the n rows of elements of the influence matrix, respectively, to obtain the calculated values of the n rows of elements; and, for each target element among the calculated n rows of elements whose value is less than zero, set the value of the target element to 0 to obtain the updated influence matrix.
  • the seed user selection module 760 includes: a comprehensive parameter calculation sub-module 767, configured to calculate the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users according to the influence matrix, where the comprehensive influence parameter of user u is used to characterize the comprehensive probability that the user u successfully recommends the product to each other user, and the comprehensive influenced parameter of user u is used to characterize the comprehensive probability that each other user successfully recommends the product to the user u; and a seed user selection sub-module 768, configured to select at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.
  • the comprehensive parameter calculation sub-module 767 is configured to: for the user u among the n users, obtain the influence parameters of the user u relative to each other user; and sum the influence parameters of the user u relative to each other user to obtain the comprehensive influence parameter of the user u.
  • the comprehensive parameter calculation sub-module 767 is configured to: for the user u among the n users, obtain the influence parameters of each other user relative to the user u; and determine the maximum or average value of the influence parameters of each other user relative to the user u as the comprehensive influenced parameter of the user u.
  • the influence matrix W is as follows:
  • Wpq is the element in the p-th row and q-th column of the influence matrix W and represents the influence parameter of the user p relative to the user q, where p and q are positive integers less than or equal to n;
  • the seed user selection sub-module 768 is used to:
  • xR: if the value of xR is 1, the user R is selected as the seed user, where R is a positive integer less than or equal to n;
  • the column vector x is calculated based on the following formula:
  • the influence parameter determination module 740 is configured to: call an influence calculation model, and calculate the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group
  • the characteristic data of the i-th user group includes: characteristic data of the first user, characteristic data of the second user, and characteristic data of the relationship between the first user and the second user.
  • the training process of the influence calculation model is as follows: at least one training sample is constructed, each of the training samples including a sample user group; the feature data and labels of the training samples are obtained, where the label of a training sample characterizes whether the first sample user in that training sample has successfully recommended a product to the second sample user; and the training samples are used to train the influence calculation model to obtain the trained influence calculation model.
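  • The training process above can be illustrated with a small example. The application does not name a model type, so the sketch below assumes, purely for illustration, a logistic regression fitted by batch gradient descent. The feature vectors (one value each for the first sample user, the second sample user, and their relationship), the labels, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature data of a sample user group, label: 1 if the first sample user
# successfully recommended a product to the second sample user, else 0)
Sample = Tuple[Sequence[float], int]


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Predicted influence parameter, a probability in [0, 1]."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_influence_model(samples: List[Sample], lr: float = 0.5,
                          epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by batch gradient descent."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for features, label in samples:
            error = predict(weights, bias, features) - label
            for k in range(dim):
                grad_w[k] += error * features[k]
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


# Hypothetical training samples (made-up numbers, not data from the application).
samples = [
    ([0.9, 0.8, 1.0], 1),
    ([0.8, 0.9, 0.7], 1),
    ([0.2, 0.1, 0.0], 0),
    ([0.1, 0.3, 0.2], 0),
]
weights, bias = train_influence_model(samples)
# Influence parameter predicted for a new, hypothetical user group.
print(predict(weights, bias, [0.7, 0.6, 0.9]))
```

Any classifier that outputs a probability could take the place of the logistic regression here; the only property the later steps rely on is that the output lies in [0, 1].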
  • the technical solutions provided by the embodiments of the present application determine the influence parameters between users according to the characteristic data of each user group, construct the influence matrix according to the influence parameters between users, and select seed users according to the influence matrix, which provides an additional method of selecting seed users.
  • the influence parameter is determined based on the characteristic data of user groups whose users have an association relationship, so the association relationships between users are taken into account and the social network relationships are mined in depth, which makes the prediction of the influence parameter more comprehensive and accurate. This solves the technical problem that the related technology considers only the characteristic data of a single user, which is too narrow a basis for determining seed users accurately.
  • the influence parameter between users is determined from the characteristic data by the trained influence calculation model, so that the server can calculate the influence parameter more simply.
  • the influence calculation model is obtained by training based on historical feature data, so that, through the influence calculation model, the server can predict the influence parameters more realistically and accurately, which improves the accuracy of seed user selection.
  • when the device provided in the above embodiment implements its functions, the division into the above functional modules is only an example. In actual applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the device and method embodiments provided in the above-mentioned embodiments belong to the same conception, and the specific implementation process is detailed in the method embodiments, which will not be repeated here.
  • FIG. 8 shows a structural block diagram of a computer device provided by an embodiment of the present application.
  • the computer device can be used to implement the seed user selection method provided in the above-mentioned embodiment.
  • the computer device may be the server described above. Specifically:
  • the computer device 800 includes a processing unit (such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array)) 801, a system memory 804 including a RAM (Random-Access Memory) 802 and a ROM (Read-Only Memory) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801.
  • the computer device 800 also includes a basic input/output system (I/O system) 806 that helps to transfer information between the components of the computer device, and a mass storage device 807 for storing the operating system 813, application programs 814, and other program modules 815.
  • the basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse and a keyboard for the user to input information.
  • the display 808 and the input device 809 are both connected to the central processing unit 801 through the input and output controller 810 connected to the system bus 805.
  • the basic input/output system 806 may also include an input and output controller 810 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 810 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805.
  • the mass storage device 807 and its associated computer-readable medium provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • the computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM, DVD (Digital Video Disc) or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
  • the computer storage medium is not limited to the above-mentioned types.
  • the aforementioned system memory 804 and mass storage device 807 may be collectively referred to as a memory.
  • the computer device 800 may also run while connected to a remote computer through a network such as the Internet. That is, the computer device 800 can be connected to the network 812 through the network interface unit 811 connected to the system bus 805; in other words, the network interface unit 811 can also be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also includes a computer program, which is stored in the memory and configured to be executed by one or more processors, so as to implement the above-mentioned method for selecting a seed user.
  • a non-transitory computer-readable storage medium is also provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the above-mentioned method for selecting a seed user.
  • the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory, random access memory), SSD (Solid State Drives, solid state hard disk), or optical disk.
  • random access memory may include ReRAM (Resistance Random Access Memory) and DRAM (Dynamic Random Access Memory).
  • a computer program product is also provided, which is used to implement the above-mentioned seed user selection method when the computer program product is executed by a processor.
  • the "plurality" mentioned herein refers to two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • the numbering of the steps described herein only shows one possible order of execution as an example. In some other embodiments, the steps may be executed out of numerical order; for example, two steps with different numbers may be executed at the same time, or in the reverse of the order shown in the figure, which is not limited in the embodiments of the present application.

Abstract

A seed user selection method and a corresponding apparatus. The method comprises: obtaining user set data; creating m user groups according to the user set data; for the i-th user group in the m user groups, obtaining feature data of the i-th user group; determining an influence parameter of the first user with respect to the second user according to the feature data of the i-th user group; constructing an influence matrix; selecting at least one seed user from n users according to the influence matrix; and storing user information of the seed user.

Description

Seed user selection method, apparatus, device and storage medium
This disclosure claims priority to Chinese patent application No. 201911168479.9, filed on November 25, 2019 and entitled "Seed user selection method, apparatus, device and storage medium", the entire content of which is incorporated into this disclosure by reference.
Technical field
The embodiments of the present application relate to the field of Internet technology, and in particular to a method, apparatus, device and storage medium for selecting seed users.
Background
As online social network platforms become more closely integrated with daily life, their commercial value is increasingly being tapped and utilized.
In related technologies, the operator of an online social network platform usually selects some seed users from the users registered on its platform according to the behavior of those users, and gives the selected seed users certain rewards so that the seed users promote the operator's products to the people around them.
Summary of the invention
The embodiments of the present application provide a method, apparatus, device and storage medium for selecting seed users. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a method for selecting seed users, the method including:
obtaining user set data, where the user set data includes n users and n is an integer greater than 1;
creating m user groups according to the user set data, where each user group includes two users having an association relationship and m is a positive integer;
for the i-th user group of the m user groups, obtaining feature data of the i-th user group, where the i-th user group includes a first user and a second user having an association relationship;
determining, according to the feature data of the i-th user group, an influence parameter of the first user relative to the second user, where the influence parameter characterizes the probability that the first user successfully recommends a product to the second user;
constructing an influence matrix, where the influence matrix is a matrix of n rows and n columns, and the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;
selecting at least one seed user from the n users according to the influence matrix; and
storing user information of the seed user.
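The matrix construction step in the method above can be sketched as follows. This is a minimal illustration under several assumptions: each user group is represented as an ordered pair (first user, second user) with its feature data, users are indexed from 0, pairs without an association relationship are given an influence parameter of 0, and the influence calculation model is any callable that maps feature data to a probability. The names `build_influence_matrix` and `toy_model` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

UserGroup = Tuple[int, int]  # (first user, second user), in that order


def build_influence_matrix(
    n: int,
    groups: Dict[UserGroup, Sequence[float]],
    influence_model: Callable[[Sequence[float]], float],
) -> List[List[float]]:
    """Fill an n x n matrix where W[u][v] is the influence parameter of
    user u relative to user v; unrelated pairs stay at 0 (an assumption)."""
    W = [[0.0] * n for _ in range(n)]
    for (u, v), features in groups.items():
        W[u][v] = influence_model(features)
    return W


def toy_model(features: Sequence[float]) -> float:
    """Stand-in for a trained influence calculation model: the mean of
    feature values that are assumed to lie in [0, 1]."""
    return sum(features) / len(features)


# Hypothetical feature data for three user groups among n = 3 users.
groups = {
    (0, 1): [0.5, 0.5],
    (1, 0): [0.25, 0.25],
    (2, 1): [1.0, 0.5],
}
W = build_influence_matrix(3, groups, toy_model)
for row in W:
    print(row)
```

The pairs (0, 1) and (1, 0) are separate user groups here because the influence of the first user on the second need not equal the influence in the opposite direction.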
In another aspect, an embodiment of the present application provides an apparatus for selecting seed users, the apparatus including:
a set data acquisition module, configured to obtain user set data, where the user set data includes n users and n is an integer greater than 1;
a user group creation module, configured to create m user groups according to the user set data, where each user group includes two users having an association relationship and m is a positive integer;
a feature data acquisition module, configured to obtain, for the i-th user group of the m user groups, feature data of the i-th user group, where the i-th user group includes a first user and a second user having an association relationship;
an influence parameter determination module, configured to determine, according to the feature data of the i-th user group, an influence parameter of the first user relative to the second user, where the influence parameter characterizes the probability that the first user successfully recommends a product to the second user;
a matrix construction module, configured to construct an influence matrix, where the influence matrix is a matrix of n rows and n columns, and the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;
a seed user selection module, configured to select at least one seed user from the n users according to the influence matrix; and
an information storage module, configured to store user information of the seed user.
In yet another aspect, an embodiment of the present application provides a computer device, the computer device including a processor and a memory, where a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the above method for selecting seed users.
In still another aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the above method for selecting seed users.
In a further aspect, a computer program product is provided which, when executed by a processor, implements the above method for selecting seed users.
The technical solutions provided in the embodiments of the present application can bring the following beneficial effects:
The influence parameters between users are determined according to the feature data of each user group, an influence matrix is then constructed according to the influence parameters between users, and seed users are selected according to the influence matrix, which provides an additional method of selecting seed users. Moreover, in the embodiments of the present application, the influence parameter is determined based on the feature data of user groups whose users have an association relationship, so the association relationships between users are taken into account and the social network relationships are mined in depth, which makes the prediction of the influence parameter more comprehensive and accurate. This solves the technical problem that the related technology considers only the feature data of a single user, which is too narrow a basis for determining seed users accurately.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
FIG. 1 is a flowchart of a method for selecting seed users according to an embodiment of the present application;
FIG. 2 is a relationship diagram of a social network according to an embodiment of the present application;
FIG. 3 is a flowchart of selecting seed users according to an influence matrix according to an embodiment of the present application;
FIG. 4 is a flowchart of selecting seed users according to an influence matrix according to another embodiment of the present application;
FIG. 5 is a flowchart of a method for selecting seed users according to another embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for selecting seed users according to an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for selecting seed users according to another embodiment of the present application;
FIG. 8 is a block diagram of a computer device according to an embodiment of the present application.
Detailed description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the drawings.
In the technical solutions provided by the embodiments of the present application, each step may be executed by a computer device, such as a server with computing and storage capabilities, a terminal such as a mobile phone, tablet computer, multimedia playback device, or wearable device, or another computer device. Optionally, when the computer device is a server, it may be a single server, a server cluster composed of multiple servers, or a cloud computing service center. For ease of description, the following method embodiments are described with each step executed by a server, but this does not constitute a limitation.
Please refer to FIG. 1, which shows a flowchart of a method for selecting seed users according to an embodiment of the present application. The method may include the following steps (110 to 170):
Step 110: Obtain user set data, where the user set data includes n users and n is an integer greater than 1.
The server may obtain the user set data from its own stored data, or from another computer device with a storage function, such as another server or a terminal; the embodiments of the present application do not limit this. The user set data includes the n users and the data information corresponding to each of them, such as user identifiers and association relationships.
Step 120: Create m user groups according to the user set data, where each user group includes two users having an association relationship and m is a positive integer.
An association relationship is a relationship between two users who are able to send information to and receive information from each other. Optionally, the association relationship takes the form of a social relationship. In the embodiments of the present application, the association relationship includes, but is not limited to, any of the following: a friend relationship, a following/followed relationship, a group-order relationship, and so on. Optionally, the association relationship takes different forms on different online social network platforms. For example, on an online social network platform such as an instant messaging application, the association relationship is a friend relationship; on an entertainment-oriented online social network platform, the association relationship is a following/followed relationship. Optionally, on a platform that is not an online social network, the association relationships between users can be constructed from user information, where user information refers to information generated when users use that platform. For example, on a shopping platform, the association relationships between users can be constructed from user information such as user addresses, purchase records, group-order data, link sharing, online interaction, and device sharing.
Optionally, to represent a user group clearly and simply, the user group is written as an ordered pair, for example (first user, second user). In the embodiments of the present application, two user groups can be created for two users having an association relationship, and the two user groups represent different relationship features. That is, when a user group is written as an ordered pair, the position of each element matters: the user group (first user, second user) represents the relationship features of the first user relative to the second user, whereas the user group (second user, first user) represents the relationship features of the second user relative to the first user.
After the server determines the user set containing n users and the association relationships between the users, it can create m user groups, each including two users having an association relationship. In the embodiments of the present application, the value of m is determined by the association relationships between the users in the user set.
In a possible implementation, step 120 includes: constructing a relationship graph corresponding to the user set data, where the relationship graph includes n nodes in one-to-one correspondence with the n users, and the nodes corresponding to two users having an association relationship are connected by an edge; and extracting m user groups from the relationship graph. This makes it easy for the server to construct the m user groups from the user set quickly.
The relationship graph, also called a social network relationship graph, represents the association relationships between users. After the server determines a user set containing n users, it can construct a relationship graph from that user set. The number of nodes in the relationship graph equals the number of users, and the nodes correspond one-to-one to the users. For example, FIG. 2 shows the relationship graph of a social network with 6 users. The relationship graph has 6 nodes 21 in one-to-one correspondence with the 6 users; that is, the number in each node 21 corresponds to a user identifier, so node 1 represents user 1, node 2 represents user 2, node 3 represents user 3, and so on. Optionally, the user identifier is the user's position in the user set. In the embodiments of the present application, after obtaining the user set, the server may sort the users randomly to obtain each user's identifier, or may sort them by some parameter, such as the number of strokes in the user name or the alphabetical order of its pinyin initial; the embodiments of the present application do not limit this. As shown in FIG. 2, the nodes corresponding to two users having an association relationship are connected by an edge 22. For example, the edge 22 between node 1 and node 2 indicates that user 1 and user 2 have an association relationship.
From the relationship graph, the server can extract the m user groups. As shown in FIG. 2, the relationship graph contains 5 edges 22. Because two user groups can be created for two users having an association relationship, the server can extract 10 user groups from the relationship graph shown in FIG. 2: (User 1, User 2), (User 2, User 1), (User 1, User 4), (User 4, User 1), (User 1, User 6), (User 6, User 1), (User 6, User 5), (User 5, User 6), (User 4, User 3), and (User 3, User 4).
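The extraction described above can be sketched in a few lines of Python. This is only an illustration: the edge list is read off the ten user groups named in the text for FIG. 2, and the function name `extract_user_groups` is hypothetical, not something the application defines.

```python
def extract_user_groups(edges):
    """Return both ordered pairs (a, b) and (b, a) for every undirected edge."""
    groups = []
    for a, b in edges:
        groups.append((a, b))  # relationship features of a relative to b
        groups.append((b, a))  # relationship features of b relative to a
    return groups


# The 5 edges of the FIG. 2 relationship graph, as implied by the listed user groups.
FIG2_EDGES = [(1, 2), (1, 4), (1, 6), (6, 5), (4, 3)]

user_groups = extract_user_groups(FIG2_EDGES)
print(len(user_groups))  # m, the number of user groups
```

With 5 edges, m comes out as 10, matching the count given for FIG. 2.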
Step 130: For the i-th user group among the m user groups, obtain the feature data of the i-th user group, where the i-th user group includes a first user and a second user having an association relationship.
The feature data describes the features of the users in a user group and the features of the relationship between them. The feature data of the i-th user group includes the feature data of the first user, the feature data of the second user, and the relationship feature data of the first user and the second user. Optionally, in order to mine the association relationships in depth and represent each user and user group accurately, the two user groups formed from the same pair of associated users have different feature data. For example, if user 1 and user 2 have an association relationship, the feature data of (User 1, User 2) includes the feature data of user 1, the feature data of user 2, and the relationship feature data of user 1 relative to user 2, whereas the feature data of (User 2, User 1) includes the feature data of user 1, the feature data of user 2, and the relationship feature data of user 2 relative to user 1. The relationship feature data of user 1 relative to user 2 differs from that of user 2 relative to user 1. For example, if user 1 follows user 2 but user 2 does not follow user 1, then the relationship feature data of user 1 relative to user 2 differs from the relationship feature data of user 2 relative to user 1.
Relationship feature data is feature data about the relative relationship between users. Optionally, the relationship feature data may include recommendation status, following status, message status, and the like; the embodiments of the present application do not limit this. For example, Table 1 shows the relationship feature data within the feature data of the 10 user groups constructed from the relationship graph in FIG. 2.
Table 1
User group | Recommendation status | Following status | Message status
(User 1, User 2) | 1 | 1 | 10
(User 2, User 1) | 0 | 1 | 20
(User 1, User 4) | 0 | 0 | 15
(User 4, User 1) | 1 | 1 | 30
(User 1, User 6) | 1 | 0 | 28
(User 6, User 1) | 1 | 0 | 12
(User 6, User 5) | 0 | 1 | 45
(User 5, User 6) | 1 | 1 | 26
(User 4, User 3) | 0 | 1 | 5
(User 3, User 4) | 0 | 0 | 40
Here, the recommendation status indicates whether a product has been successfully recommended in the past. As shown in Table 1, the recommendation status of (User 1, User 2) is 1, which means that user 1 has successfully recommended a product to user 2 in the past; the recommendation status of (User 2, User 1) is 0, which means that user 2 has not successfully recommended a product to user 1. The following status indicates whether one user follows the other. As shown in Table 1, the following status of (User 1, User 4) is 0, which means that user 1 does not follow user 4; the following status of (User 4, User 1) is 1, which means that user 4 follows user 1. The message status indicates the number of messages sent in the past. As shown in Table 1, the message status of (User 1, User 2) is 10, which means that user 1 has sent 10 messages to user 2; the message status of (User 2, User 1) is 20, which means that user 2 has sent 20 messages to user 1. The relationship feature data of the other user groups can be read in the same way and is not repeated here.
Each user's own feature data is data generated when the user uses the application. Optionally, it may include the user identifier, user age, user gender, consumption level, activity record, and the like; the embodiments of the present application do not limit this. For example, Table 2 shows the feature data of each user in the 10 user groups constructed from the relationship graph in FIG. 2.
Table 2
User ID | User age | User gender | Consumption level | Activity record
1 | 24 | Male | 600 | 1
2 | 17 | Female | 120 | 0
3 | 35 | Female | 480 | 1
4 | 20 | Male | 240 | 0
5 | 48 | Female | 500 | 0
6 | 23 | Male | 100 | 1
Here, the consumption level indicates the user's spending capacity. Optionally, the consumption level can be expressed as the user's average spending, which may be the daily average, the monthly average, or the daily average during participation in an activity; the embodiments of the present application do not limit this. The activity record indicates whether the user has participated in a product recommendation activity. As shown in Table 2, the activity record of user 1 is 1, which means that user 1 has participated in a product recommendation activity; the activity record of user 2 is 0, which means that user 2 has not. The feature data of the other users can be read in the same way and is not repeated here.
It should be noted that the embodiments of the present application merely take as an example relationship feature data that includes recommendation status, following status, and message status, and per-user feature data that includes user identifier, user age, user gender, consumption level, and activity record. After understanding the technical solutions of the embodiments of the present application, those skilled in the art will readily conceive of relationship feature data and per-user feature data that include other content, all of which shall fall within the protection scope of the present application.
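As an illustration of how the feature data of one user group might be assembled, the following Python sketch concatenates the two users' own features with the relationship features of the ordered pair. The values are copied from Table 1 and Table 2 for users 1 and 2 only. The numeric encoding of gender and the names `encode_user` and `group_features` are assumptions made for this sketch; the application does not prescribe an encoding.

```python
# user ID -> (age, gender, consumption level, activity record), from Table 2
USER_FEATURES = {
    1: (24, "male", 600, 1),
    2: (17, "female", 120, 0),
}

# (first user, second user) -> (recommendation, following, messages), from Table 1
RELATION_FEATURES = {
    (1, 2): (1, 1, 10),
    (2, 1): (0, 1, 20),
}


def encode_user(user_id):
    """Turn one user's features into numbers (gender as 1/0 is an assumption)."""
    age, gender, spend, activity = USER_FEATURES[user_id]
    return [age, 1 if gender == "male" else 0, spend, activity]


def group_features(first, second):
    """Feature vector of the user group (first, second); order matters."""
    relation = list(RELATION_FEATURES[(first, second)])
    return encode_user(first) + encode_user(second) + relation


print(group_features(1, 2))
print(group_features(2, 1))
```

The two calls differ both in the order of the per-user features and in the relationship features, which is why (User 1, User 2) and (User 2, User 1) are treated as separate user groups.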
Step 140: Determine the influence parameter of the first user relative to the second user according to the feature data of the i-th user group, where the influence parameter represents the probability that the first user successfully recommends a product to the second user.
The influence parameter expresses the probability of a successful product recommendation. For example, when the influence parameter of the first user relative to the second user is 0.8, the probability that the first user successfully recommends a product to the second user is 0.8. The influence parameter may be expressed as a numerical value or as a percentage; the embodiments of the present application do not limit this. Optionally, when the influence parameter is expressed as a numerical value, its value range is [0, 1]. This design simplifies the server's calculation of the influence parameter, increases the server's processing speed, and reduces its processing overhead.
In a possible implementation, step 140 includes: invoking an influence calculation model to calculate the influence parameter of the first user relative to the second user according to the feature data of the i-th user group, where the feature data of the i-th user group includes the feature data of the first user, the feature data of the second user, and the relationship feature data of the first user and the second user. This design makes it convenient for the server to calculate the influence parameter while yielding a more realistic and accurate prediction.
The influence calculation model is a model trained on historical data. Optionally, the influence calculation model may be a binary classification model, such as an LR (Logistic Regression) model, a neural network model, or a GBDT (Gradient Boosting Decision Tree) model; the influence calculation model may also be a regression model. The embodiments of the present application do not limit this.
Exemplarily, the influence calculation model is trained as follows: construct at least one training sample, where each training sample includes one sample user group; obtain the feature data and label of each training sample, where the label indicates whether the first sample user in the training sample has successfully recommended a product to the second sample user; and train the influence calculation model with the training samples to obtain the trained influence calculation model.
A training sample is a sample used to train the influence calculation model, and each training sample includes one sample user group. The embodiments of the present application do not limit the number of training samples; in practice, the number can be chosen by weighing the server's processing overhead against the accuracy of the influence calculation model. Optionally, after obtaining historical data, the server may determine a sample user set from the historical data, construct a sample relationship tree from the sample user set, and then extract training samples from the sample relationship tree.
The label is the recommendation status of the first sample user toward the second sample user in the sample user group corresponding to the training sample. Optionally, the label takes the value 0 or 1, where 1 means that a product has been successfully recommended and 0 means that it has not. For example, if the label of the sample user group (first sample user, second sample user) is 1, the first sample user has successfully recommended a product to the second sample user; if the label is 0, the first sample user has not successfully recommended a product to the second sample user.
In the embodiments of the present application, the feature data of a training sample includes the feature data of each sample user and the relationship feature data between the sample users. The explanation of per-user feature data and relationship feature data given for step 130 applies equally to the sample users here in step 140; please refer to the description above, which is not repeated here.
It should be noted that, so that the trained influence calculation model can be used to predict influence parameters, in the embodiments of the present application the relationship feature data between sample users also includes a historical influence parameter, which indicates the influence between the sample users in a sample user group. For example, if the historical influence parameter of the sample user group (first sample user, second sample user) is 0.2, the influence parameter of the first sample user on the second sample user is 0.2.
After obtaining the training samples and their feature data and labels, the server selects a suitable model as the influence calculation model, such as a binary classification model or a regression model, and then trains it with the training samples to obtain the trained influence calculation model.
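The training procedure above can be sketched with a small logistic regression written in plain Python. This is a minimal sketch under stated assumptions, not the model the application actually uses: the two features per sample (following status and a scaled message count), the four toy samples, the learning rate, and the function names are all hypothetical and chosen only to show how labels of 0 or 1 lead to a predicted probability in [0, 1].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_influence(model, x):
    """Predicted probability that the first user's recommendation succeeds."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy samples: [following status, message count scaled to 0..1]
samples = [[1, 0.9], [1, 0.8], [0, 0.1], [0, 0.2]]
labels = [1, 1, 0, 0]  # 1 = a product was successfully recommended

model = train_logistic(samples, labels)
print(predict_influence(model, [1, 0.9]))
print(predict_influence(model, [0, 0.1]))
```

In practice a library implementation of logistic regression, a neural network, or GBDT would take the place of this hand-written loop; the point is only that the model's output probability serves as the influence parameter.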
Step 150: Construct an influence matrix, which is a matrix with n rows and n columns, where the element in row u and column v represents the influence parameter of user u relative to user v.
In the embodiments of the present application, if user u and user v have an association relationship, the influence parameter of user u relative to user v and the influence parameter of user v relative to user u can be calculated by the influence calculation model. If user u and user v have no association relationship, these two influence parameters need not be calculated and are simply recorded as 0. The influence parameter of each user relative to itself is also recorded as 0.
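The construction rule above can be sketched as follows. The influence values in the example are hypothetical placeholders standing in for model outputs, and `build_influence_matrix` is an illustrative name; users are numbered from 1 as in FIG. 2.

```python
def build_influence_matrix(n, pair_influence):
    """Build the n x n influence matrix from per-user-group influence parameters.

    pair_influence maps an ordered pair (u, v) of 1-based user IDs to the
    influence parameter of user u relative to user v. Pairs without an
    association relationship, and the diagonal, stay at 0.
    """
    matrix = [[0.0] * n for _ in range(n)]
    for (u, v), w in pair_influence.items():
        if u != v:  # a user's influence on itself is recorded as 0
            matrix[u - 1][v - 1] = w
    return matrix


# Hypothetical influence parameters for one associated pair among 3 users.
example = build_influence_matrix(3, {(1, 2): 0.5, (2, 1): 0.25})
for row in example:
    print(row)
```

User 3 has no association relationship in this example, so row 3 and column 3 remain all zeros.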
Step 160: Select at least one seed user from the n users according to the influence matrix.
The influence matrix gives the influence parameter of each of the n users relative to each other user. This may refer to the influence parameter of each user relative to the other users with whom that user has an association relationship, or to the influence parameter of each user relative to every one of the n users; the embodiments of the present application do not limit this. Optionally, in the latter case, the influence parameter of a user relative to the users associated with that user is obtained from the influence calculation model, the influence parameter of the user relative to users not associated with that user is 0, and the influence parameter of the user relative to itself is also 0. After determining the influence matrix, the server can select at least one seed user from the n users according to a given selection method.
Step 170: Store the user information of the seed user.
After determining the seed user, the server may store the seed user's user information in its own memory or in the memory of another computer device, such as another server or a terminal; the embodiments of the present application do not limit this.
In summary, the technical solutions provided by the embodiments of the present application determine the influence parameters between users from the feature data of each user group, construct an influence matrix from those influence parameters, and select seed users according to the influence matrix, thereby providing an additional method for selecting seed users. Moreover, in the embodiments of the present application, the influence parameter is determined from the feature data of user groups whose members have an association relationship. This takes the association relationships between users into account and mines the social network relationships in depth, so that the influence parameters are predicted more comprehensively and accurately. It solves the technical problem in the related art that only the feature data of a single user is considered, which is too narrow a basis for determining seed users accurately.
In addition, the technical solutions provided by the embodiments of the present application use the trained influence calculation model to determine the influence parameters between users from the feature data, so that the server can calculate the influence parameters more simply. Furthermore, because the influence calculation model is trained on historical feature data, the server can use it to predict the influence parameters more realistically and accurately, which improves the accuracy of seed user selection.
In one example, as shown in FIG. 3, selecting at least one seed user from the n users according to the influence matrix includes the following steps (1041 to 1047):
Step 1041: Calculate the comprehensive influence parameter of each of the n users according to the influence matrix, where the comprehensive influence parameter of user u represents the overall probability that user u successfully recommends a product to the other users.
The comprehensive influence parameter is the overall probability that a user successfully recommends a product to the other users. Optionally, the comprehensive influence parameter is obtained by summing the user's influence parameters relative to each of the other users. For example, the comprehensive influence parameter is calculated as follows:
$$W_u = \sum_{j=1}^{n} W_{uj}$$
Here, $W_u$ denotes the comprehensive influence parameter of user u, $W_{uj}$ denotes the influence parameter of user u relative to user j, u is a positive integer less than or equal to n, and j is a positive integer less than or equal to n.
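Because the comprehensive influence parameter is described as the accumulation of one user's influence parameters over all other users, it amounts to a row sum of the influence matrix. A brief sketch, using a hypothetical 3-user matrix and an illustrative function name:

```python
def comprehensive_influence(matrix):
    """Return the row sums: entry u-1 is the comprehensive influence of user u."""
    return [sum(row) for row in matrix]


# Hypothetical influence matrix for 3 users (diagonal entries are 0).
W = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.25],
    [0.25, 0.5, 0.0],
]
totals = comprehensive_influence(W)
print(totals)
```

Since the diagonal is 0, summing over all j from 1 to n gives the same result as summing over the other users only.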
Step 1042: Construct a seed user set.
In the embodiments of the present application, the seed user set is initially empty.
Step 1043: From the non-seed users, select a user s whose comprehensive influence parameter meets a condition and add user s to the seed user set, where the non-seed users are those of the n users that have not been added to the seed user set.
The comprehensive influence parameter meeting the condition may mean that it is the largest, or that it reaches a preset threshold; the embodiments of the present application do not limit this. It should be noted that the following embodiments are described only with the largest comprehensive influence parameter as the condition. After understanding the technical solutions of the present application, those skilled in the art will readily conceive of other technical solutions, such as embodiments in which the condition is that the comprehensive influence parameter reaches a preset threshold, all of which shall fall within the protection scope of the present application.
When no seed user has been selected yet, the number of non-seed users equals the number of users in the user set, that is, n. After determining the non-seed users and their comprehensive influence parameters, the server selects the non-seed user with the largest comprehensive influence parameter and adds that user to the seed user set. Optionally, the user with the largest comprehensive influence parameter is selected from the non-seed users as follows:
Figure PCTCN2020097517-appb-000002
where the set U denotes the user set, the set S denotes the seed user set, user s denotes the user with the largest comprehensive influence parameter selected from the non-seed users, and s is a positive integer less than or equal to n.
Step 1044: Subtract the values of the s-th row elements from the values of each row of elements in the influence matrix to obtain an updated influence matrix, where the s-th row elements include the influence parameters of user s relative to each other user.
The influence parameters of user s, selected in step 1043, relative to each user correspond to the values of the s-th row elements of the influence matrix. Subtracting the s-th row elements from each row of the influence matrix yields the updated influence matrix. For example, the update formula of the influence matrix is as follows:
$W_{nj} \leftarrow W_{nj} - W_{sj}$;
where $W_{nj}$ denotes the element in the n-th row and j-th column of the influence matrix, and $W_{sj}$ denotes the element in the s-th row and j-th column of the influence matrix. In the embodiments of this application, the server takes the difference between $W_{nj}$ and $W_{sj}$ as the updated $W_{nj}$.
Exemplarily, to simplify the server's computation and reduce its processing overhead, the influence parameter takes values in [0, 1]. Subtracting the values of the s-th row elements from each row of the influence matrix to obtain the updated influence matrix then includes: subtracting the values of the s-th row elements from the values of each of the n rows of elements in the influence matrix to obtain the computed values of the n rows of elements; and, for each target element among the computed n rows of elements whose value is less than zero, setting the value of the target element to 0, thereby obtaining the updated influence matrix.
After subtracting the values of the s-th row elements from the values of the n rows of elements in the influence matrix, the server may compare each computed value with 0 and set any value less than 0 to 0, which ensures that all elements in the n rows of the updated influence matrix are greater than or equal to 0. For example, the update formula of the influence matrix is as follows:
$W_{nj} \leftarrow \max(W_{nj} - W_{sj}, 0)$.
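The clamped update above can be illustrated with a minimal Python sketch. The function name `update_influence_matrix`, the list-of-lists representation, the 0-based row index, and the sample values are all assumptions made for illustration; they are not taken from the application.

```python
def update_influence_matrix(W, s):
    """Subtract row s from every row of W and clamp negative results to 0.

    W is a list of n rows, each a list of n floats in [0, 1] (assumed layout);
    s is the 0-based index of the user just added to the seed set.
    Returns a new matrix and leaves W unchanged.
    """
    row_s = list(W[s])  # copy first so that row s is subtracted from itself too
    return [[max(w - ws, 0.0) for w, ws in zip(row, row_s)] for row in W]


# Illustrative 3-user matrix (values chosen arbitrarily).
W = [
    [0.0, 0.5, 0.25],
    [0.75, 0.0, 0.5],
    [0.25, 0.75, 0.0],
]
updated = update_influence_matrix(W, 1)
# The row of the selected user becomes all zeros, and each other row keeps
# only the influence that the selected user does not already provide.
```

Copying row s before the subtraction matters: updating in place row by row would zero out row s partway through and leave later rows unchanged.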
Step 1045: Determine whether the stop condition for seed user selection is met.
Step 1046: If the stop condition for seed user selection is not met, execution resumes, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix.
If the stop condition for seed user selection is not met, the server repeats steps 1042 to 1044 based on the updated influence matrix. In the embodiments of this application, the stop condition may be a condition preset by the server. Optionally, the stop condition may be that the number of elements in the seed user set reaches a preset threshold, such as 10, or that the number of loop iterations reaches a preset number, such as 5; the embodiments of this application do not limit this.
Step 1047: If the stop condition for seed user selection is met, the users in the seed user set are determined as the seed users.
In summary, in the technical solution provided by the embodiments of this application, an influence matrix and a seed user set are constructed and a stop condition for seed user selection is set. As long as the stop condition is not met, the user with the largest comprehensive influence parameter is repeatedly selected from the user set, based on the updated influence matrix, and added to the seed user set. This avoids over-recommending to a single user and wasting seed user resources, so that seed users are selected reasonably.
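The greedy loop of steps 1042 to 1047 can be sketched as follows. This is only an illustration under stated assumptions: the comprehensive influence parameter is taken to be the row sum of the current matrix, the stop condition is a maximum number of seed users, ties go to the lowest index, and the name `select_seed_users_greedy` is hypothetical.

```python
def select_seed_users_greedy(W, max_seeds):
    """Greedily pick seed users from an n x n influence matrix.

    Assumptions (not from the application): users are 0-based indices, the
    comprehensive influence parameter is the row sum, and selection stops
    once max_seeds users are chosen or no non-seed user remains.
    """
    n = len(W)
    W = [list(row) for row in W]  # work on a copy of the matrix
    seeds = []                    # the seed user set, initially empty
    while len(seeds) < max_seeds and len(seeds) < n:
        candidates = [u for u in range(n) if u not in seeds]
        # Select the non-seed user with the largest comprehensive influence.
        s = max(candidates, key=lambda u: sum(W[u]))
        seeds.append(s)
        # Subtract row s from every row and clamp negatives to 0.
        row_s = list(W[s])
        W = [[max(w - ws, 0.0) for w, ws in zip(row, row_s)] for row in W]
    return seeds
```

Because each update removes the influence that already-selected seeds exert, later picks favour users who reach people the earlier seeds do not, which is how the loop avoids over-recommending to a single user.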
In another example, as shown in FIG. 4, selecting at least one seed user from the n users according to the influence matrix includes the following steps (104A to 104B):
Step 104A: According to the influence matrix, calculate the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users, where the comprehensive influence parameter of user u characterizes the overall probability that user u successfully recommends a product to each other user, and the comprehensive influenced parameter of user u characterizes the overall probability that each other user successfully recommends a product to user u.
Exemplarily, step 104A includes: for user u among the n users, obtaining the influence parameters of user u relative to each other user; and summing the influence parameters of user u relative to each other user to obtain the comprehensive influence parameter of user u. For example, the calculation formula of the comprehensive influence parameter is as follows:
Figure PCTCN2020097517-appb-000003
where $W_{un}$ denotes the comprehensive influence parameter of user u, $W_{uj}$ denotes the influence parameter of user u relative to user j, u is a positive integer less than or equal to n, and j is a positive integer less than or equal to n.
Exemplarily, step 104A includes: for user u among the n users, obtaining the influence parameters of each other user relative to user u; and determining the maximum or the average of the influence parameters of each other user relative to user u as the comprehensive influenced parameter of user u. For example, the calculation formula of the comprehensive influenced parameter is as follows:
Figure PCTCN2020097517-appb-000004
or
Figure PCTCN2020097517-appb-000005
where $W_{nu}$ denotes the comprehensive influenced parameter of user u, $W_{iu}$ denotes the influence parameter of user i relative to user u, u is a positive integer less than or equal to n, and i is a positive integer less than or equal to n.
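The two aggregates of step 104A can be sketched in Python as row and column reductions over the influence matrix. The function names are hypothetical, and the sketch assumes that "each other user" excludes the user's own diagonal entry and that the average is taken over the n-1 other users; the formula images are not reproduced here, so these details are assumptions.

```python
def comprehensive_influence(W):
    """Row-wise sum: how strongly each user u influences all other users."""
    n = len(W)
    return [sum(W[u][j] for j in range(n) if j != u) for u in range(n)]


def comprehensive_influenced(W, mode="max"):
    """Column-wise max or mean: how strongly other users influence user u.

    mode is "max" or "mean"; the diagonal entry is skipped (assumption).
    """
    n = len(W)
    result = []
    for u in range(n):
        incoming = [W[i][u] for i in range(n) if i != u]
        if mode == "max":
            result.append(max(incoming))
        elif mode == "mean":
            result.append(sum(incoming) / len(incoming))
        else:
            raise ValueError("mode must be 'max' or 'mean'")
    return result
```

Since n is an integer greater than 1, every user has at least one other user, so the incoming list is never empty.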
Step 104B: Select at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.
Exemplarily, the influence matrix W is as follows:
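$$W = \begin{bmatrix} W_{11} & W_{12} & \cdots & W_{1n} \\ W_{21} & W_{22} & \cdots & W_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ W_{n1} & W_{n2} & \cdots & W_{nn} \end{bmatrix}$$
$W_{pq} \in [0, 1]$;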
where $W_{pq}$ is the element in the p-th row and q-th column of the influence matrix W and denotes the influence parameter of user p relative to user q, p is a positive integer less than or equal to n, and q is a positive integer less than or equal to n.
Step 104B includes the following steps:
(1) Define a column vector x whose elements each take the value 0 or 1 and sum to K:
Figure PCTCN2020097517-appb-000007
$x_R \in \{0, 1\}$;
where $x_R = 1$ indicates that user R is selected as a seed user, and R is a positive integer less than or equal to n.
(3) Define the column vector e:
Figure PCTCN2020097517-appb-000008
(4) Define the column vector r:
Figure PCTCN2020097517-appb-000009
or
Figure PCTCN2020097517-appb-000010
In the embodiments of this application, each element of the column vector r represents a user's reasonable influenced parameter.
(5) Compute the column vector x based on the following formula:
Figure PCTCN2020097517-appb-000011
where Figure PCTCN2020097517-appb-000012 indicates that the variable to be solved for is x, and that x takes the value at which the expression following Figure PCTCN2020097517-appb-000013 attains its maximum; λ is a real number greater than or equal to 0; ||W′x-r|| denotes the Euclidean norm; W′x-r denotes the set of differences between the influenced parameter each user actually receives and that user's reasonable influenced parameter; and ||W′x-r||² denotes the sum of the squares of the elements of that set.
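As a rough illustration of the optimization in step (5), the sketch below searches all K-element seed sets exhaustively. The objective formula is an image that is not reproduced here, so the form used is an assumption: total influence delivered minus λ times the squared deviation of each user's received influence from r, with W′ read as the transpose of W, e read as an all-ones vector, and r taken as the column-wise maximum. The name `select_seed_users_optimal` is hypothetical, and exhaustive search is a stand-in for whatever solver the application intends.

```python
from itertools import combinations


def select_seed_users_optimal(W, k, lam):
    """Return a 0/1 list x with sum(x) == k that maximizes the assumed objective
    sum(W'x) - lam * ||W'x - r||^2, found by brute force over all k-subsets.
    """
    n = len(W)
    # Assumed "reasonable influenced" level: largest incoming influence per user.
    r = [max(W[i][q] for i in range(n) if i != q) for q in range(n)]
    best_score, best_seeds = None, None
    for seeds in combinations(range(n), k):
        # Influence each user q actually receives from the chosen seeds (W'x).
        received = [sum(W[p][q] for p in seeds) for q in range(n)]
        total = sum(received)
        penalty = sum((a - b) ** 2 for a, b in zip(received, r))
        score = total - lam * penalty
        if best_score is None or score > best_score:
            best_score, best_seeds = score, seeds
    return [1 if u in best_seeds else 0 for u in range(n)]
```

With λ = 0 the penalty vanishes and the search simply maximizes total influence; larger λ pushes the choice toward seed sets whose combined influence on each user stays close to r, which corresponds to avoiding over-marketing or under-marketing a single user. The enumeration grows combinatorially with n and K, so it is only practical for small examples.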
In summary, in the technical solution provided by the embodiments of this application, an influence matrix is constructed, the comprehensive influence parameter and the comprehensive influenced parameter of each user are calculated from the influence matrix, and seed users are then selected from the user set according to these two parameters. The influence of the seed users is thus considered as a whole and seed users are selected reasonably, which prevents the selected seed users from over-marketing or under-marketing to any single user.
Refer to FIG. 5, which shows a flowchart of a seed user selection method provided by another embodiment of this application. The method may include the following steps (501 to 509):
Step 501: Construct training samples;
Step 502: Obtain the training samples and their corresponding feature data and labels;
Step 503: Train a suitable model to obtain the influence calculation model;
Step 504: Construct user groups;
Step 505: Obtain the feature data corresponding to the user groups;
Step 506: Input the feature data into the influence calculation model;
Step 507: Calculate the influence parameters of each user;
Step 508: Select seed users according to a greedy algorithm, which is the selection method described in steps 1041 to 1046 above;
Step 509: Select seed users according to an optimization algorithm, which is the selection method described in steps 104A to 104B above.
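Steps 504 to 507 feed the matrix used by the two selection algorithms. A minimal sketch of that hand-off is shown below; `build_influence_matrix` and the `predict_influence` callable are hypothetical names, each user group is assumed to be an ordered pair (first user, second user) of 0-based indices, and pairs without a user group are assumed to have influence 0.

```python
def build_influence_matrix(n, user_groups, predict_influence):
    """Fill an n x n influence matrix from per-group model predictions.

    user_groups is an iterable of (first, second) index pairs; the entry
    W[first][second] holds the predicted probability that the first user
    successfully recommends a product to the second user.
    """
    W = [[0.0] * n for _ in range(n)]  # unrelated pairs default to 0 (assumption)
    for first, second in user_groups:
        W[first][second] = predict_influence(first, second)
    return W


# Stand-in for the trained influence calculation model (illustrative values).
scores = {(0, 1): 0.5, (1, 0): 0.75, (2, 1): 0.25}
W = build_influence_matrix(3, scores.keys(), lambda a, b: scores[(a, b)])
```

The resulting matrix can then be passed to either the greedy selection of step 508 or the optimization of step 509.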
The following are apparatus embodiments of this application, which can be used to execute the method embodiments of this application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of this application.
Refer to FIG. 6, which shows a block diagram of a seed user selection apparatus provided by an embodiment of this application. The apparatus has the function of implementing the above method examples; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided in the computer device. The apparatus 700 may include: a set data acquisition module 710, a user group creation module 720, a feature data acquisition module 730, an influence parameter determination module 740, a matrix construction module 750, a seed user selection module 760, and an information storage module 770.
The set data acquisition module 710 is configured to acquire user set data, the user set data including n users, where n is an integer greater than 1;
the user group creation module 720 is configured to create m user groups according to the user set data, each user group including two users having an association relationship, where m is a positive integer;
the feature data acquisition module 730 is configured to, for the i-th user group among the m user groups, acquire feature data of the i-th user group, where the i-th user group includes a first user and a second user having an association relationship;
the influence parameter determination module 740 is configured to determine, according to the feature data of the i-th user group, the influence parameter of the first user relative to the second user, the influence parameter characterizing the probability that the first user successfully recommends a product to the second user;
the matrix construction module 750 is configured to construct an influence matrix, the influence matrix being a matrix of n rows and n columns, where the element in the u-th row and v-th column of the influence matrix denotes the influence parameter of user u relative to user v;
the seed user selection module 760 is configured to select at least one seed user from the n users according to the influence matrix;
the information storage module 770 is configured to store user information of the seed users.
Optionally, as shown in FIG. 7, the seed user selection module 760 includes: a comprehensive influence parameter calculation submodule 761, configured to calculate, according to the influence matrix, the comprehensive influence parameter of each of the n users, where the comprehensive influence parameter of user u characterizes the overall probability that user u successfully recommends a product to each other user; a seed user set construction submodule 762, configured to construct a seed user set, the seed user set being initially empty; a user selection submodule 763, configured to select, from the non-seed users, a user s whose comprehensive influence parameter meets the condition and add user s to the seed user set, where the non-seed users are those of the n users that have not been added to the seed user set; a matrix update submodule 764, configured to subtract the values of the s-th row elements from the values of each row of elements in the influence matrix to obtain the updated influence matrix, where the s-th row elements include the influence parameters of user s relative to each other user; a loop submodule 765, configured to, if the stop condition for seed user selection is not met, resume execution, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix; and a seed user determination submodule 766, configured to, if the stop condition for seed user selection is met, determine the users in the seed user set as the seed users.
Optionally, as shown in FIG. 7, the influence parameter takes values in [0, 1], and the matrix update submodule 764 is configured to: subtract the values of the s-th row elements from the values of each of the n rows of elements in the influence matrix to obtain the computed values of the n rows of elements; and, for each target element among the computed n rows of elements whose value is less than zero, set the value of the target element to 0 to obtain the updated influence matrix.
Optionally, as shown in FIG. 7, the seed user selection module 760 includes: a comprehensive parameter calculation submodule 767, configured to calculate, according to the influence matrix, the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users, where the comprehensive influence parameter of user u characterizes the overall probability that user u successfully recommends a product to each other user, and the comprehensive influenced parameter of user u characterizes the overall probability that each other user successfully recommends a product to user u; and a seed user selection submodule 768, configured to select at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.
Optionally, as shown in FIG. 7, the comprehensive parameter calculation submodule 767 is configured to: for user u among the n users, obtain the influence parameters of user u relative to each other user; and sum the influence parameters of user u relative to each other user to obtain the comprehensive influence parameter of user u.
Optionally, as shown in FIG. 7, the comprehensive parameter calculation submodule 767 is configured to: for user u among the n users, obtain the influence parameters of each other user relative to user u; and determine the maximum or the average of the influence parameters of each other user relative to user u as the comprehensive influenced parameter of user u.
Optionally, the influence matrix W is as follows:
$$W = \begin{bmatrix} W_{11} & W_{12} & \cdots & W_{1n} \\ W_{21} & W_{22} & \cdots & W_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ W_{n1} & W_{n2} & \cdots & W_{nn} \end{bmatrix}$$
$W_{pq} \in [0, 1]$;
where $W_{pq}$ is the element in the p-th row and q-th column of the influence matrix W and denotes the influence parameter of user p relative to user q, p is a positive integer less than or equal to n, and q is a positive integer less than or equal to n;
as shown in FIG. 7, the seed user selection submodule 768 is configured to:
define a column vector x whose elements each take the value 0 or 1 and sum to K:
Figure PCTCN2020097517-appb-000015
$x_R \in \{0, 1\}$;
where $x_R = 1$ indicates that user R is selected as the seed user, and R is a positive integer less than or equal to n;
define the column vector e:
Figure PCTCN2020097517-appb-000016
define the column vector r:
Figure PCTCN2020097517-appb-000017
or
Figure PCTCN2020097517-appb-000018
compute the column vector x based on the following formula:
Figure PCTCN2020097517-appb-000019
where Figure PCTCN2020097517-appb-000020 indicates that the variable to be solved for is x, and that x takes the value at which the expression following Figure PCTCN2020097517-appb-000021 attains its maximum, and λ is a real number greater than or equal to 0.
Optionally, the influence parameter determination module 740 is configured to: call an influence calculation model to calculate, according to the feature data of the i-th user group, the influence parameter of the first user relative to the second user, where the feature data of the i-th user group includes: feature data of the first user, feature data of the second user, and relationship feature data of the first user and the second user.
Optionally, the training process of the influence calculation model is as follows: construct at least one training sample, each training sample including one sample user group; obtain the feature data and label of each training sample, the label indicating whether the first sample user in the training sample has successfully recommended a product to the second sample user; and train the influence calculation model with the training samples to obtain the trained influence calculation model.
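The training process above can be sketched with a very small stand-in model. The application only calls for a suitable model, so the logistic regression trained by stochastic gradient descent below is an assumption chosen for brevity, as are the function names, the feature layout (first user features, then second user features, then relationship features), and the hyperparameters.

```python
import math


def build_features(first_user, second_user, relation):
    """Concatenate the three feature groups of one user group (assumed layout)."""
    return list(first_user) + list(second_user) + list(relation)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_influence_model(samples, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights; label 1 means the first sample user
    successfully recommended a product to the second sample user."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_influence(model, features):
    """Return the predicted influence parameter, a probability in [0, 1]."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)
```

A sigmoid output keeps the predicted influence parameter inside [0, 1], matching the value range the embodiments use for the influence matrix.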
In summary, in the technical solution provided by the embodiments of this application, the influence parameters between users are determined according to the feature data of each user group, an influence matrix is constructed from the influence parameters between users, and seed users are selected according to the influence matrix, which provides a further method for selecting seed users. Moreover, in the embodiments of this application, the influence parameter is determined from the feature data of user groups whose members have an association relationship. The association relationships between users are therefore taken into account and the social network relationships are mined in depth, which makes the prediction of the influence parameter more comprehensive and accurate and solves the technical problem in the related art that only the feature data of a single user is considered, which is too narrow a basis for accurately determining seed users.
In addition, in the technical solution provided by the embodiments of this application, the influence parameters between users are determined from the feature data by means of the trained influence calculation model, so that the server can calculate the influence parameters more simply. Moreover, in the embodiments of this application, the influence calculation model is trained on historical feature data, so that with this model the server can predict the influence parameters more realistically and accurately, which improves the accuracy of seed user selection.
It should be noted that, when the apparatus provided in the above embodiments implements its functions, the division into the above functional modules is used only as an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
请参考图8,其示出了本申请实施例提供的计算机设备的结构框图。该计算机设备可以用于实施上述实施例中提供的种子用户的选取方法。例如,该计算机设备可以上文所述的服务器。具体来讲:Please refer to FIG. 8, which shows a structural block diagram of a computer device provided by an embodiment of the present application. The computer device can be used to implement the seed user selection method provided in the above-mentioned embodiment. For example, the computer device may be the server described above. Specifically:
该计算机设备800包括处理单元(如CPU(Central Processing Unit,中央处理器)、GPU(Graphics Processing Unit,图形处理器)和FPGA(Field Programmable Gate Array,现场可编程逻辑门阵列)等)801、包括RAM(Random-Access Memory,随机存储器)802和ROM(Read-Only Memory,只读存储器)803的系统存储器804,以及连接系统存储器804和中央处理单元801的系统总线805。该计算机设备800还包括帮助计算机设备内的各个器件之间传输信息的基本输入/输出系统(I/O系统)806,和用于存储操作系统813、应用程序814和其他程序模块815的大容量存储设备807。The computer device 800 includes a processing unit (such as a CPU (Central Processing Unit, central processing unit), GPU (Graphics Processing Unit, graphics processor), and FPGA (Field Programmable Gate Array, field programmable logic gate array), etc.) 801, including The system memory 804 of RAM (Random-Access Memory) 802 and ROM (Read-Only Memory) 803, and the system bus 805 connecting the system memory 804 and the central processing unit 801. The computer device 800 also includes a basic input/output system (I/O system) 806 that helps to transfer information between various devices in the computer device, and a large capacity for storing the operating system 813, application programs 814, and other program modules 815. Storage device 807.
该基本输入/输出系统806包括有用于显示信息的显示器808和用于用户输入信息的诸如鼠标、键盘之类的输入设备809。其中,该显示器808和输入设备809都通过连接到系统总线805的输入输出控制器810连接到中央处理单元801。该基本输入/输出系统806还可以包括输入输出控制器810以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器810还提供输出到显示屏、打印机或其他类型的输出设备。The basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse and a keyboard for the user to input information. Wherein, the display 808 and the input device 809 are both connected to the central processing unit 801 through the input and output controller 810 connected to the system bus 805. The basic input/output system 806 may also include an input and output controller 810 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input and output controller 810 also provides output to a display screen, a printer, or other types of output devices.
该大容量存储设备807通过连接到系统总线805的大容量存储控制器(未示出)连接到中央处理单元801。该大容量存储设备807及其相关联的计算机可读介质为计算机设备800提供非易失性存储。也就是说,该大容量存储设备807可以包括诸如硬盘或者CD-ROM(Compact Disc Read-Only Memory,只读光盘)驱动器之类的计算机可读介质(未示出)。The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable medium provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
不失一般性,该计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM(Erasable Programmable Read-Only Memory,可擦写可编程只读存储器)、EEPROM(Electrically Erasable Programmable Read-Only Memory,电可擦写可编程只读存储器)、闪存或其他固态存储其技术,CD-ROM、DVD(Digital Video Disc,高密度数字视频光盘)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知该计算机存储介质不局限于上述几种。上述的系统存储器704和大容量存储设备807可以统称为存储器。Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or Other solid-state storage technologies, such as CD-ROM, DVD (Digital Video Disc, high-density digital video disc) or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices. Of course, those skilled in the art can know that the computer storage medium is not limited to the above-mentioned types. The aforementioned system memory 704 and mass storage device 807 may be collectively referred to as a memory.
根据本申请实施例,该计算机设备800还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即计算机设备800可以通过连接在该系统总线805上的网络接口单元811连接到网络812,或者说,也可以使用网络接口单元811来连接到其他类型的网络或远程计算机系统(未示出)。According to the embodiment of the present application, the computer device 800 may also be connected to a remote computer on the network through a network such as the Internet to run. That is, the computer device 800 can be connected to the network 812 through the network interface unit 811 connected to the system bus 805, or in other words, the network interface unit 811 can also be used to connect to other types of networks or remote computer systems (not shown) .
该存储器还包括计算机程序,该计算机程序存储于存储器中,且经配置以由一个或者一 个以上处理器执行,以实现上述种子用户的选取方法。The memory also includes a computer program, which is stored in the memory and configured to be executed by one or more processors, so as to implement the above-mentioned method for selecting a seed user.
在示例性实施例中,还提供了一种非临时性计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时以实现上述种子用户的选取方法。In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, on which a computer program is stored, and the computer program is executed by a processor to implement the above-mentioned method for selecting a seed user.
可选地,该计算机可读存储介质可以包括:ROM(Read-Only Memory,只读存储器)、RAM(Random-Access Memory,随机存储器)、SSD(Solid State Drives,固态硬盘)或光盘等。其中,随机存取记忆体可以包括ReRAM(Resistance Random Access Memory,电阻式随机存取记忆体)和DRAM(Dynamic Random Access Memory,动态随机存取存储器)。Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory, random access memory), SSD (Solid State Drives, solid state hard disk), or optical disk. Among them, random access memory may include ReRAM (Resistance Random Access Memory) and DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product is further provided; when executed by a processor, the computer program product implements the above seed user selection method.
It should be understood that "a plurality of" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. In addition, the step numbers used herein merely illustrate one possible execution order of the steps. In some other embodiments, the steps may be executed out of numerical order; for example, two differently numbered steps may be executed simultaneously, or in the reverse of the order shown in the figures. The embodiments of the present application do not limit this.
The above are merely exemplary embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (12)

  1. A method for selecting seed users, the method comprising:
    acquiring user set data, the user set data including n users, where n is an integer greater than 1;
    creating m user groups according to the user set data, each user group including two users having an association relationship, where m is a positive integer;
    for the i-th user group among the m user groups, acquiring feature data of the i-th user group, wherein the i-th user group includes a first user and a second user having an association relationship;
    determining, according to the feature data of the i-th user group, an influence parameter of the first user relative to the second user, the influence parameter characterizing the probability that the first user successfully recommends a product to the second user;
    constructing an influence matrix, the influence matrix being a matrix of n rows and n columns, wherein the element in row u and column v of the influence matrix represents the influence parameter of user u relative to user v;
    selecting at least one seed user from the n users according to the influence matrix; and
    storing user information of the seed user.
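The matrix construction step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: users are assumed to be indexed 0 to n-1, each user group is assumed to be keyed by an ordered pair (first user, second user), pairs with no user group are assumed to default to an influence of 0, and `build_influence_matrix` and `influence_model` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]  # (first user index, second user index); assumed ordering


def build_influence_matrix(
    n: int,
    pair_features: Dict[Pair, List[float]],
    influence_model: Callable[[List[float]], float],
) -> List[List[float]]:
    """Fill an n x n matrix where W[u][v] is the influence of user u on user v.

    Pairs without a user group keep the default value 0.0 (an assumption).
    """
    W = [[0.0] * n for _ in range(n)]
    for (u, v), features in pair_features.items():
        if u == v:
            raise ValueError("a user group must contain two distinct users")
        W[u][v] = influence_model(features)
    return W


# Toy stand-in for the influence calculation model: the clamped feature mean.
def toy_model(features: List[float]) -> float:
    return min(1.0, max(0.0, sum(features) / len(features)))


example_W = build_influence_matrix(
    3,
    {(0, 1): [0.8], (1, 2): [0.4, 0.6]},
    toy_model,
)
```

Because influence is directional, a mutual relationship would need two entries under this assumed layout, one for (u, v) and one for (v, u).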
  2. The method according to claim 1, wherein selecting at least one seed user from the n users according to the influence matrix comprises:
    calculating, according to the influence matrix, a comprehensive influence parameter of each of the n users, wherein the comprehensive influence parameter of user u characterizes the overall probability that user u successfully recommends a product to each of the other users;
    constructing a seed user set, the seed user set being initially empty;
    selecting, from the non-seed users, a user s whose comprehensive influence parameter satisfies a condition, and adding user s to the seed user set, wherein the non-seed users are those of the n users that have not been added to the seed user set;
    subtracting the values of the elements of row s from the values of the elements of each row of the influence matrix to obtain an updated influence matrix, wherein the elements of row s include the influence parameters of user s relative to each of the other users;
    if a stop condition for seed user selection is not satisfied, executing again, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix; and
    if the stop condition for seed user selection is satisfied, determining the users in the seed user set as the seed users.
  3. The method according to claim 2, wherein the influence parameter takes values in the range [0, 1]; and
    subtracting the values of the elements of row s from the values of the elements of each row of the influence matrix to obtain the updated influence matrix comprises:
    subtracting the values of the elements of row s from the values of the elements of each of the n rows of the influence matrix to obtain the computed values of the n rows of elements; and
    for each target element among the computed n rows of elements whose value is less than zero, setting the value of the target element to 0, to obtain the updated influence matrix.
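The iterative selection of claims 2 and 3 can be sketched as a greedy loop. Several choices here are assumptions, since the claims leave them open: the comprehensive influence parameter is taken to be the row sum, the selection condition is taken to be "largest comprehensive influence", and the stop condition is taken to be "k seeds selected". The name `select_seeds_greedy` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def select_seeds_greedy(W: Matrix, k: int) -> List[int]:
    """Pick up to k seed users from an n x n influence matrix W.

    After each pick s, row s is subtracted from every row and negative
    results are clamped to 0, so influence on users that s already reaches
    is discounted for the remaining candidates.
    """
    n = len(W)
    work = [row[:] for row in W]  # work on a copy; the caller's matrix is untouched
    seeds: List[int] = []
    while len(seeds) < min(k, n):  # assumed stop condition: k seeds chosen
        candidates = [u for u in range(n) if u not in seeds]
        # Assumed condition: the largest row sum (comprehensive influence).
        s = max(candidates, key=lambda u: sum(work[u]))
        seeds.append(s)
        row_s = work[s][:]
        work = [
            [max(0.0, w - row_s[v]) for v, w in enumerate(row)]
            for row in work
        ]
    return seeds


example_W = [
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.1],
    [0.5, 0.0, 0.0, 0.0],
]
example_seeds = select_seeds_greedy(example_W, 2)
```

In this toy matrix user 1 has a higher standalone score (0.7) than user 3 (0.5), but user 1 only influences user 2, whom user 0 already reaches more strongly; after the clamped subtraction user 1's row is all zeros, so user 3 is picked second.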
  4. The method according to claim 1, wherein selecting at least one seed user from the n users according to the influence matrix comprises:
    calculating, according to the influence matrix, a comprehensive influence parameter and a comprehensive influenced parameter of each of the n users, wherein the comprehensive influence parameter of user u characterizes the overall probability that user u successfully recommends a product to each of the other users, and the comprehensive influenced parameter of user u characterizes the overall probability that each of the other users successfully recommends a product to user u; and
    selecting at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.
  5. The method according to claim 4, wherein calculating the comprehensive influence parameter of each of the n users comprises:
    for user u among the n users, acquiring the influence parameters of user u relative to each of the other users; and
    summing the influence parameters of user u relative to each of the other users to obtain the comprehensive influence parameter of user u.
  6. The method according to claim 4, wherein calculating the comprehensive influenced parameter of each of the n users comprises:
    for user u among the n users, acquiring the influence parameters of each of the other users relative to user u; and
    determining the maximum or the average of the influence parameters of each of the other users relative to user u as the comprehensive influenced parameter of user u.
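The two aggregate quantities of claims 5 and 6 reduce to a row sum and a column maximum or mean over the influence matrix. The sketch below assumes W[u][v] holds the influence of user u on user v and that users are indexed from 0; the function names and the `mode` argument are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def comprehensive_influence(W: Matrix, u: int) -> float:
    """Claim 5: sum of user u's influence on every other user (row u)."""
    return sum(w for v, w in enumerate(W[u]) if v != u)


def comprehensive_influenced(W: Matrix, u: int, mode: str = "max") -> float:
    """Claim 6: max or mean of every other user's influence on u (column u)."""
    incoming = [W[v][u] for v in range(len(W)) if v != u]
    if mode == "max":
        return max(incoming)
    if mode == "mean":
        return sum(incoming) / len(incoming)
    raise ValueError("mode must be 'max' or 'mean'")


example_W = [
    [0.0, 0.2, 0.4],
    [0.6, 0.0, 0.1],
    [0.3, 0.5, 0.0],
]
out_0 = comprehensive_influence(example_W, 0)          # 0.2 + 0.4
in_0_max = comprehensive_influenced(example_W, 0)      # max(0.6, 0.3)
in_0_mean = comprehensive_influenced(example_W, 0, "mean")  # (0.6 + 0.3) / 2
```

Since n is greater than 1 in claim 1, the list of incoming influences is never empty.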
  7. The method according to claim 4, wherein the influence matrix W is as follows:
    Figure PCTCN2020097517-appb-100001
    where W_pq is the element in row p and column q of the influence matrix W, W_pq represents the influence parameter of user p relative to user q, p is a positive integer less than or equal to n, and q is a positive integer less than or equal to n;
    selecting at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users comprises:
    defining a column vector x whose elements each take the value 0 or 1 and whose elements sum to K:
    Figure PCTCN2020097517-appb-100002
    where x_R taking the value 1 indicates that user R is selected as a seed user, and R is a positive integer less than or equal to n;
    defining a column vector e:
    Figure PCTCN2020097517-appb-100003
    defining a column vector r:
    Figure PCTCN2020097517-appb-100004
    or
    Figure PCTCN2020097517-appb-100005
    calculating the column vector x based on the following formula:
    Figure PCTCN2020097517-appb-100006
    where
    Figure PCTCN2020097517-appb-100007
    indicates that the variable to be solved is x and that x takes the value that maximizes the expression following
    Figure PCTCN2020097517-appb-100008
    and λ is a real number greater than or equal to 0.
  8. The method according to any one of claims 1 to 7, wherein determining the influence parameter of the first user relative to the second user according to the feature data of the i-th user group comprises:
    invoking an influence calculation model to calculate, from the feature data of the i-th user group, the influence parameter of the first user relative to the second user;
    wherein the feature data of the i-th user group includes: feature data of the first user, feature data of the second user, and relationship feature data of the first user and the second user.
  9. The method according to claim 8, wherein the influence calculation model is trained as follows:
    constructing at least one training sample, each training sample including one sample user group;
    acquiring feature data and a label of the training sample, the label characterizing whether the first sample user in the training sample has successfully recommended a product to the second sample user; and
    training the influence calculation model with the training sample to obtain the trained influence calculation model.
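The training procedure of claims 8 and 9 can be illustrated with a small binary classifier. The claims do not fix a model type, so logistic regression trained by stochastic gradient descent is an assumption chosen here because its output lies in [0, 1] and can be read as a recommendation-success probability. The feature layout (one flat vector per sample user group), the hyperparameters, and the names `train_influence_model` and `predict_influence` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_influence_model(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 200,
    lr: float = 0.1,
) -> Model:
    """Fit logistic regression; label 1 means the first sample user
    successfully recommended a product to the second sample user."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_influence(model: Model, features: Sequence[float]) -> float:
    """Influence parameter of the first user relative to the second user."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, features)))


# Toy data: each vector concatenates made-up user and relationship features.
toy_samples = [[1.0, 0.9], [0.9, 0.8], [0.1, 0.2], [0.2, 0.1]]
toy_labels = [1, 1, 0, 0]
toy_model = train_influence_model(toy_samples, toy_labels)
```

The value returned by `predict_influence` for a user group would then be written into the corresponding element of the influence matrix.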
  10. An apparatus for selecting seed users, the apparatus comprising:
    a set data acquisition module, configured to acquire user set data, the user set data including n users, where n is an integer greater than 1;
    a user group creation module, configured to create m user groups according to the user set data, each user group including two users having an association relationship, where m is a positive integer;
    a feature data acquisition module, configured to acquire, for the i-th user group among the m user groups, feature data of the i-th user group, wherein the i-th user group includes a first user and a second user having an association relationship;
    an influence parameter determination module, configured to determine, according to the feature data of the i-th user group, an influence parameter of the first user relative to the second user, the influence parameter characterizing the probability that the first user successfully recommends a product to the second user;
    a matrix construction module, configured to construct an influence matrix, the influence matrix being a matrix of n rows and n columns, wherein the element in row u and column v of the influence matrix represents the influence parameter of user u relative to user v;
    a seed user selection module, configured to select at least one seed user from the n users according to the influence matrix; and
    an information storage module, configured to store user information of the seed user.
  11. A computer device, comprising a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the method for selecting seed users according to any one of claims 1 to 9.
  12. A non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for selecting seed users according to any one of claims 1 to 9.
PCT/CN2020/097517 2019-11-25 2020-06-22 Seed user selection method, apparatus and device, and storage medium WO2021103508A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911168479.9A CN110942345B (en) 2019-11-25 2019-11-25 Seed user selection method, device, equipment and storage medium
CN201911168479.9 2019-11-25

Publications (1)

Publication Number Publication Date
WO2021103508A1 (en) 2021-06-03

Family

ID=69908504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097517 WO2021103508A1 (en) 2019-11-25 2020-06-22 Seed user selection method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN110942345B (en)
WO (1) WO2021103508A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942345B (en) * 2019-11-25 2022-02-15 北京三快在线科技有限公司 Seed user selection method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160125297A1 (en) * 2014-10-30 2016-05-05 Umm Al-Qura University System and method for solving spatiotemporal-based problems
CN106156030A (en) * 2014-09-18 2016-11-23 华为技术有限公司 The method and apparatus that in social networks, information of forecasting is propagated
CN106611100A (en) * 2015-10-19 2017-05-03 重庆邮电大学 Analysis method and device for user behaviors
CN108122168A (en) * 2016-11-28 2018-06-05 中国科学技术大学先进技术研究院 Seed node screening technique and device in social activity network
CN108537569A (en) * 2018-03-07 2018-09-14 西北大学 The advertisement sending method that interpersonal relationships perceives in online social networks
CN108765180A (en) * 2018-05-29 2018-11-06 福州大学 The overlapping community discovery method extended with seed based on influence power
CN110942345A (en) * 2019-11-25 2020-03-31 北京三快在线科技有限公司 Seed user selection method, device, equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823888B (en) * 2014-03-07 2017-02-08 安徽融数信息科技有限责任公司 Node-closeness-based social network site friend recommendation method
CN106952166B (en) * 2016-01-07 2020-11-03 腾讯科技(深圳)有限公司 User influence estimation method and device of social platform
CN106021289A (en) * 2016-04-29 2016-10-12 天津大学 Method for establishing probability matrix decomposition model based on node user
CN109977979B (en) * 2017-12-28 2021-12-07 中国移动通信集团广东有限公司 Method and device for locating seed user, electronic equipment and storage medium
CN109242710B (en) * 2018-08-16 2022-03-11 北京交通大学 Social network node influence ordering method and system
CN110457387B (en) * 2019-08-19 2023-11-10 腾讯科技(深圳)有限公司 Method and related device applied to user tag determination in network

Also Published As

Publication number Publication date
CN110942345B (en) 2022-02-15
CN110942345A (en) 2020-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20893906; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20893906; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1